Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is the documentation for a snapshot of the develop branch, built from commit cdb310143f.
PrevUpHomeNext

User guide

Making histograms
Axis guide
Overview
Axis configuration
Filling histograms and accessing cells
Using profiles
Using operators
Using algorithms
Algorithms from the C++ standard library
Summation
Projection
Reduction
Streaming
Serialization
Using histograms in APIs
Advanced usage
User-defined axes
Axis with several arguments
User-defined storage class
Parallelisation options
User-defined accumulators

Boost.Histogram is designed to make simple things simple, yet complex things possible. Correspondingly, this guides covers the basic usage first, and the advanced usage in later sections. For an alternative quick start guide, have a look at the Getting started section.

A histogram consists of a collection of axis objects and a storage. The storage holds a collection of accumulators, one for each cell. The axis objects maps input values to indices, which are combined into a global index that is used to look up the cell in the storage.

To start off you do not have to worry about the storage, the library provides a good default. Learning more about the many interesting axis types to choose from, however, will pay off very quickly (which are discussed further below). For now, let us stick to the most common axis, the regular axis. It represents equidistant intervals on the real line.

Histograms are created with the convenient factory function make_histogram. The following example shows how to make a histogram with a single axis.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // create a 1d-histogram in default configuration which
  // covers the real line from -1 to 1 in 100 bins
  auto h = make_histogram(axis::regular<>(100, -1.0, 1.0));

  // rank is the number of axes
  assert(h.rank() == 1);
}

An axis object defines how input values are mapped to bins, it is a mapping functor of input values to indices. The axis object holds information such as how many bins there are, where the bin edges are, metadata about the axis and so on. The rank of a histogram is given by the number of axes. A histogram with one axis is one-dimensional. If you provide two, it is two-dimensional, and so on.

In the example above, the compiler knows the number of axes and their type at compile-time, the information can be deduced from the arguments to make_histogram. This gives the best performance, but sometimes you only know the axis configuration at run-time, because it depends on information that's only available at run-time. For that case you can also create axes at run-time and pass them to an overload of the make_histogram function. Here is an example.

#include <boost/histogram.hpp>
#include <cassert>
#include <sstream>
#include <vector>

const char* config = "4 1.0 2.0\n"
                     "5 3.0 4.0\n";

int main() {
  using namespace boost::histogram;

  // read axis config from a config file (mocked here with std::istringstream)
  // and create vector of regular axes, the number of axis is not known at compile-time
  std::istringstream is(config);
  auto v1 = std::vector<axis::regular<>>();
  while (is.good()) {
    unsigned bins;
    double start, stop;
    is >> bins >> start >> stop;
    v1.emplace_back(bins, start, stop);
  }

  // create histogram from iterator range
  // (copying or moving the vector also works, move is shown below)
  auto h1 = make_histogram(v1.begin(), v1.end());
  assert(h1.rank() == v1.size());

  // with a vector of axis::variant (polymorphic axis type that can hold any one of the
  // template arguments at a time) the types and number of axis can vary at run-time
  auto v2 = std::vector<axis::variant<axis::regular<>, axis::integer<>>>();
  v2.emplace_back(axis::regular<>(100, -1.0, 1.0));
  v2.emplace_back(axis::integer<>(1, 7));

  // create dynamic histogram by moving the vector
  auto h2 = make_histogram(std::move(v2));
  assert(h2.rank() == 2);
}
[Note] Note

When the axis types are known at compile-time, the histogram internally holds them in a std::tuple, which is very efficient and avoids a heap memory allocation. If the number of axes is only known at run-time, they are stored in a std::vector. The interface hides this difference very well, but there are a corner cases where the difference becomes apparent. The overview has more details on this point.

The factory function named make_histogram uses the default storage type, which provides safe counting, is fast, and memory efficient. If you want to create a histogram with another storage type, use make_histogram_with. To learn more about other storage types and how to create your own, have a look at the section Advanced Usage.

The library provides a number of useful axis types. The builtin axis types can be configured to fit many needs. If you still need something more exotic, no problem, it is easy to write your own axis types, see the Advanced usage section for details. In the following, we give some advice when to use which of the builtin axis types.

regular

Axis over intervals on the real line which have equal width. Value-to-index conversion is O(1) and very fast. The axis does not allocate memory dynamically. The axis is very flexible thanks to transforms (see below). Due to finite precision of floating point calculations, bin edges may not be exactly at expected values, though. If you need bin edges at exactly defined floating point values, use the variable axis. If you need bins at exact consecutive integral values, use the integer axis.

variable

Axis over intervals on the real line of variable width. Value-to-index conversion is O(log(N)). The axis allocates memory dynamically to store the bin edges. Use this if the regular axis with transforms cannot represent the binning you want. If you need bin edges at exactly defined floating point values, use this axis.

integer

Axis over an integer sequence [i, i+1, i+2, ...]. It can be configured to handle real input values, too, and then acts like a fast regular axis with a fixed bin width of 1. Value-to-index conversion is O(1) and faster than for the regular axis. Does not allocate memory dynamically. Use this when your input consists of an integer range or pre-digitized values with low dynamic range, like pixel values for individual colours in an image.

boolean

Axis over the two values [false, true]. It is a common specialization of the regular axis. Value-to-index conversion is a pass-through operation, so this is the fastest possible axis. The axis has no state other than the metadata (which can be stateless). Does not allocate memory dynamically. Use this when your input consists of binary categories, like signal and background.

category

Axis over a set of unique values of an arbitrary equal-comparable type. Value-to-index conversion is O(N), but still faster than the O(logN) complexity of the variable axis for N < 10, the typical use case. The axis allocates memory from the heap to store the values.

Here is an example which shows the basic use case for each axis type.

#include <boost/histogram/axis.hpp>
#include <limits>

int main() {
  using namespace boost::histogram;

  // make a regular axis with 10 bins over interval from 1.5 to 2.5
  auto r = axis::regular<>{10, 1.5, 2.5};
  // `<>` is needed in C++14 because the axis is templated,
  // in C++17 you can do: auto r = axis::regular{10, 1.5, 2.5};
  assert(r.size() == 10);
  // alternatively, you can define the step size with the `step` marker
  auto r_step = axis::regular<>{axis::step(0.1), 1.5, 2.5};
  assert(r_step == r);

  // histogram uses the `index` method to convert values to indices
  // note: intervals of builtin axis types are always semi-open [a, b)
  assert(r.index(1.5) == 0);
  assert(r.index(1.6) == 1);
  assert(r.index(2.4) == 9);
  // index for a value below the start of the axis is always -1
  assert(r.index(1.0) == -1);
  assert(r.index(-std::numeric_limits<double>::infinity()) == -1);
  // index for a value below the above the end of the axis is always `size()`
  assert(r.index(3.0) == 10);
  assert(r.index(std::numeric_limits<double>::infinity()) == 10);
  // index for not-a-number is also `size()` by convention
  assert(r.index(std::numeric_limits<double>::quiet_NaN()) == 10);

  // make a variable axis with 3 bins [-1.5, 0.1), [0.1, 0.3), [0.3, 10)
  auto v = axis::variable<>{-1.5, 0.1, 0.3, 10.};
  assert(v.index(-2.0) == -1);
  assert(v.index(-1.5) == 0);
  assert(v.index(0.1) == 1);
  assert(v.index(0.3) == 2);
  assert(v.index(10) == 3);
  assert(v.index(20) == 3);

  // make an integer axis with 3 bins at -1, 0, 1
  auto i = axis::integer<>{-1, 2};
  assert(i.index(-2) == -1);
  assert(i.index(-1) == 0);
  assert(i.index(0) == 1);
  assert(i.index(1) == 2);
  assert(i.index(2) == 3);

  // make an integer axis called "foo"
  auto i_with_label = axis::integer<>{-1, 2, "foo"};
  // all builtin axis types allow you to pass some optional metadata as the last
  // argument in the constructor; a string by default, but can be any copyable type

  // two axis do not compare equal if they differ in their metadata
  assert(i != i_with_label);

  // integer axis also work well with unscoped enums
  enum { red, blue };
  auto i_for_enum = axis::integer<>{red, blue + 1};
  assert(i_for_enum.index(red) == 0);
  assert(i_for_enum.index(blue) == 1);

  // make a category axis from a scoped enum and/or if the identifiers are not consecutive
  enum class Bar { red = 12, blue = 6 };
  auto c = axis::category<Bar>{Bar::red, Bar::blue};
  assert(c.index(Bar::red) == 0);
  assert(c.index(Bar::blue) == 1);
  // c.index(12) is a compile-time error, since the argument must be of type `Bar`

  // category axis can be created for any copyable and equal-comparable type
  auto c_str = axis::category<std::string>{"red", "blue"};
  assert(c_str.index("red") == 0);
  assert(c_str.index("blue") == 1);
}
[Note] Note

All builtin axes over a continuous value range use semi-open intervals in their constructors as a convention. As a mnemonic, think of iterator ranges from begin to end, where end is also never included.

As mentioned in the previous example, you can assign an optional label to any axis to keep track of what the axis is about. Assume you have census data and you want to investigate how yearly income correlates with age, you could do:

#include <boost/histogram.hpp>

int main() {
  using namespace boost::histogram;

  // create a 2d-histogram with an "age" and an "income" axis
  auto h = make_histogram(axis::regular<>(20, 0.0, 100.0, "age in years"),
                          axis::regular<>(20, 0.0, 100.0, "yearly income in Thousands"));

  // do something with h
}

Without the metadata it would be difficult to see which axis was covering which quantity. Metadata is the only axis property that can be modified after construction by the user. Axis objects with different metadata do not compare equal.

By default, strings are used to store the metadata, but any type compatible with the Metadata concept can be used.

All builtin axis types have template arguments for customisation. All arguments have reasonable defaults so you can use empty brackets. If your compiler supports C++17, you can drop the brackets altogether. Suitable arguments are then deduced from the constructor call. The template arguments are in order:

Value

The value type is the argument type of the index() method. An argument passed to the axis must be implicitly convertible to this type.

Transform (only regular axis)

A class that implements a monotonic transform between the data space and the space in which the bins are equi-distant. Users can define their own transforms and use them with the axis.

Metadata

The axis uses an instance this type to store metadata. It is a std::string by default, but it can by any copyable type. If you want to save a small amount of stack memory per axis, you pass the empty boost::histogram::axis::null_type here.

Options

Compile-time options for the axis. This is used to enable/disable under- and overflow bins, to make an axis circular, or to enable dynamic growth of the axis beyond the initial range.

Allocator (only variable and category axes)

Allocator that is used to request memory dynamically to store values. If you don't know what an allocator is you can safely ignore this argument.

You can use the boost::histogram::use_default tag type for any of these options, except for the Value and Allocator, to use the library default.

Transforms are a way to customize a regular axis. Transforms allow you to chose the faster stack-allocated regular axis over the generic variable axis in some cases. The default is the identity transform which does nothing. A common need is a regular binning in the logarithm of the input value. This can be achieved with a log transform. The follow example shows the builtin transforms.

#include <boost/histogram/axis/regular.hpp>
#include <limits>

int main() {
  using namespace boost::histogram;

  // make a regular axis with a log transform over [10, 100), [100, 1000), [1000, 10000)
  axis::regular<double, axis::transform::log> r_log{3, 10., 10000.};
  // log transform:
  // - useful when values vary dramatically in magnitude, like brightness of stars
  // - edges are not exactly at 10, 100, 1000, because of finite floating point precision
  // - values >= 0 but smaller than the starting value of the axis are mapped to -1
  // - values < 0 are mapped to `size()`, because the result of std::log(value) is NaN
  assert(r_log.index(10.1) == 0);
  assert(r_log.index(100.1) == 1);
  assert(r_log.index(1000.1) == 2);
  assert(r_log.index(1) == -1);
  assert(r_log.index(0) == -1);
  assert(r_log.index(-1) == 3);

  // make a regular axis with a sqrt transform over [4, 9), [9, 16), [16, 25)
  axis::regular<double, axis::transform::sqrt> r_sqrt{3, 4., 25.};
  // sqrt transform:
  // - bin widths are more mildly increasing compared to log transform
  // - axis starting at value == 0 is ok, sqrt(0) == 0 unlike log transform
  // - values < 0 are mapped to `size()`, because the result of std::sqrt(value) is NaN
  assert(r_sqrt.index(0) == -1);
  assert(r_sqrt.index(4.1) == 0);
  assert(r_sqrt.index(9.1) == 1);
  assert(r_sqrt.index(16.1) == 2);
  assert(r_sqrt.index(25.1) == 3);
  assert(r_sqrt.index(-1) == 3);

  // make a regular axis with a power transform x^1/3 over [1, 8), [8, 27), [27, 64)
  using pow_trans = axis::transform::pow;
  axis::regular<double, pow_trans> r_pow(pow_trans{1. / 3.}, 3, 1., 64.);
  // pow transform:
  // - generalization of the sqrt transform
  // - starting the axis at value == 0 is ok for power p > 0, 0^p == 0 for p > 0
  // - values < 0 are mapped to `size()` if power p is not a positive integer
  assert(r_pow.index(0) == -1);
  assert(r_pow.index(1.1) == 0);
  assert(r_pow.index(8.1) == 1);
  assert(r_pow.index(27.1) == 2);
  assert(r_pow.index(64.1) == 3);
  assert(r_pow.index(-1) == 3);
}

Due to the finite precision of floating point calculations, the bin edges of a transformed regular axis may not be exactly at the expected values. If you need exact correspondence, use a variable axis.

Users may write their own transforms and use them with the builtin regular axis, by implementing a type that matches the Transform concept.

Options can be used to configure each axis type. The option flags are implemented as tag types with the suffix _t. Each tag type has a corresponding value without the suffix. The values have set semantics: You can compute the union with operator| and the intersection with operator&. When you pass a single option flag to an axis as a template parameter, use the tag type. When you need to pass the union of several options to an axis as a template parameter, surround the union of option values with a decltype. Both ways of passing options are shown in the following examples.

Under- and overflow bins

Under- and overflow bins are added automatically for most axis types. If you create an axis with 10 bins, the histogram will actually have 12 bins along that axis. The extra bins are very useful, as explained in the rationale. If the input cannot exceed the axis range or if you are really tight on memory, you can disable the extra bins. A demonstration:

#include <boost/histogram.hpp>
#include <string>

int main() {
  using namespace boost::histogram;

  // create a 1d-histogram over integer values from 1 to 6
  auto h1 = make_histogram(axis::integer<int>(1, 7));
  // axis has size 6...
  assert(h1.axis().size() == 6);
  // ... but histogram has size 8, because of overflow and underflow bins
  assert(h1.size() == 8);

  // create a 1d-histogram for throws of a six-sided die without extra bins,
  // since the values cannot be smaller than 1 or larger than 6
  auto h2 = make_histogram(axis::integer<int, use_default, axis::option::none_t>(1, 7));
  // now size of axis and histogram is equal
  assert(h2.axis().size() == 6);
  assert(h2.size() == 6);
}

The category axis by default comes only with an overflow bin, which counts all input values that are not part of the initial set.

Circular axes

Each builtin axis except the category axis can be made circular. This means that the axis is periodic at its ends. This is useful if you want make a histogram over a polar angle. Example:

#include <boost/histogram/axis.hpp>
#include <limits>

int main() {
  using namespace boost::histogram;

  // make a circular regular axis ... [0, 180), [180, 360), [0, 180) ....
  using opts = decltype(axis::option::overflow | axis::option::circular);
  auto r = axis::regular<double, use_default, use_default, opts>{2, 0., 360.};
  assert(r.index(-180) == 1);
  assert(r.index(0) == 0);
  assert(r.index(180) == 1);
  assert(r.index(360) == 0);
  assert(r.index(540) == 1);
  assert(r.index(720) == 0);
  // special values are mapped to the overflow bin index
  assert(r.index(std::numeric_limits<double>::infinity()) == 2);
  assert(r.index(-std::numeric_limits<double>::infinity()) == 2);
  assert(r.index(std::numeric_limits<double>::quiet_NaN()) == 2);

  // since the regular axis is the most common circular axis, there exists an alias
  auto c = axis::circular<>{2, 0., 360.};
  assert(r == c);

  // make a circular integer axis
  auto i = axis::integer<int, use_default, axis::option::circular_t>{1, 4};
  assert(i.index(0) == 2);
  assert(i.index(1) == 0);
  assert(i.index(2) == 1);
  assert(i.index(3) == 2);
  assert(i.index(4) == 0);
  assert(i.index(5) == 1);
}

A circular axis cannot have an underflow bin, passing both options together generates a compile-time error. Since the highest bin wraps around to the lowest bin, there is no possibility for overflow either. However, an overflow bin is still added by default if the value is a floating point type, to catch NaNs and infinities.

Growing axes

Each builtin axis has an option to make it grow beyond its initial range when a value outside of that range is passed to it, while the default behaviour is to count this value in the under- or overflow bins (or to discard it). Example:

#include <boost/histogram.hpp>
#include <cassert>

#include <iostream>

int main() {
  using namespace boost::histogram;

  // make a growing regular axis
  // - it grows new bins with its constant bin width until the value is covered
  auto h1 = make_histogram(axis::regular<double,
                                         use_default,
                                         use_default,
                                         axis::option::growth_t>{2, 0., 1.});
  // nothing special happens here
  h1(0.1);
  h1(0.9);
  // state: [0, 0.5): 1, [0.5, 1.0): 1
  assert(h1.axis().size() == 2);
  assert(h1.axis().bin(0).lower() == 0.0);
  assert(h1.axis().bin(1).upper() == 1.0);

  // value below range: axis grows new bins until value is in range
  h1(-0.3);
  // state: [-0.5, 0.0): 1, [0, 0.5): 1, [0.5, 1.0): 1
  assert(h1.axis().size() == 3);
  assert(h1.axis().bin(0).lower() == -0.5);
  assert(h1.axis().bin(2).upper() == 1.0);

  h1(1.9);
  // state: [-0.5, 0.0): 1, [0, 0.5): 1, [0.5, 1.0): 1, [1.0, 1.5): 0 [1.5, 2.0): 1
  assert(h1.axis().size() == 5);
  assert(h1.axis().bin(0).lower() == -0.5);
  assert(h1.axis().bin(4).upper() == 2.0);

  // make a growing category axis (here strings)
  // - empty axis is allowed: very useful if categories are not known at the beginning
  auto h2 = make_histogram(axis::category<std::string,
                                          use_default,
                                          axis::option::growth_t>());
  assert(h2.size() == 0); // histogram is empty
  h2("foo"); // new bin foo, index 0
  assert(h2.size() == 1);
  h2("bar"); // new bin bar, index 1
  assert(h2.size() == 2);
  h2("foo");
  assert(h2.size() == 2);
  assert(h2[0] == 2);
  assert(h2[1] == 1);
}

This feature can be very convenient, but keep two caveats in mind.

  • Growing axes come with a run-time cost, since the histogram has to reallocate memory for all cells when an axis changes its size. Whether this performance hit is noticeable depends on your application. This is a minor issue, the next is more severe.
  • If you have unexpected outliers in your data which are far away from the normal range, the axis could grow to a huge size and the corresponding huge memory request could bring the computer to its knees. This is one of the reason why growing axes are not the default.

A growing axis can have under- and overflow bins, but these only count the special floating point values: positive and negative infinity, and NaN.

A histogram has been created and now you want to insert values. This is done with the flexible call operator or the fill method, which you typically call in a loop. The call operator accepts N arguments or a std::tuple with N elements, where N is equal to the number of axes of the histogram. It finds the corresponding bin for the input and increments the bin counter by one. The fill method accepts a single iterable over other iterables (which must have have elements contiguous in memory) or values, see the method documentation for details.

After the histogram has been filled, use the at method (in analogy to std::vector::at) to access the cell values. It accepts integer indices, one for each axis of the histogram.

#include <boost/histogram.hpp>
#include <cassert>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

int main() {
  using namespace boost::histogram;

  auto h = make_histogram(axis::integer<>(0, 3), axis::regular<>(2, 0.0, 1.0));

  // fill histogram, number of arguments must be equal to number of axes,
  // types must be convertible to axis value type (here integer and double)
  h(0, 0.2); // increase a cell value by one
  h(2, 0.5); // increase another cell value by one

  // fills from a tuple are also supported; passing a tuple of wrong size
  // causes an error at compile-time or an assertion at runtime in debug mode
  auto xy = std::make_tuple(1, 0.3);
  h(xy);

  // chunk-wise filling is also supported and more efficient, make some data...
  std::vector<double> xy2[2] = {{0, 2, 5}, {0.8, 0.4, 0.7}};

  // ... and call fill method
  h.fill(xy2);

  // once histogram is filled, access individual cells using operator[] or at(...)
  // - operator[] can only accept a single argument in the current version of C++,
  //   it is convenient when you have a 1D histogram
  // - at(...) can accept several values, so use this by default
  // - underflow bins are at index -1, overflow bins at index `size()`
  // - passing an invalid index triggers a std::out_of_range exception
  assert(h.at(0, 0) == 1);
  assert(h.at(0, 1) == 1);
  assert(h.at(1, 0) == 1);
  assert(h.at(1, 1) == 0);
  assert(h.at(2, 0) == 1);
  assert(h.at(2, 1) == 1);
  assert(h.at(-1, -1) == 0); // underflow for axis 0 and 1
  assert(h.at(-1, 0) == 0);  // underflow for axis 0, normal bin for axis 1
  assert(h.at(-1, 2) == 0);  // underflow for axis 0, overflow for axis 1
  assert(h.at(3, 1) == 1);   // overflow for axis 0, normal bin for axis 1

  // iteration over values works, but see next example for a better way
  // - iteration using begin() and end() includes under- and overflow bins
  // - iteration order is an implementation detail and should not be relied upon
  const double sum = std::accumulate(h.begin(), h.end(), 0.0);
  assert(sum == 6);
}

For a histogram hist, the calls hist(weight(w), ...) and hist(..., weight(w)) increment the bin counter by the value w instead, where w may be an integer or floating point number. The helper function weight() marks this argument as a weight, so that it can be distinguished from the other inputs. It can be the first or last argument. You can freely mix calls with and without a weight. Calls without weight act like the weight is 1. Why weighted increments are sometimes useful is explained in the rationale.

[Note] Note

The default storage loses its no-overflow-guarantee when you pass floating point weights, but maintains it for integer weights.

When the weights come from a stochastic process, it is useful to keep track of the variance of the sum of weights per cell. A specialized histogram can be generated with the make_weighted_histogram factory function which does that.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // Create a histogram with weight counters that keep track of a variance estimate.
  auto h = make_weighted_histogram(axis::regular<>(3, 0.0, 1.0));

  h(0.0, weight(1)); // weight 1 goes to first bin
  h(0.1, weight(2)); // weight 2 goes to first bin
  h(0.4, weight(3)); // weight 3 goes to second bin
  h(0.5, weight(4)); // weight 4 goes to second bin

  // chunk-wise filling is also supported
  auto x = {0.2, 0.6};
  auto w = {5, 6};
  h.fill(x, weight(w));

  // Weight counters have methods to access the value (sum of weights) and the variance
  // (sum of weights squared, why this gives the variance is explained in the rationale)
  assert(h[0].value() == 1 + 2 + 5);
  assert(h[0].variance() == 1 * 1 + 2 * 2 + 5 * 5);
  assert(h[1].value() == 3 + 4 + 6);
  assert(h[1].variance() == 3 * 3 + 4 * 4 + 6 * 6);
}

To iterate over all cells, the indexed range generator is very convenient and also efficient. For almost all configurations, the range generator iterates faster than a naive for-loop. Under- and overflow are skipped by default.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <numeric> // for std::accumulate
#include <sstream>

using namespace boost::histogram;

int main() {
  // make histogram with 2 x 2 = 4 bins (not counting under-/overflow bins)
  auto h = make_histogram(axis::regular<>(2, -1.0, 1.0), axis::regular<>(2, 2.0, 4.0));

  h(weight(1), -0.5, 2.5); // bin index 0, 0
  h(weight(2), -0.5, 3.5); // bin index 0, 1
  h(weight(3), 0.5, 2.5);  // bin index 1, 0
  h(weight(4), 0.5, 3.5);  // bin index 1, 1

  // use the `indexed` range adaptor to iterate over all bins;
  // it is not only more convenient but also faster than a hand-crafted loop!
  std::ostringstream os;
  for (auto&& x : indexed(h)) {
    // x is a special accessor object
    const auto i = x.index(0); // current index along first axis
    const auto j = x.index(1); // current index along second axis
    const auto b0 = x.bin(0);  // current bin interval along first axis
    const auto b1 = x.bin(1);  // current bin interval along second axis
    const auto v = *x;         // "dereference" to get the bin value
    os << boost::format("%i %i [%2i, %i) [%2i, %i): %i\n") % i % j % b0.lower() %
              b0.upper() % b1.lower() % b1.upper() % v;
  }

  std::cout << os.str() << std::flush;

  assert(os.str() == "0 0 [-1, 0) [ 2, 3): 1\n"
                     "1 0 [ 0, 1) [ 2, 3): 3\n"
                     "0 1 [-1, 0) [ 3, 4): 2\n"
                     "1 1 [ 0, 1) [ 3, 4): 4\n");

  // `indexed` skips underflow and overflow bins by default, but can be called
  // with the second argument `coverage::all` to walk over all bins
  std::ostringstream os2;
  for (auto&& x : indexed(h, coverage::all)) {
    os2 << boost::format("%2i %2i: %i\n") % x.index(0) % x.index(1) % *x;
  }

  std::cout << os2.str() << std::flush;

  assert(os2.str() == "-1 -1: 0\n"
                      " 0 -1: 0\n"
                      " 1 -1: 0\n"
                      " 2 -1: 0\n"

                      "-1  0: 0\n"
                      " 0  0: 1\n"
                      " 1  0: 3\n"
                      " 2  0: 0\n"

                      "-1  1: 0\n"
                      " 0  1: 2\n"
                      " 1  1: 4\n"
                      " 2  1: 0\n"

                      "-1  2: 0\n"
                      " 0  2: 0\n"
                      " 1  2: 0\n"
                      " 2  2: 0\n");
}

Histograms from this library can do more than counting, they can hold arbitrary accumulators which accept samples. We call a histogram with accumulators that compute the mean of samples in each cell a profile. Profiles can be generated with the factory function make_profile.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>
#include <tuple>

int main() {
  using namespace boost::histogram;

  // make a profile, it computes the mean of the samples in each histogram cell
  auto h = make_profile(axis::integer<>(0, 3));

  // mean is computed from the values marked with the sample() helper function
  h(0, sample(1)); // sample goes to cell 0
  h(0, sample(2)); // sample goes to cell 0
  h(1, sample(3)); // sample goes to cell 1
  h(sample(4), 1); // sample goes to cell 1; sample can be first or last argument

  // fills from tuples are also supported, 5 and 6 go to cell 2
  auto a = std::make_tuple(2, sample(5));
  auto b = std::make_tuple(sample(6), 2);
  h(a);
  h(b);

  // builtin accumulators have methods to access their state
  std::ostringstream os;
  for (auto&& x : indexed(h)) {
    // use `.` to access methods of accessor, like `index()`
    // use `->` to access methods of accumulator
    const auto i = x.index();
    const auto n = x->count();     // how many samples are in this bin
    const auto vl = x->value();    // mean value
    const auto vr = x->variance(); // estimated variance of the mean value
    os << boost::format("index %i count %i value %.1f variance %.1f\n") % i % n % vl % vr;
  }

  std::cout << os.str() << std::flush;

  assert(os.str() == "index 0 count 2 value 1.5 variance 0.5\n"
                     "index 1 count 2 value 3.5 variance 0.5\n"
                     "index 2 count 2 value 5.5 variance 0.5\n");
}

Just like weight(), sample() is a marker function. It must be the first or last argument.

Weights and samples may be combined, if the accumulators can handle weights. When both weight() and sample() appear in the call operator or the fill method, they can be in any order with respect to other, but they must be the first or last arguments. To make a profile which can compute weighted means with proper uncertainty estimates, use the make_weighted_profile factory function.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;
  using namespace boost::histogram::literals; // _c suffix creates compile-time numbers

  // make 2D weighted profile
  auto h = make_weighted_profile(axis::integer<>(0, 2), axis::integer<>(0, 2));

  // The mean is computed from the values marked with the sample() helper function.
  // Weights can be passed as well. The `sample` and `weight` arguments can appear in any
  // order, but they must be the first or last arguments.
  h(0, 0, sample(1));            // sample goes to cell (0, 0); weight is 1
  h(0, 0, sample(2), weight(3)); // sample goes to cell (0, 0); weight is 3
  h(1, 0, sample(3));            // sample goes to cell (1, 0); weight is 1
  h(1, 0, sample(4));            // sample goes to cell (1, 0); weight is 1
  h(0, 1, sample(5));            // sample goes to cell (1, 0); weight is 1
  h(0, 1, sample(6));            // sample goes to cell (1, 0); weight is 1
  h(1, 1, weight(4), sample(7)); // sample goes to cell (1, 1); weight is 4
  h(weight(5), sample(8), 1, 1); // sample goes to cell (1, 1); weight is 5

  std::ostringstream os;
  for (auto&& x : indexed(h)) {
    const auto i = x.index(0_c);
    const auto j = x.index(1_c);
    const auto m = x->value();    // weighted mean
    const auto v = x->variance(); // estimated variance of weighted mean
    os << boost::format("index %i,%i mean %.1f variance %.1f\n") % i % j % m % v;
  }

  std::cout << os.str() << std::flush;

  assert(os.str() == "index 0,0 mean 1.8 variance 0.5\n"
                     "index 1,0 mean 3.5 variance 0.5\n"
                     "index 0,1 mean 5.5 variance 0.5\n"
                     "index 1,1 mean 7.6 variance 0.5\n");
}

The following operators are supported for pairs of histograms +, -, *, /, ==, !=. Histograms can also be multiplied and divided by a scalar. Only a subset of the arithmetic operators is available when the underlying accumulator only supports that subset.

The arithmetic operators can only be used when the histograms have the same axis configuration. This checked at run-time. An exception is thrown if the configurations do not match. Two histograms have the same axis configuration, if all axes compare equal, which includes a comparison of their metadata. Two histograms compare equal, when their axis configurations and all their cell values compare equal.

[Note] Note

If the metadata type has operator== defined, it is used in the axis configuration comparison. Metadata types without operator== are considered equal, if they are the same type.

#include <boost/histogram.hpp>
#include <cassert>
#include <vector>

int main() {
  using namespace boost::histogram;

  // make two histograms
  auto h1 = make_histogram(axis::regular<>(2, -1.0, 1.0));
  auto h2 = make_histogram(axis::regular<>(2, -1.0, 1.0));

  h1(-0.5); // counts are: 1 0
  h2(0.5);  // counts are: 0 1

  // add them
  auto h3 = h1;
  h3 += h2; // counts are: 1 1

  // adding multiple histograms at once is likely to be optimized by the compiler so that
  // superfluous temporaries avoided, but no guarantees are given; use this equivalent
  // code when you want to make sure: h4 = h1; h4 += h2; h4 += h3;
  auto h4 = h1 + h2 + h3; // counts are: 2 2

  assert(h4.at(0) == 2 && h4.at(1) == 2);

  // multiply by number, h4 *= 2 also works
  auto h5 = h4 * 2; // counts are: 4 4

  // divide by number; s4 /= 4 also works
  auto h6 = h5 / 4; // counts are: 1 1

  assert(h6.at(0) == 1 && h6.at(1) == 1);
  assert(h6 != h5 && h5 == 4 * h6);

  // note the special effect of multiplication on weight_storage
  auto h = make_histogram_with(weight_storage(), axis::regular<>(2, -1.0, 1.0));
  h(-0.5);

  // counts are: 1 0
  assert(h.at(0).value() == 1 && h.at(1).value() == 0);

  auto h_sum = h + h;
  auto h_mul = 2 * h;

  // values are the same as expected...
  assert(h_sum.at(0).value() == h_mul.at(0).value());
  // ... but variances differ
  assert(h_sum.at(0).variance() == 2 && h_mul.at(0).variance() == 4);

  // equality operator checks variances, so histograms are not equal
  assert(h_sum != h_mul);
}
[Note] Note

A histogram with default storage converts its cell values to double when they are to be multiplied with or divided by a real number, or when a real number is added or subtracted. At this point the no-overflow-guarantee is lost.

[Note] Note

When the storage tracks weight variances, such as boost::histogram::weight_storage, adding two copies of a histogram produces a different result than scaling the histogram by a factor of two, as shown in the last example. The is a consequence of the mathematical properties of variances. They can be added like normal numbers, but scaling by s means that variances are scaled by s^2.

The library was designed to work with algorithms from the C++ standard library. In addition, a support library of algorithms is included with common operations on histograms.

The histogram class has standard random-access iterators which can be used with various algorithms from the standard library. Some things need to be kept in mind:

  • The histogram iterators also walk over the under- and overflow bins, if they are present. To skip the extra bins one should use the fast iterators from the boost::histogram::indexed range generator instead.
  • The iteration order for histograms with several axes is an implementation-detail, but for histograms with one axis it is guaranteed to be the obvious order: bins are accessed in order from the lowest to the highest index.
#include <boost/histogram.hpp>
#include <cassert>

#include <algorithm> // fill, any_of, min_element, max_element
#include <cmath>     // sqrt
#include <numeric>   // partial_sum, inner_product

int main() {
  using namespace boost::histogram;

  // make histogram that represents a probability density function (PDF)
  auto h1 = make_histogram(axis::regular<>(4, 1.0, 3.0));

  // make indexed range to skip underflow and overflow cells
  auto ind = indexed(h1);

  // use std::fill to set all counters to 0.25 (except under- and overflow counters)
  std::fill(ind.begin(), ind.end(), 0.25);

  // compute the cumulative density function (CDF), overriding cell values
  std::partial_sum(ind.begin(), ind.end(), ind.begin());

  assert(h1.at(-1) == 0.00);
  assert(h1.at(0) == 0.25);
  assert(h1.at(1) == 0.50);
  assert(h1.at(2) == 0.75);
  assert(h1.at(3) == 1.00);
  assert(h1.at(4) == 0.00);

  // use any_of to check if any cell values are smaller than 0.1,
  auto b = std::any_of(ind.begin(), ind.end(), [](const auto& x) { return x < 0.1; });
  assert(b == false); // under- and overflow cells are zero, but skipped

  // find minimum element
  auto min_it = std::min_element(ind.begin(), ind.end());
  assert(*min_it == 0.25); // under- and overflow cells are skipped

  // find maximum element
  auto max_it = std::max_element(ind.begin(), ind.end());
  assert(*max_it == 1.0);

  // make second PDF
  auto h2 = make_histogram(axis::regular<>(4, 1.0, 4.0));
  h2.at(0) = 0.1;
  h2.at(1) = 0.3;
  h2.at(2) = 0.2;
  h2.at(3) = 0.4;

  // computing cosine similiarity: cos(theta) = A dot B / sqrt((A dot A) * (B dot B))
  auto ind2 = indexed(h2);
  const auto aa = std::inner_product(ind.begin(), ind.end(), ind.begin(), 0.0);
  const auto bb = std::inner_product(ind2.begin(), ind2.end(), ind2.begin(), 0.0);
  const auto ab = std::inner_product(ind.begin(), ind.end(), ind2.begin(), 0.0);
  const auto cos_sim = ab / std::sqrt(aa * bb);

  assert(std::abs(cos_sim - 0.967) < 1e-2);
}

It is easy to iterate over all histogram cells to compute the sum of cell values by hand or to use an algorithm from the standard library to do so, but it is not safe. The result may not be accurate or overflow, if the sum is represented by an integer type. The library provides boost::histogram::algorithm::sum as a safer, albeit slower, alternative. It uses the Neumaier algorithm in double precision for integers and floating point cells, and does the naive sum otherwise.

It is sometimes convenient to generate a high-dimensional histogram first and then extract smaller or lower-dimensional versions from it. Lower-dimensional histograms are obtained by summing the bin contents of the removed axes. This is called a projection. If the histogram has under- and overflow bins along all axes, this operation creates a histogram which is identical to one that would have been obtained by filling the original data.

Projection is useful if you found out that there is no interesting structure along an axis, so it is not worth keeping that axis around, or if you want to visualize 1d or 2d summaries of a high-dimensional histogram.

The library provides the boost::histogram::algorithm::project function for this purpose. It accepts the original histogram and the indices (one or more) of the axes that are kept and returns the lower-dimensional histogram. If the histogram is static, meaning the axis configuration is known at compile-time, compile-time numbers should be used as indices to obtain another static histogram. Using normal numbers or iterators over indices produces a histogram with a dynamic axis configuration.

#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;
  using namespace literals; // enables _c suffix

  // make a 2d histogram
  auto h = make_histogram(axis::regular<>(3, -1.0, 1.0), axis::integer<>(0, 2));

  h(-0.9, 0);
  h(0.9, 1);
  h(0.1, 0);

  auto hr0 = algorithm::project(h, 0_c); // keep only first axis
  auto hr1 = algorithm::project(h, 1_c); // keep only second axis

  // reduce does not remove counts; returned histograms are summed over
  // the removed axes, so h, hr0, and hr1 have same number of total counts;
  // we compute the sum of counts with the sum algorithm
  assert(algorithm::sum(h) == 3 && algorithm::sum(hr0) == 3 && algorithm::sum(hr1) == 3);

  std::ostringstream os1;
  for (auto&& x : indexed(h))
    os1 << "(" << x.index(0) << ", " << x.index(1) << "): " << *x << "\n";
  std::cout << os1.str() << std::flush;
  assert(os1.str() == "(0, 0): 1\n"
                      "(1, 0): 1\n"
                      "(2, 0): 0\n"
                      "(0, 1): 0\n"
                      "(1, 1): 0\n"
                      "(2, 1): 1\n");

  std::ostringstream os2;
  for (auto&& x : indexed(hr0)) os2 << "(" << x.index(0) << ", -): " << *x << "\n";
  std::cout << os2.str() << std::flush;
  assert(os2.str() == "(0, -): 1\n"
                      "(1, -): 1\n"
                      "(2, -): 1\n");

  std::ostringstream os3;
  for (auto&& x : indexed(hr1)) os3 << "(- ," << x.index(0) << "): " << *x << "\n";
  std::cout << os3.str() << std::flush;
  assert(os3.str() == "(- ,0): 2\n"
                      "(- ,1): 1\n");
}

A projection removes an axis completely. A less drastic way to obtain a smaller histogram is the reduce function, which allows one to slice, shrink or rebin individual axes.

Shrinking means that the value range of an axis is reduced and the number of bins along that axis. Slicing does the same, but is based on axis indices while shrinking is based on the axis values. To rebin means that adjacent bins are merged into larger bins, the histogram is made coarser. For N adjacent bins, a new bin is formed which covers the common interval of the merged bins and has their added content. These two operations can be combined and applied to several axes at once. Doing it in one step is much more efficient than doing it in several steps.

The reduce function does not change the total count if all modified axes in the histogram have underflow and overflow bins. Counts in removed bins are added to the corresponding under- and overflow bins. As in case of the project function, such a histogram is guaranteed to be identical to one obtained from filling the original data.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;
  // import reduce commands into local namespace to save typing
  using algorithm::rebin;
  using algorithm::shrink;
  using algorithm::slice;

  // make a 2d histogram
  auto h = make_histogram(axis::regular<>(4, 0.0, 4.0), axis::regular<>(4, -2.0, 2.0));

  h(0, -0.9);
  h(1, 0.9);
  h(2, 0.1);
  h(3, 0.1);

  // reduce takes positional commands which are applied to the axes in order
  // - shrink is applied to the first axis; the new axis range is 0.0 to 3.0
  // - rebin is applied to the second axis; pairs of adjacent bins are merged
  auto h2 = algorithm::reduce(h, shrink(0.0, 3.0), rebin(2));

  assert(h2.axis(0) == axis::regular<>(3, 0.0, 3.0));
  assert(h2.axis(1) == axis::regular<>(2, -2.0, 2.0));

  // reduce does not change the total count if the histogram has underflow/overflow bins
  assert(algorithm::sum(h) == 4 && algorithm::sum(h2) == 4);

  // One can also explicitly specify the index of the axis in the histogram on which the
  // command should act, by using this index as the the first parameter. The position of
  // the command in the argument list of reduce is then ignored. We use this to slice only
  // the second axis (axis has index 1 in the histogram) from bin index 2 to 4.
  auto h3 = algorithm::reduce(h, slice(1, 2, 4));

  assert(h3.axis(0) == h.axis(0)); // unchanged
  assert(h3.axis(1) == axis::regular<>(2, 0.0, 2.0));
}

Simple streaming operators are shipped with the library. They are internally used by the unit tests and give simple text representations of axis and histogram configurations and show the histogram content. One-dimensional histograms are rendered as ASCII drawings. The text representations may be useful for debugging or more, but users may want to use their own implementations. Therefore, the headers with the builtin implementations are not included by any other header of the library. The following example shows the effect of output streaming.

#include <boost/histogram.hpp>
#include <boost/histogram/ostream.hpp>
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

int main() {
  using namespace boost::histogram;

  std::ostringstream os;

  // width of histogram can be set like this; if it is not set, the library attempts to
  // determine the terminal width and choses the histogram width accordingly
  os.width(78);

  auto h1 = make_histogram(axis::regular<>(5, -1.0, 1.0, "axis 1"));
  h1.at(0) = 2;
  h1.at(1) = 4;
  h1.at(2) = 3;
  h1.at(4) = 1;

  // 1D histograms are rendered as an ASCII drawing
  os << h1;

  auto h2 = make_histogram(axis::regular<>(2, -1.0, 1.0, "axis 1"),
                           axis::category<std::string>({"red", "blue"}, "axis 2"));

  // higher dimensional histograms just have their cell counts listed
  os << h2;

  std::cout << os.str() << std::endl;

  assert(
      os.str() ==
      "histogram(regular(5, -1, 1, metadata=\"axis 1\", options=underflow | overflow))\n"
      "               ┌─────────────────────────────────────────────────────────────┐\n"
      "[-inf,   -1) 0 │                                                             │\n"
      "[  -1, -0.6) 2 │██████████████████████████████                               │\n"
      "[-0.6, -0.2) 4 │████████████████████████████████████████████████████████████ │\n"
      "[-0.2,  0.2) 3 │█████████████████████████████████████████████                │\n"
      "[ 0.2,  0.6) 0 │                                                             │\n"
      "[ 0.6,    1) 1 │███████████████                                              │\n"
      "[   1,  inf) 0 │                                                             │\n"
      "               └─────────────────────────────────────────────────────────────┘\n"
      "histogram(\n"
      "  regular(2, -1, 1, metadata=\"axis 1\", options=underflow | overflow)\n"
      "  category(\"red\", \"blue\", metadata=\"axis 2\", options=overflow)\n"
      "  (-1 0): 0 ( 0 0): 0 ( 1 0): 0 ( 2 0): 0 (-1 1): 0 ( 0 1): 0\n"
      "  ( 1 1): 0 ( 2 1): 0 (-1 2): 0 ( 0 2): 0 ( 1 2): 0 ( 2 2): 0\n"
      ")");
}

The library supports serialization via Boost.Serialization. The serialization code is not included by the super header #include <boost/histogram.hpp>, so that the library can be used without Boost.Serialization or with another compatible serialization library.

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/histogram.hpp>
#include <boost/histogram/serialization.hpp> // includes serialization code
#include <cassert>
#include <sstream>

int main() {
  using namespace boost::histogram;

  auto a = make_histogram(axis::regular<>(3, -1.0, 1.0, "axis 0"),
                          axis::integer<>(0, 2, "axis 1"));
  a(0.5, 1);

  std::string buf; // to hold persistent representation

  // store histogram
  {
    std::ostringstream os;
    boost::archive::text_oarchive oa(os);
    oa << a;
    buf = os.str();
  }

  auto b = decltype(a)(); // create a default-constructed second histogram

  assert(b != a); // b is empty, a is not

  // load histogram
  {
    std::istringstream is(buf);
    boost::archive::text_iarchive ia(is);
    ia >> b;
  }

  assert(b == a); // now b is equal to a
}

Letting the compiler deduce the histogram type is recommended, because the templated type is tedious to write down explicitly. Functions or methods which accept or return histograms should be templated to work with all kinds of histograms. It is also possible to write templated versions which accept only histograms with dynamic axes or only histograms with static axes. The following example demonstrates all this.

#include <boost/histogram.hpp>
#include <cassert>

// function accepts any histogram and returns a copy
template <class Axes, class Storage>
boost::histogram::histogram<Axes, Storage> any_histogram(
    boost::histogram::histogram<Axes, Storage>& h) {
  return h;
}

// function only accepts histograms with fixed axis types and returns a copy
template <class Storage, class... Axes>
boost::histogram::histogram<std::tuple<Axes...>, Storage> only_static_histogram(
    boost::histogram::histogram<std::tuple<Axes...>, Storage>& h) {
  return h;
}

// function only accepts histograms with dynamic axis types and returns a copy
template <class Storage, class... Axes>
boost::histogram::histogram<std::vector<boost::histogram::axis::variant<Axes...>>,
                            Storage>
only_dynamic_histogram(
    boost::histogram::histogram<std::vector<boost::histogram::axis::variant<Axes...>>,
                                Storage>& h) {
  return h;
}

int main() {
  using namespace boost::histogram;

  auto histogram_with_static_axes = make_histogram(axis::regular<>(10, 0, 1));

  using axis_variant = axis::variant<axis::regular<>, axis::integer<>>;
  std::vector<axis_variant> axes;
  axes.emplace_back(axis::regular<>(5, 0, 1));
  axes.emplace_back(axis::integer<>(0, 1));
  auto histogram_with_dynamic_axes = make_histogram(axes);

  assert(any_histogram(histogram_with_static_axes) == histogram_with_static_axes);
  assert(any_histogram(histogram_with_dynamic_axes) == histogram_with_dynamic_axes);
  assert(only_static_histogram(histogram_with_static_axes) == histogram_with_static_axes);
  assert(only_dynamic_histogram(histogram_with_dynamic_axes) ==
         histogram_with_dynamic_axes);

  // does not compile: only_static_histogram(histogram_with_dynamic_axes)
  // does not compile: only_dynamic_histogram(histogram_with_static_axes)
}

If the histogram type has to be written down explicitly, the types are constructed as follows. In all cases, the default_storage type argument may be replaced by any other storage type or omitted entirely.

  • Histogram with fixed axis types:

    boost::histogram::histogram<
      std::tuple<axis_type_1, axis_type_2, ..., axis_type_N>
      , boost::histogram::default_storage
    >
    
  • Histogram with a variable number of the same axis type:

    boost::histogram::histogram<
      std::vector<
        axis_type_1
      >
      , boost::histogram::default_storage
    >
    
  • Histogram with variable axis types:

    boost::histogram::histogram<
      std::vector<
        boost::histogram::axis::variant<
          axis_type_1, axis_type_2, ..., axis_type_N
        >
      >
      , boost::histogram::default_storage
    >
    

The library is customisable and extensible by users. Users can create new axis types and use them with the histogram, or implement a custom storage policy, or use a builtin storage policy with a custom counter type. The library was designed to make this very easy. This section shows how to do this.

It is easy to make custom axis classes that work with the library. The custom axis class must meet the requirements of the Axis concept.

Users can create a custom axis by derive from a builtin axis type and customize its behavior. The following examples demonstrates a modification of the integer axis.

#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;

  // custom axis, which adapts builtin integer axis
  struct custom_axis : public axis::integer<> {
    using value_type = const char*; // type that is fed to the axis

    using integer::integer; // inherit ctors of base

    // the customization point
    // - accept const char* and convert to int
    // - then call index method of base class
    axis::index_type index(value_type s) const { return integer::index(std::atoi(s)); }
  };

  auto h = make_histogram(custom_axis(3, 6));
  h("-10");
  h("3");
  h("4");
  h("9");

  std::ostringstream os;
  for (auto&& b : indexed(h)) {
    os << "bin " << b.index() << " [" << b.bin() << "] " << *b << "\n";
  }

  std::cout << os.str() << std::endl;

  assert(os.str() == "bin 0 [3] 1\n"
                     "bin 1 [4] 1\n"
                     "bin 2 [5] 0\n");
}

How to make an axis completely from scratch is shown in the next example.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // stateless axis which returns 1 if the input is even and 0 otherwise
  struct even_odd_axis {
    axis::index_type index(int x) const { return x % 2; }
    axis::index_type size() const { return 2; }
  };

  // threshold axis which returns 1 if the input is above threshold
  struct threshold_axis {
    threshold_axis(double x) : thr(x) {}
    axis::index_type index(double x) const { return x >= thr; }
    axis::index_type size() const { return 2; }
    double thr;
  };

  auto h = make_histogram(even_odd_axis(), threshold_axis(3.0));

  h(0, 2.0);
  h(1, 4.0);
  h(2, 4.0);

  assert(h.at(0, 0) == 1); // even, below threshold
  assert(h.at(0, 1) == 1); // even, above threshold
  assert(h.at(1, 0) == 0); // odd, below threshold
  assert(h.at(1, 1) == 1); // odd, above threshold
}

Such minimal axis types lack many features provided by the builtin axis types, for example, one cannot iterate over them, but this functionality can be added as needed.

Multi-dimensional histograms usually have an orthogonal system of axes. Orthogonal means that each axis takes care of only one value and computes its local index independently of all the other axes and values. A checker-board is such an orthogonal grid in 2D.

There are other interesting grids which are not orthogonal, notably the honeycomb grid. In such a grid, each cell is hexagonal and even though the cells form a perfectly regular pattern, it is not possible to sort values into these cells using two orthogonal axes.

The library supports non-orthogonal grids by allowing axis types to accept a std::tuple of values. The axis can compute an index from the values passed to it in an arbitrary way. The following example demonstrates this.

#include <boost/histogram.hpp>
#include <cassert>

int main() {
  using namespace boost::histogram;

  // axis which returns 1 if the input falls inside the unit circle and zero otherwise
  struct circle_axis {
    // accepts a 2D point in form of a std::tuple
    axis::index_type index(const std::tuple<double, double>& point) const {
      const auto x = std::get<0>(point);
      const auto y = std::get<1>(point);
      return x * x + y * y <= 1.0;
    }

    axis::index_type size() const { return 2; }
  };

  auto h1 = make_histogram(circle_axis());

  // fill looks normal for a histogram which has only one Nd-axis
  h1(0, 0);   // in
  h1(0, -1);  // in
  h1(0, 1);   // in
  h1(-1, 0);  // in
  h1(1, 0);   // in
  h1(1, 1);   // out
  h1(-1, -1); // out

  // 2D histogram, but only 1D index
  assert(h1.at(0) == 2); // out
  assert(h1.at(1) == 5); // in

  // other axes can be combined with a Nd-axis
  auto h2 = make_histogram(circle_axis(), axis::category<std::string>({"red", "blue"}));

  // now we need to pass arguments for Nd-axis explicitly as std::tuple
  h2(std::make_tuple(0, 0), "red");
  h2(std::make_tuple(1, 1), "blue");

  // 3D histogram, but only 2D index
  assert(h2.at(0, 0) == 0); // out, red
  assert(h2.at(0, 1) == 1); // out, blue
  assert(h2.at(1, 0) == 1); // in, red
  assert(h2.at(1, 1) == 0); // in, blue
}

Histograms which use a different storage class can easily created with the factory function make_histogram_with. For convenience, this factory function accepts many standard containers as storage backends: vectors, arrays, and maps. These are automatically wrapped with a boost::histogram::storage_adaptor to provide the storage interface needed by the library. Users may also place custom accumulators in the vector, as described in the next section.

[Warning] Warning

The no-overflow-guarantee is only valid if the unlimited_storage (the default) is used. If you change the storage policy, you need to know what you are doing.

A std::vector may provide higher performance than the unlimited_storage with a carefully chosen counter type. Usually, this would be an integral or floating point type. A std::vector-based storage may be faster for low-dimensional histograms (or not, you need to measure).

Users who work exclusively with weighted histograms should chose a std::vector<double>, it will be faster. If they also want to track the variance of the sum of weights, a vector-based storage of weighted_sum accumulators should be used. The factory function make_weighted_histogram is a convenient way to generate a histogram with this storage.

An interesting alternative to a std::vector is to use a std::array. The latter provides a storage with a fixed maximum capacity (the size of the array). std::array allocates the memory on the stack. In combination with a static axis configuration this allows one to create histograms completely on the stack without any dynamic memory allocation. Small stack-based histograms can be created and destroyed very fast.

Finally, a std::map or std::unordered_map or any other map type that implements the STL interface can be used to generate a histogram with a sparse storage, where empty cells do not consume any memory. This sounds attractive, but the memory consumption per cell in such a data structure is much larger than for a vector or array, so the number of empty cells must be substantial to gain. Moreover, cell lookup in a sparse data structure may be less performant. Whether a sparse storage performs better than a dense storage depends on the use case. The library makes it easy to switch from dense to sparse storage and back, so users are invited to test both options.

The following example shows how histograms are constructed which use an alternative storage classes.

#include <algorithm> // std::for_each
#include <array>
#include <boost/histogram.hpp>
#include <boost/histogram/algorithm/sum.hpp>
#include <functional> // std::ref
#include <unordered_map>
#include <vector>

int main() {
  using namespace boost::histogram;
  const auto axis = axis::regular<>(10, 0.0, 1.0);

  auto data = {0.1, 0.3, 0.2, 0.7};

  // Create static histogram with vector<int> as counter storage, you can use
  // other arithmetic types as counters, e.g. double.
  auto h1 = make_histogram_with(std::vector<int>(), axis);
  std::for_each(data.begin(), data.end(), std::ref(h1));
  assert(algorithm::sum(h1) == 4);

  // Create static histogram with array<int, N> as counter storage which is
  // allocated completely on the stack (this is very fast). N may be larger than
  // the actual number of bins used; an exception is raised if N is too small to
  // hold all bins.
  auto h2 = make_histogram_with(std::array<int, 12>(), axis);
  std::for_each(data.begin(), data.end(), std::ref(h2));
  assert(algorithm::sum(h2) == 4);

  // Create static histogram with unordered_map as counter storage; this
  // generates a sparse histogram where only memory is allocated for bins that
  // are non-zero. This sounds like a good idea for high-dimensional histograms,
  // but maps come with a memory and run-time overhead. The default_storage
  // usually performs better in high dimensions.
  auto h3 = make_histogram_with(std::unordered_map<std::size_t, int>(), axis);
  std::for_each(data.begin(), data.end(), std::ref(h3));
  assert(algorithm::sum(h3) == 4);
}

There are two ways to generate a single histogram using several threads.

1. Each thread has its own copy of the histogram. Each copy is independently filled. The copies are then added in the main thread. Use this as the default when you can afford having N copies of the histogram in memory for N threads, because it allows each thread to work on its thread-local memory and utilise the CPU cache without the need to synchronise memory access. The highest performance gains are obtained in this way.

2. There is only one histogram which is filled concurrently by several threads. This requires using a thread-safe storage that can handle concurrent writes. The library provides the boost::histogram::accumulators::count accumulator with a thread-safe option, which combined with the boost::histogram::dense_storage provides a thread-safe storage.

[Note] Note

Filling a histogram with growing axes in a multi-threaded environment is safe, but has poor performance since the histogram must be locked on each fill. The locks are required because an axis could grow each time, which changes the number of cells and cell addressing for all other threads. Even without growing axes, there is only a performance gain if the histogram is either very large or when significant time is spend in preparing the value to fill. For small histograms, threads frequently access the same cell, whose state has to be synchronised between the threads. This is slow even with atomic counters and made worse by the effect of false sharing.

The next example demonstrates option 2 (option 1 is straight-forward to implement).

#include <boost/histogram.hpp>
#include <boost/histogram/algorithm/sum.hpp>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// dummy fill function, to be executed in parallel by several threads
template <typename Histogram>
void fill(Histogram& h) {
  for (unsigned i = 0; i < 1000; ++i) { h(i % 10); }
}

int main() {
  using namespace boost::histogram;

  /*
    Create histogram with container of thread-safe counters for parallel filling in
    several threads. Only filling is thread-safe, other guarantees are not given.
  */
  auto h = make_histogram_with(dense_storage<accumulators::count<unsigned, true>>(),
                               axis::integer<>(0, 10));

  /*
    Run the fill function in parallel from different threads. This is safe when a
    thread-safe accumulator and a storage with thread-safe cell access are used.
  */
  auto fill_h = [&h]() { fill(h); };
  std::thread t1(fill_h);
  std::thread t2(fill_h);
  std::thread t3(fill_h);
  std::thread t4(fill_h);
  t1.join();
  t2.join();
  t3.join();
  t4.join();

  // Without a thread-safe accumulator, this number may be smaller.
  assert(algorithm::sum(h) == 4000);
}

A storage can hold custom accumulators which can accept an arbitrary number of arguments. The arguments are passed to the accumulator via the sample call, for example, sample(1, 2, 3) for an accumulator which accepts three arguments. Custom accumulators can be combined with any container supported by boost::histogram::storage_adaptor. For convenience, the alias template boost::histogram::dense_storage is provided to make a standard storage with a custom accumulator type.

The library provides several accumulators:

  • count accepts no samples, but accepts a weight. It is an alternative to a plain arithmetic type as a counter. It has two benefits. Firstly, users can derive from count to attach arbitrary metadata to each bin. Secondly, count has a template option to make increments atomic and thus thread-safe, which then allows one to fill a histogram concurrently without locking, if the storage also supports this (the dense_storage does, for example).
  • sum accepts no samples, but accepts a weight. It is an alternative to a plain arithmetic type as a counter. It provides an advantage when histograms are filled with weights that differ dramatically in magnitude. The sum of weights is computed incrementally with the Neumaier algorithm. The algorithm is more accurate, but consumes more CPU and memory (memory is doubled compared to a normal sum of floating point numbers).
  • weighted_sum accepts no samples, but accepts a weight. It computes the sum of weights and the sum of weights squared, the variance estimate of the sum of weights. This type is used by the make_weighted_histogram.
  • mean accepts a sample and computes the mean of the samples. make_profile uses this accumulator.
  • weighted_mean accepts a sample and a weight. It computes the weighted mean of the samples. make_weighted_profile uses this accumulator.
  • fraction accepts a boolean sample that represents success or failure of a binomial trial. It computes the fraction of successes. One can access the number of successes and failures, the fraction, the estimated variance of the fraction, and a confidence interval. The standard confidence interval is the Wilson score interval, but more interval computers are implemented in boost/histogram/utility. Beware: one cannot pass std::vector<bool> to histogram::fill, because it is not a contiguous sequence of boolean values, but any other container of booleans works and any sequence of values convertible to bool.

Users can easily write their own accumulators and plug them into the histogram, if they adhere to the Accumulator concept. All accumulators from Boost.Accumulators that accept a single argument and no weights work out of the box. Other accumulators from Boost.Accumulators can be made to work by using them inside a wrapper class that implements the concept.

The first example shows how to make and use a histogram that uses one of the the builtin accumulators.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;
  using mean = accumulators::mean<>;

  // Create a 1D-profile, which computes the mean of samples in each bin.
  auto h = make_histogram_with(dense_storage<mean>(), axis::integer<>(0, 2));
  // The factory function `make_profile` is provided as a shorthand for this, so this is
  // equivalent to the previous line: auto h = make_profile(axis::integer<>(0, 2));

  // An argument marked as `sample` is passed to the accumulator.
  h(0, sample(1)); // sample goes to first cell
  h(0, sample(2)); // sample goes to first cell
  h(1, sample(3)); // sample goes to second cell
  h(1, sample(4)); // sample goes to second cell

  std::ostringstream os;
  for (auto&& x : indexed(h)) {
    // Accumulators usually have methods to access their state. Use the arrow
    // operator to access them. Here, `count()` gives the number of samples,
    // `value()` the mean, and `variance()` the variance estimate of the mean.
    os << boost::format("index %i count %i mean %.1f variance %.1f\n") % x.index() %
              x->count() % x->value() % x->variance();
  }
  std::cout << os.str() << std::flush;
  assert(os.str() == "index 0 count 2 mean 1.5 variance 0.5\n"
                     "index 1 count 2 mean 3.5 variance 0.5\n");
}

The simplest way to make a custom accumulator is to inherit from one of the builtin accumulators. The following example shows how to add arbitrary metadata to each histogram cell by inheriting a custom accumulator from a builtin accumulator.

#include <boost/histogram.hpp>
#include <cmath>
#include <iostream>
#include <sstream>
#include <string>

int main() {
  using namespace boost::histogram;

  // derive custom accumulator from one of the builtins
  struct accumulator_with_metadata : accumulators::count<> {
    std::string meta; // custom meta data

    // arbitrary additional data and interface could be added here
  };

  // make 1D histogram with custom accmulator
  auto h = make_histogram_with(dense_storage<accumulator_with_metadata>(),
                               axis::integer<>(1, 4));

  // fill some weighted entries
  auto x = {1, 0, 2, 1};
  h.fill(x);

  // assigning meta data to two bins
  h[0].meta = "Foo";
  h[2].meta = "Bar";

  std::ostringstream os;
  for (auto&& x : indexed(h))
    os << x.bin() << " value " << x->value() << " meta " << x->meta << "\n";

  std::cout << os.str() << std::flush;
  assert(os.str() == "1 value 2 meta Foo\n"
                     "2 value 1 meta \n"
                     "3 value 0 meta Bar\n");
}

The next example shows how to making a custom accumulators completely from scratch. The library was designed to make this as easy as possible.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <cassert>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;

  // A custom accumulator which tracks the maximum of the samples.
  // It must have a call operator that accepts the argument of the `sample` function.
  struct maximum {
    // return value is ignored, so we use void
    void operator()(double x) {
      if (x > value) value = x;
    }
    double value = 0; // value is public and initialized to zero
  };

  // Create 1D histogram that uses the custom accumulator.
  auto h = make_histogram_with(dense_storage<maximum>(), axis::integer<>(0, 2));
  h(0, sample(1.0)); // sample goes to first cell
  h(0, sample(2.0)); // sample goes to first cell
  h(1, sample(3.0)); // sample goes to second cell
  h(1, sample(4.0)); // sample goes to second cell

  std::ostringstream os;
  for (auto&& x : indexed(h)) {
    os << boost::format("index %i maximum %.1f\n") % x.index() % x->value;
  }
  std::cout << os.str() << std::flush;
  assert(os.str() == "index 0 maximum 2.0\n"
                     "index 1 maximum 4.0\n");
}

The next example shows a more complex custom accumulator that accepts two samples at once and an optional weight. It independently computes the mean for each sample. This is more efficient than filling two separate profiles, because the cell lookup has to be done only once.

#include <boost/format.hpp>
#include <boost/histogram.hpp>
#include <iostream>
#include <sstream>

int main() {
  using namespace boost::histogram;

  // Accumulator accepts two samples and an optional weight and computes the mean of each.
  struct multi_mean {
    accumulators::mean<> mx, my;

    // called when no weight is passed
    void operator()(double x, double y) {
      mx(x);
      my(y);
    }

    // called when a weight is passed
    void operator()(weight_type<double> w, double x, double y) {
      mx(w, x);
      my(w, y);
    }
  };
  // Note: The implementation can be made more efficient by sharing the sum of weights.

  // Create a 1D histogram that uses the custom accumulator.
  auto h = make_histogram_with(dense_storage<multi_mean>(), axis::integer<>(0, 2));
  h(0, sample(1, 2));            // samples go to first cell
  h(0, sample(3, 4));            // samples go to first cell
  h(1, sample(5, 6), weight(2)); // samples go to second cell
  h(1, sample(7, 8), weight(3)); // samples go to second cell

  std::ostringstream os;
  for (auto&& bin : indexed(h)) {
    os << boost::format("index %i mean-x %.1f mean-y %.1f\n") % bin.index() %
              bin->mx.value() % bin->my.value();
  }
  std::cout << os.str() << std::flush;
  assert(os.str() == "index 0 mean-x 2.0 mean-y 3.0\n"
                     "index 1 mean-x 6.2 mean-y 7.2\n");
}

And finally, just for fun, we use a histogram as the accumulator for another histogram.

#include <boost/histogram.hpp>
#include <cmath>
#include <iostream>
#include <sstream>
#include <string>

int main() {
  using namespace boost::histogram;

  // First we define the nested histogram type.
  using axis_t = axis::category<int, axis::null_type, axis::option::growth_t>;
  using base_t = histogram<std::tuple<axis_t>>;

  // Now we make an accumulator out of it by using inheritance.
  // We only need to implement operator(). A matching version of operator() is actually
  // present in base_t, but it is templated and this is not allowed by the accumulator
  // concept. Initialization could also happen here. We don't need to initialize anything
  // here, because the default constructor of base_t is called automatically and
  // sufficient for this example.
  struct hist_t : base_t {
    void operator()(const double x) { base_t::operator()(x); }
  };

  auto h = make_histogram_with(dense_storage<hist_t>(), axis::integer<>(1, 4));

  auto x = {1, 1, 2, 2};
  auto s = {1, 2, 3, 3}; // samples are filled into the nested histograms
  h.fill(x, sample(s));

  std::ostringstream os;
  for (auto&& x : indexed(h)) {
    os << x.bin() << " ";
    for (auto&& y : indexed(*x)) { os << "(" << y.bin() << ": " << *y << ") "; }
    os << "\n";
  }

  std::cout << os.str() << std::flush;
  assert(os.str() == "1 (1: 1) (2: 1) \n"
                     "2 (3: 2) \n"
                     "3 \n");
}

Note that the axis size of the nested histogram differs from bin to bin. Creating a 2D histogram in this way is not as efficient as the normal way, but it allows one to create a histograms with such a non-rectangular layout of cells.


PrevUpHomeNext