...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
Boost.Histogram is designed to make simple things simple, yet complex things possible. Correspondingly, this guides covers the basic usage first, and the advanced usage in later sections. For an alternative quick start guide, have a look at the Getting started section.
A histogram consists of a collection of axis objects and a storage. The storage holds a collection of accumulators, one for each cell. The axis objects maps input values to indices, which are combined into a global index that is used to look up the cell in the storage.
To start off you do not have to worry about the storage, the library provides
a good default. Learning more about the many interesting axis types to choose
from, however, will pay off very quickly (which are discussed further below).
For now, let us stick to the most common axis, the regular
axis. It represents equidistant intervals on the real line.
Histograms are created with the convenient factory function make_histogram
.
The following example shows how to make a histogram with a single axis.
#include <boost/histogram.hpp> #include <cassert> int main() { using namespace boost::histogram; // create a 1d-histogram in default configuration which // covers the real line from -1 to 1 in 100 bins auto h = make_histogram(axis::regular<>(100, -1.0, 1.0)); // rank is the number of axes assert(h.rank() == 1); }
An axis object defines how input values are mapped to bins, it is a mapping functor of input values to indices. The axis object holds information such as how many bins there are, where the bin edges are, metadata about the axis and so on. The rank of a histogram is given by the number of axes. A histogram with one axis is one-dimensional. If you provide two, it is two-dimensional, and so on.
In the example above, the compiler knows the number of axes and their type
at compile-time, the information can be deduced from the arguments to make_histogram
. This
gives the best performance, but sometimes you only know the axis configuration
at run-time, because it depends on information that's only available at run-time.
For that case you can also create axes at run-time and pass them to an overload
of the make_histogram
function. Here is an example.
#include <boost/histogram.hpp> #include <cassert> #include <sstream> #include <vector> const char* config = "4 1.0 2.0\n" "5 3.0 4.0\n"; int main() { using namespace boost::histogram; // read axis config from a config file (mocked here with std::istringstream) // and create vector of regular axes, the number of axis is not known at compile-time std::istringstream is(config); auto v1 = std::vector<axis::regular<>>(); while (is.good()) { unsigned bins; double start, stop; is >> bins >> start >> stop; v1.emplace_back(bins, start, stop); } // create histogram from iterator range // (copying or moving the vector also works, move is shown below) auto h1 = make_histogram(v1.begin(), v1.end()); assert(h1.rank() == v1.size()); // with a vector of axis::variant (polymorphic axis type that can hold any one of the // template arguments at a time) the types and number of axis can vary at run-time auto v2 = std::vector<axis::variant<axis::regular<>, axis::integer<>>>(); v2.emplace_back(axis::regular<>(100, -1.0, 1.0)); v2.emplace_back(axis::integer<>(1, 7)); // create dynamic histogram by moving the vector auto h2 = make_histogram(std::move(v2)); assert(h2.rank() == 2); }
Note | |
---|---|
When the axis types are known at compile-time, the histogram internally
holds them in a |
The factory function named make_histogram
uses the default storage type, which provides safe counting, is fast, and
memory efficient. If you want to create a histogram with another storage
type, use make_histogram_with
.
To learn more about other storage types and how to create your own, have
a look at the section Advanced Usage.
The library provides a number of useful axis types. The builtin axis types can be configured to fit many needs. If you still need something more exotic, no problem, it is easy to write your own axis types, see the Advanced usage section for details. In the following, we give some advice when to use which of the builtin axis types.
regular
Axis over intervals on the real line which have equal width. Value-to-index
conversion is O(1) and very fast. The axis does not allocate memory
dynamically. The axis is very flexible thanks to transforms (see
below). Due to finite precision of floating point calculations, bin
edges may not be exactly at expected values, though. If you need
bin edges at exactly defined floating point values, use the variable
axis.
If you need bins at exact consecutive integral values, use the integer
axis.
variable
Axis over intervals on the real line of variable width. Value-to-index conversion is O(log(N)). The axis allocates memory dynamically to store the bin edges. Use this if the regular axis with transforms cannot represent the binning you want. If you need bin edges at exactly defined floating point values, use this axis.
integer
Axis over an integer sequence [i, i+1, i+2, ...]. It can be configured
to handle real input values, too, and then acts like a fast regular
axis with a fixed bin width of 1. Value-to-index conversion is O(1)
and faster than for the regular
axis. Does not allocate memory dynamically. Use this when your input
consists of an integer range or pre-digitized values with low dynamic
range, like pixel values for individual colours in an image.
boolean
Axis over the two values [false, true]. It is a common specialization
of the regular
axis. Value-to-index conversion is a pass-through operation, so this
is the fastest possible axis. The axis has no state other than the
metadata (which can be stateless). Does not allocate memory dynamically.
Use this when your input consists of binary categories, like signal
and background.
category
Axis over a set of unique values of an arbitrary equal-comparable
type. Value-to-index conversion is O(N), but still faster than the
O(logN) complexity of the variable
axis for N < 10, the typical use case. The axis allocates memory
from the heap to store the values.
Here is an example which shows the basic use case for each axis type.
#include <boost/histogram/axis.hpp> #include <limits> int main() { using namespace boost::histogram; // make a regular axis with 10 bins over interval from 1.5 to 2.5 auto r = axis::regular<>{10, 1.5, 2.5}; // `<>` is needed in C++14 because the axis is templated, // in C++17 you can do: auto r = axis::regular{10, 1.5, 2.5}; assert(r.size() == 10); // alternatively, you can define the step size with the `step` marker auto r_step = axis::regular<>{axis::step(0.1), 1.5, 2.5}; assert(r_step == r); // histogram uses the `index` method to convert values to indices // note: intervals of builtin axis types are always semi-open [a, b) assert(r.index(1.5) == 0); assert(r.index(1.6) == 1); assert(r.index(2.4) == 9); // index for a value below the start of the axis is always -1 assert(r.index(1.0) == -1); assert(r.index(-std::numeric_limits<double>::infinity()) == -1); // index for a value below the above the end of the axis is always `size()` assert(r.index(3.0) == 10); assert(r.index(std::numeric_limits<double>::infinity()) == 10); // index for not-a-number is also `size()` by convention assert(r.index(std::numeric_limits<double>::quiet_NaN()) == 10); // make a variable axis with 3 bins [-1.5, 0.1), [0.1, 0.3), [0.3, 10) auto v = axis::variable<>{-1.5, 0.1, 0.3, 10.}; assert(v.index(-2.0) == -1); assert(v.index(-1.5) == 0); assert(v.index(0.1) == 1); assert(v.index(0.3) == 2); assert(v.index(10) == 3); assert(v.index(20) == 3); // make an integer axis with 3 bins at -1, 0, 1 auto i = axis::integer<>{-1, 2}; assert(i.index(-2) == -1); assert(i.index(-1) == 0); assert(i.index(0) == 1); assert(i.index(1) == 2); assert(i.index(2) == 3); // make an integer axis called "foo" auto i_with_label = axis::integer<>{-1, 2, "foo"}; // all builtin axis types allow you to pass some optional metadata as the last // argument in the constructor; a string by default, but can be any copyable type // two axis do not compare equal if they differ in their metadata assert(i != i_with_label); // integer axis also work well with unscoped enums enum { red, blue }; auto i_for_enum = axis::integer<>{red, blue + 1}; assert(i_for_enum.index(red) == 0); assert(i_for_enum.index(blue) == 1); // make a category axis from a scoped enum and/or if the identifiers are not consecutive enum class Bar { red = 12, blue = 6 }; auto c = axis::category<Bar>{Bar::red, Bar::blue}; assert(c.index(Bar::red) == 0); assert(c.index(Bar::blue) == 1); // c.index(12) is a compile-time error, since the argument must be of type `Bar` // category axis can be created for any copyable and equal-comparable type auto c_str = axis::category<std::string>{"red", "blue"}; assert(c_str.index("red") == 0); assert(c_str.index("blue") == 1); }
Note | |
---|---|
All builtin axes over a continuous value range use semi-open intervals
in their constructors as a convention. As a mnemonic, think of iterator
ranges from |
As mentioned in the previous example, you can assign an optional label to any axis to keep track of what the axis is about. Assume you have census data and you want to investigate how yearly income correlates with age, you could do:
#include <boost/histogram.hpp> int main() { using namespace boost::histogram; // create a 2d-histogram with an "age" and an "income" axis auto h = make_histogram(axis::regular<>(20, 0.0, 100.0, "age in years"), axis::regular<>(20, 0.0, 100.0, "yearly income in Thousands")); // do something with h }
Without the metadata it would be difficult to see which axis was covering which quantity. Metadata is the only axis property that can be modified after construction by the user. Axis objects with different metadata do not compare equal.
By default, strings are used to store the metadata, but any type compatible with the Metadata concept can be used.
All builtin axis types have template arguments for customisation. All arguments have reasonable defaults so you can use empty brackets. If your compiler supports C++17, you can drop the brackets altogether. Suitable arguments are then deduced from the constructor call. The template arguments are in order:
The value type is the argument type of the index()
method. An argument passed to the
axis must be implicitly convertible to this type.
regular
axis)A class that implements a monotonic transform between the data space and the space in which the bins are equi-distant. Users can define their own transforms and use them with the axis.
The axis uses an instance this type to store metadata. It is a std::string
by default, but it can by
any copyable type. If you want to save a small amount of stack memory
per axis, you pass the empty boost::histogram::axis::null_type
here.
Compile-time options
for the axis. This is used to enable/disable under- and overflow
bins, to make an axis circular, or to enable dynamic growth of the
axis beyond the initial range.
variable
and category
axes)Allocator that is used to request memory dynamically to store values. If you don't know what an allocator is you can safely ignore this argument.
You can use the boost::histogram::use_default
tag type for any of these
options, except for the Value and Allocator, to use the library default.
Transforms are a way to customize a regular
axis. Transforms allow you to chose the faster stack-allocated regular
axis over the generic variable
axis
in some cases. The default is the identity transform
which does nothing. A common need is a regular binning in the logarithm
of the input value. This can be achieved with a log
transform. The follow example shows the builtin transforms.
#include <boost/histogram/axis/regular.hpp> #include <limits> int main() { using namespace boost::histogram; // make a regular axis with a log transform over [10, 100), [100, 1000), [1000, 10000) axis::regular<double, axis::transform::log> r_log{3, 10., 10000.}; // log transform: // - useful when values vary dramatically in magnitude, like brightness of stars // - edges are not exactly at 10, 100, 1000, because of finite floating point precision // - values >= 0 but smaller than the starting value of the axis are mapped to -1 // - values < 0 are mapped to `size()`, because the result of std::log(value) is NaN assert(r_log.index(10.1) == 0); assert(r_log.index(100.1) == 1); assert(r_log.index(1000.1) == 2); assert(r_log.index(1) == -1); assert(r_log.index(0) == -1); assert(r_log.index(-1) == 3); // make a regular axis with a sqrt transform over [4, 9), [9, 16), [16, 25) axis::regular<double, axis::transform::sqrt> r_sqrt{3, 4., 25.}; // sqrt transform: // - bin widths are more mildly increasing compared to log transform // - axis starting at value == 0 is ok, sqrt(0) == 0 unlike log transform // - values < 0 are mapped to `size()`, because the result of std::sqrt(value) is NaN assert(r_sqrt.index(0) == -1); assert(r_sqrt.index(4.1) == 0); assert(r_sqrt.index(9.1) == 1); assert(r_sqrt.index(16.1) == 2); assert(r_sqrt.index(25.1) == 3); assert(r_sqrt.index(-1) == 3); // make a regular axis with a power transform x^1/3 over [1, 8), [8, 27), [27, 64) using pow_trans = axis::transform::pow; axis::regular<double, pow_trans> r_pow(pow_trans{1. / 3.}, 3, 1., 64.); // pow transform: // - generalization of the sqrt transform // - starting the axis at value == 0 is ok for power p > 0, 0^p == 0 for p > 0 // - values < 0 are mapped to `size()` if power p is not a positive integer assert(r_pow.index(0) == -1); assert(r_pow.index(1.1) == 0); assert(r_pow.index(8.1) == 1); assert(r_pow.index(27.1) == 2); assert(r_pow.index(64.1) == 3); assert(r_pow.index(-1) == 3); }
Due to the finite precision of floating point calculations, the bin edges
of a transformed regular axis may not be exactly at the expected values.
If you need exact correspondence, use a variable
axis.
Users may write their own transforms and use them with the builtin regular
axis, by implementing
a type that matches the Transform concept.
Options
can be used to configure each axis type. The option flags are implemented
as tag types with the suffix _t
.
Each tag type has a corresponding value without the suffix. The values
have set semantics: You can compute the union with operator|
and the intersection with operator&
.
When you pass a single option flag to an axis as a template parameter,
use the tag type. When you need to pass the union of several options
to an axis as a template parameter, surround the union of option values
with a decltype
. Both ways
of passing options are shown in the following examples.
Under- and overflow bins
Under- and overflow bins are added automatically for most axis types. If you create an axis with 10 bins, the histogram will actually have 12 bins along that axis. The extra bins are very useful, as explained in the rationale. If the input cannot exceed the axis range or if you are really tight on memory, you can disable the extra bins. A demonstration:
#include <boost/histogram.hpp> #include <string> int main() { using namespace boost::histogram; // create a 1d-histogram over integer values from 1 to 6 auto h1 = make_histogram(axis::integer<int>(1, 7)); // axis has size 6... assert(h1.axis().size() == 6); // ... but histogram has size 8, because of overflow and underflow bins assert(h1.size() == 8); // create a 1d-histogram for throws of a six-sided die without extra bins, // since the values cannot be smaller than 1 or larger than 6 auto h2 = make_histogram(axis::integer<int, use_default, axis::option::none_t>(1, 7)); // now size of axis and histogram is equal assert(h2.axis().size() == 6); assert(h2.size() == 6); }
The category
axis by default comes only with an overflow bin, which counts all input
values that are not part of the initial set.
Circular axes
Each builtin axis except the category
axis can be made circular. This means that the axis is periodic at its
ends. This is useful if you want make a histogram over a polar angle.
Example:
#include <boost/histogram/axis.hpp> #include <limits> int main() { using namespace boost::histogram; // make a circular regular axis ... [0, 180), [180, 360), [0, 180) .... using opts = decltype(axis::option::overflow | axis::option::circular); auto r = axis::regular<double, use_default, use_default, opts>{2, 0., 360.}; assert(r.index(-180) == 1); assert(r.index(0) == 0); assert(r.index(180) == 1); assert(r.index(360) == 0); assert(r.index(540) == 1); assert(r.index(720) == 0); // special values are mapped to the overflow bin index assert(r.index(std::numeric_limits<double>::infinity()) == 2); assert(r.index(-std::numeric_limits<double>::infinity()) == 2); assert(r.index(std::numeric_limits<double>::quiet_NaN()) == 2); // since the regular axis is the most common circular axis, there exists an alias auto c = axis::circular<>{2, 0., 360.}; assert(r == c); // make a circular integer axis auto i = axis::integer<int, use_default, axis::option::circular_t>{1, 4}; assert(i.index(0) == 2); assert(i.index(1) == 0); assert(i.index(2) == 1); assert(i.index(3) == 2); assert(i.index(4) == 0); assert(i.index(5) == 1); }
A circular axis cannot have an underflow bin, passing both options together generates a compile-time error. Since the highest bin wraps around to the lowest bin, there is no possibility for overflow either. However, an overflow bin is still added by default if the value is a floating point type, to catch NaNs and infinities.
Growing axes
Each builtin axis has an option to make it grow beyond its initial range when a value outside of that range is passed to it, while the default behaviour is to count this value in the under- or overflow bins (or to discard it). Example:
#include <boost/histogram.hpp> #include <cassert> #include <iostream> int main() { using namespace boost::histogram; // make a growing regular axis // - it grows new bins with its constant bin width until the value is covered auto h1 = make_histogram(axis::regular<double, use_default, use_default, axis::option::growth_t>{2, 0., 1.}); // nothing special happens here h1(0.1); h1(0.9); // state: [0, 0.5): 1, [0.5, 1.0): 1 assert(h1.axis().size() == 2); assert(h1.axis().bin(0).lower() == 0.0); assert(h1.axis().bin(1).upper() == 1.0); // value below range: axis grows new bins until value is in range h1(-0.3); // state: [-0.5, 0.0): 1, [0, 0.5): 1, [0.5, 1.0): 1 assert(h1.axis().size() == 3); assert(h1.axis().bin(0).lower() == -0.5); assert(h1.axis().bin(2).upper() == 1.0); h1(1.9); // state: [-0.5, 0.0): 1, [0, 0.5): 1, [0.5, 1.0): 1, [1.0, 1.5): 0 [1.5, 2.0): 1 assert(h1.axis().size() == 5); assert(h1.axis().bin(0).lower() == -0.5); assert(h1.axis().bin(4).upper() == 2.0); // make a growing category axis (here strings) // - empty axis is allowed: very useful if categories are not known at the beginning auto h2 = make_histogram(axis::category<std::string, use_default, axis::option::growth_t>()); assert(h2.size() == 0); // histogram is empty h2("foo"); // new bin foo, index 0 assert(h2.size() == 1); h2("bar"); // new bin bar, index 1 assert(h2.size() == 2); h2("foo"); assert(h2.size() == 2); assert(h2[0] == 2); assert(h2[1] == 1); }
This feature can be very convenient, but keep two caveats in mind.
A growing axis can have under- and overflow bins, but these only count the special floating point values: positive and negative infinity, and NaN.
A histogram has been created and now you want to insert values. This is done
with the flexible call
operator
or the fill
method
, which you typically call in a loop. The call operator
accepts N
arguments or a
std::tuple
with N
elements, where N
is equal
to the number of axes of the histogram. It finds the corresponding bin for
the input and increments the bin counter by one. The fill
method
accepts a single iterable over other iterables (which
must have have elements contiguous in memory) or values, see the method documentation
for details.
After the histogram has been filled, use the at
method
(in analogy to std::vector::at
) to
access the cell values. It accepts integer indices, one for each axis of
the histogram.
#include <boost/histogram.hpp> #include <cassert> #include <functional> #include <numeric> #include <utility> #include <vector> int main() { using namespace boost::histogram; auto h = make_histogram(axis::integer<>(0, 3), axis::regular<>(2, 0.0, 1.0)); // fill histogram, number of arguments must be equal to number of axes, // types must be convertible to axis value type (here integer and double) h(0, 0.2); // increase a cell value by one h(2, 0.5); // increase another cell value by one // fills from a tuple are also supported; passing a tuple of wrong size // causes an error at compile-time or an assertion at runtime in debug mode auto xy = std::make_tuple(1, 0.3); h(xy); // chunk-wise filling is also supported and more efficient, make some data... std::vector<double> xy2[2] = {{0, 2, 5}, {0.8, 0.4, 0.7}}; // ... and call fill method h.fill(xy2); // once histogram is filled, access individual cells using operator[] or at(...) // - operator[] can only accept a single argument in the current version of C++, // it is convenient when you have a 1D histogram // - at(...) can accept several values, so use this by default // - underflow bins are at index -1, overflow bins at index `size()` // - passing an invalid index triggers a std::out_of_range exception assert(h.at(0, 0) == 1); assert(h.at(0, 1) == 1); assert(h.at(1, 0) == 1); assert(h.at(1, 1) == 0); assert(h.at(2, 0) == 1); assert(h.at(2, 1) == 1); assert(h.at(-1, -1) == 0); // underflow for axis 0 and 1 assert(h.at(-1, 0) == 0); // underflow for axis 0, normal bin for axis 1 assert(h.at(-1, 2) == 0); // underflow for axis 0, overflow for axis 1 assert(h.at(3, 1) == 1); // overflow for axis 0, normal bin for axis 1 // iteration over values works, but see next example for a better way // - iteration using begin() and end() includes under- and overflow bins // - iteration order is an implementation detail and should not be relied upon const double sum = std::accumulate(h.begin(), h.end(), 0.0); assert(sum == 6); }
For a histogram hist
, the
calls hist(weight(w), ...)
and hist(...,
weight(w))
increment
the bin counter by the value w
instead, where w
may be an
integer or floating point number. The helper function weight()
marks this argument as a weight, so that it can be distinguished from the
other inputs. It can be the first or last argument. You can freely mix calls
with and without a weight. Calls without weight
act like the weight is 1
. Why
weighted increments are sometimes useful is explained in
the rationale.
Note | |
---|---|
The default storage loses its no-overflow-guarantee when you pass floating point weights, but maintains it for integer weights. |
When the weights come from a stochastic process, it is useful to keep track
of the variance of the sum of weights per cell. A specialized histogram can
be generated with the make_weighted_histogram
factory function which does that.
#include <boost/histogram.hpp> #include <cassert> int main() { using namespace boost::histogram; // Create a histogram with weight counters that keep track of a variance estimate. auto h = make_weighted_histogram(axis::regular<>(3, 0.0, 1.0)); h(0.0, weight(1)); // weight 1 goes to first bin h(0.1, weight(2)); // weight 2 goes to first bin h(0.4, weight(3)); // weight 3 goes to second bin h(0.5, weight(4)); // weight 4 goes to second bin // chunk-wise filling is also supported auto x = {0.2, 0.6}; auto w = {5, 6}; h.fill(x, weight(w)); // Weight counters have methods to access the value (sum of weights) and the variance // (sum of weights squared, why this gives the variance is explained in the rationale) assert(h[0].value() == 1 + 2 + 5); assert(h[0].variance() == 1 * 1 + 2 * 2 + 5 * 5); assert(h[1].value() == 3 + 4 + 6); assert(h[1].variance() == 3 * 3 + 4 * 4 + 6 * 6); }
To iterate over all cells, the indexed
range generator is very convenient and also efficient. For almost all configurations,
the range generator iterates faster than a naive for-loop.
Under- and overflow are skipped by default.
#include <boost/format.hpp> #include <boost/histogram.hpp> #include <cassert> #include <iostream> #include <numeric> // for std::accumulate #include <sstream> using namespace boost::histogram; int main() { // make histogram with 2 x 2 = 4 bins (not counting under-/overflow bins) auto h = make_histogram(axis::regular<>(2, -1.0, 1.0), axis::regular<>(2, 2.0, 4.0)); h(weight(1), -0.5, 2.5); // bin index 0, 0 h(weight(2), -0.5, 3.5); // bin index 0, 1 h(weight(3), 0.5, 2.5); // bin index 1, 0 h(weight(4), 0.5, 3.5); // bin index 1, 1 // use the `indexed` range adaptor to iterate over all bins; // it is not only more convenient but also faster than a hand-crafted loop! std::ostringstream os; for (auto&& x : indexed(h)) { // x is a special accessor object const auto i = x.index(0); // current index along first axis const auto j = x.index(1); // current index along second axis const auto b0 = x.bin(0); // current bin interval along first axis const auto b1 = x.bin(1); // current bin interval along second axis const auto v = *x; // "dereference" to get the bin value os << boost::format("%i %i [%2i, %i) [%2i, %i): %i\n") % i % j % b0.lower() % b0.upper() % b1.lower() % b1.upper() % v; } std::cout << os.str() << std::flush; assert(os.str() == "0 0 [-1, 0) [ 2, 3): 1\n" "1 0 [ 0, 1) [ 2, 3): 3\n" "0 1 [-1, 0) [ 3, 4): 2\n" "1 1 [ 0, 1) [ 3, 4): 4\n"); // `indexed` skips underflow and overflow bins by default, but can be called // with the second argument `coverage::all` to walk over all bins std::ostringstream os2; for (auto&& x : indexed(h, coverage::all)) { os2 << boost::format("%2i %2i: %i\n") % x.index(0) % x.index(1) % *x; } std::cout << os2.str() << std::flush; assert(os2.str() == "-1 -1: 0\n" " 0 -1: 0\n" " 1 -1: 0\n" " 2 -1: 0\n" "-1 0: 0\n" " 0 0: 1\n" " 1 0: 3\n" " 2 0: 0\n" "-1 1: 0\n" " 0 1: 2\n" " 1 1: 4\n" " 2 1: 0\n" "-1 2: 0\n" " 0 2: 0\n" " 1 2: 0\n" " 2 2: 0\n"); }
Histograms from this library can do more than counting, they can hold arbitrary
accumulators which accept samples. We call a histogram with accumulators
that compute the mean of samples in each cell a profile.
Profiles can be generated with the factory function make_profile
.
#include <boost/format.hpp> #include <boost/histogram.hpp> #include <cassert> #include <iostream> #include <sstream> #include <tuple> int main() { using namespace boost::histogram; // make a profile, it computes the mean of the samples in each histogram cell auto h = make_profile(axis::integer<>(0, 3)); // mean is computed from the values marked with the sample() helper function h(0, sample(1)); // sample goes to cell 0 h(0, sample(2)); // sample goes to cell 0 h(1, sample(3)); // sample goes to cell 1 h(sample(4), 1); // sample goes to cell 1; sample can be first or last argument // fills from tuples are also supported, 5 and 6 go to cell 2 auto a = std::make_tuple(2, sample(5)); auto b = std::make_tuple(sample(6), 2); h(a); h(b); // builtin accumulators have methods to access their state std::ostringstream os; for (auto&& x : indexed(h)) { // use `.` to access methods of accessor, like `index()` // use `->` to access methods of accumulator const auto i = x.index(); const auto n = x->count(); // how many samples are in this bin const auto vl = x->value(); // mean value const auto vr = x->variance(); // estimated variance of the mean value os << boost::format("index %i count %i value %.1f variance %.1f\n") % i % n % vl % vr; } std::cout << os.str() << std::flush; assert(os.str() == "index 0 count 2 value 1.5 variance 0.5\n" "index 1 count 2 value 3.5 variance 0.5\n" "index 2 count 2 value 5.5 variance 0.5\n"); }
Just like weight()
,
sample()
is a
marker function. It must be the first or last argument.
Weights and samples may be combined, if the accumulators can handle weights.
When both weight()
and sample()
appear in the call
operator
or the fill
method
, they can be in any order with respect to other, but
they must be the first or last arguments. To make a profile which can compute
weighted means with proper uncertainty estimates, use the make_weighted_profile
factory function.
#include <boost/format.hpp> #include <boost/histogram.hpp> #include <cassert> #include <iostream> #include <sstream> int main() { using namespace boost::histogram; using namespace boost::histogram::literals; // _c suffix creates compile-time numbers // make 2D weighted profile auto h = make_weighted_profile(axis::integer<>(0, 2), axis::integer<>(0, 2)); // The mean is computed from the values marked with the sample() helper function. // Weights can be passed as well. The `sample` and `weight` arguments can appear in any // order, but they must be the first or last arguments. h(0, 0, sample(1)); // sample goes to cell (0, 0); weight is 1 h(0, 0, sample(2), weight(3)); // sample goes to cell (0, 0); weight is 3 h(1, 0, sample(3)); // sample goes to cell (1, 0); weight is 1 h(1, 0, sample(4)); // sample goes to cell (1, 0); weight is 1 h(0, 1, sample(5)); // sample goes to cell (1, 0); weight is 1 h(0, 1, sample(6)); // sample goes to cell (1, 0); weight is 1 h(1, 1, weight(4), sample(7)); // sample goes to cell (1, 1); weight is 4 h(weight(5), sample(8), 1, 1); // sample goes to cell (1, 1); weight is 5 std::ostringstream os; for (auto&& x : indexed(h)) { const auto i = x.index(0_c); const auto j = x.index(1_c); const auto m = x->value(); // weighted mean const auto v = x->variance(); // estimated variance of weighted mean os << boost::format("index %i,%i mean %.1f variance %.1f\n") % i % j % m % v; } std::cout << os.str() << std::flush; assert(os.str() == "index 0,0 mean 1.8 variance 0.5\n" "index 1,0 mean 3.5 variance 0.5\n" "index 0,1 mean 5.5 variance 0.5\n" "index 1,1 mean 7.6 variance 0.5\n"); }
The following operators are supported for pairs of histograms +, -, *,
/, ==, !=
. Histograms can also be multiplied and
divided by a scalar. Only a subset of the arithmetic operators is available
when the underlying accumulator only supports that subset.
The arithmetic operators can only be used when the histograms have the same axis configuration. This checked at run-time. An exception is thrown if the configurations do not match. Two histograms have the same axis configuration, if all axes compare equal, which includes a comparison of their metadata. Two histograms compare equal, when their axis configurations and all their cell values compare equal.
Note | |
---|---|
If the metadata type has |
#include <boost/histogram.hpp> #include <cassert> #include <vector> int main() { using namespace boost::histogram; // make two histograms auto h1 = make_histogram(axis::regular<>(2, -1.0, 1.0)); auto h2 = make_histogram(axis::regular<>(2, -1.0, 1.0)); h1(-0.5); // counts are: 1 0 h2(0.5); // counts are: 0 1 // add them auto h3 = h1; h3 += h2; // counts are: 1 1 // adding multiple histograms at once is likely to be optimized by the compiler so that // superfluous temporaries avoided, but no guarantees are given; use this equivalent // code when you want to make sure: h4 = h1; h4 += h2; h4 += h3; auto h4 = h1 + h2 + h3; // counts are: 2 2 assert(h4.at(0) == 2 && h4.at(1) == 2); // multiply by number, h4 *= 2 also works auto h5 = h4 * 2; // counts are: 4 4 // divide by number; s4 /= 4 also works auto h6 = h5 / 4; // counts are: 1 1 assert(h6.at(0) == 1 && h6.at(1) == 1); assert(h6 != h5 && h5 == 4 * h6); // note the special effect of multiplication on weight_storage auto h = make_histogram_with(weight_storage(), axis::regular<>(2, -1.0, 1.0)); h(-0.5); // counts are: 1 0 assert(h.at(0).value() == 1 && h.at(1).value() == 0); auto h_sum = h + h; auto h_mul = 2 * h; // values are the same as expected... assert(h_sum.at(0).value() == h_mul.at(0).value()); // ... but variances differ assert(h_sum.at(0).variance() == 2 && h_mul.at(0).variance() == 4); // equality operator checks variances, so histograms are not equal assert(h_sum != h_mul); }
Note | |
---|---|
A histogram with default storage converts its cell values to double when they are to be multiplied with or divided by a real number, or when a real number is added or subtracted. At this point the no-overflow-guarantee is lost. |
Note | |
---|---|
When the storage tracks weight variances, such as |
The library was designed to work with algorithms from the C++ standard library. In addition, a support library of algorithms is included with common operations on histograms.
The histogram class has standard random-access iterators which can be used with various algorithms from the standard library. Some things need to be kept in mind:
boost::histogram::indexed
range generator instead.
#include <boost/histogram.hpp> #include <cassert> #include <algorithm> // fill, any_of, min_element, max_element #include <cmath> // sqrt #include <numeric> // partial_sum, inner_product int main() { using namespace boost::histogram; // make histogram that represents a probability density function (PDF) auto h1 = make_histogram(axis::regular<>(4, 1.0, 3.0)); // make indexed range to skip underflow and overflow cells auto ind = indexed(h1); // use std::fill to set all counters to 0.25 (except under- and overflow counters) std::fill(ind.begin(), ind.end(), 0.25); // compute the cumulative density function (CDF), overriding cell values std::partial_sum(ind.begin(), ind.end(), ind.begin()); assert(h1.at(-1) == 0.00); assert(h1.at(0) == 0.25); assert(h1.at(1) == 0.50); assert(h1.at(2) == 0.75); assert(h1.at(3) == 1.00); assert(h1.at(4) == 0.00); // use any_of to check if any cell values are smaller than 0.1, auto b = std::any_of(ind.begin(), ind.end(), [](const auto& x) { return x < 0.1; }); assert(b == false); // under- and overflow cells are zero, but skipped // find minimum element auto min_it = std::min_element(ind.begin(), ind.end()); assert(*min_it == 0.25); // under- and overflow cells are skipped // find maximum element auto max_it = std::max_element(ind.begin(), ind.end()); assert(*max_it == 1.0); // make second PDF auto h2 = make_histogram(axis::regular<>(4, 1.0, 4.0)); h2.at(0) = 0.1; h2.at(1) = 0.3; h2.at(2) = 0.2; h2.at(3) = 0.4; // computing cosine similiarity: cos(theta) = A dot B / sqrt((A dot A) * (B dot B)) auto ind2 = indexed(h2); const auto aa = std::inner_product(ind.begin(), ind.end(), ind.begin(), 0.0); const auto bb = std::inner_product(ind2.begin(), ind2.end(), ind2.begin(), 0.0); const auto ab = std::inner_product(ind.begin(), ind.end(), ind2.begin(), 0.0); const auto cos_sim = ab / std::sqrt(aa * bb); assert(std::abs(cos_sim - 0.967) < 1e-2); }
It is easy to iterate over all histogram cells to compute the sum of cell
values by hand or to use an algorithm from the standard library to do so,
but it is not safe. The result may not be accurate or overflow, if the
sum is represented by an integer type. The library provides boost::histogram::algorithm::sum
as a safer, albeit slower, alternative. It uses the Neumaier
algorithm in double precision for integers and floating point cells,
and does the naive sum otherwise.
It is sometimes convenient to generate a high-dimensional histogram first and then extract smaller or lower-dimensional versions from it. Lower-dimensional histograms are obtained by summing the bin contents of the removed axes. This is called a projection. If the histogram has under- and overflow bins along all axes, this operation creates a histogram which is identical to one that would have been obtained by filling the original data.
Projection is useful if you found out that there is no interesting structure along an axis, so it is not worth keeping that axis around, or if you want to visualize 1d or 2d summaries of a high-dimensional histogram.
The library provides the boost::histogram::algorithm::project
function for this purpose. It accepts the original histogram and the indices
(one or more) of the axes that are kept and returns the lower-dimensional
histogram. If the histogram is static, meaning the axis configuration is
known at compile-time, compile-time numbers should be used as indices to
obtain another static histogram. Using normal numbers or iterators over
indices produces a histogram with a dynamic axis configuration.
#include <boost/histogram.hpp> #include <cassert> #include <iostream> #include <sstream> int main() { using namespace boost::histogram; using namespace literals; // enables _c suffix // make a 2d histogram auto h = make_histogram(axis::regular<>(3, -1.0, 1.0), axis::integer<>(0, 2)); h(-0.9, 0); h(0.9, 1); h(0.1, 0); auto hr0 = algorithm::project(h, 0_c); // keep only first axis auto hr1 = algorithm::project(h, 1_c); // keep only second axis // reduce does not remove counts; returned histograms are summed over // the removed axes, so h, hr0, and hr1 have same number of total counts; // we compute the sum of counts with the sum algorithm assert(algorithm::sum(h) == 3 && algorithm::sum(hr0) == 3 && algorithm::sum(hr1) == 3); std::ostringstream os1; for (auto&& x : indexed(h)) os1 << "(" << x.index(0) << ", " << x.index(1) << "): " << *x << "\n"; std::cout << os1.str() << std::flush; assert(os1.str() == "(0, 0): 1\n" "(1, 0): 1\n" "(2, 0): 0\n" "(0, 1): 0\n" "(1, 1): 0\n" "(2, 1): 1\n"); std::ostringstream os2; for (auto&& x : indexed(hr0)) os2 << "(" << x.index(0) << ", -): " << *x << "\n"; std::cout << os2.str() << std::flush; assert(os2.str() == "(0, -): 1\n" "(1, -): 1\n" "(2, -): 1\n"); std::ostringstream os3; for (auto&& x : indexed(hr1)) os3 << "(- ," << x.index(0) << "): " << *x << "\n"; std::cout << os3.str() << std::flush; assert(os3.str() == "(- ,0): 2\n" "(- ,1): 1\n"); }
A projection removes an axis completely. A less drastic way to obtain a
smaller histogram is the reduce
function, which allows one to slice, shrink
or rebin individual axes.
Shrinking means that the value range of an axis is reduced and the number of bins along that axis. Slicing does the same, but is based on axis indices while shrinking is based on the axis values. To rebin means that adjacent bins are merged into larger bins, the histogram is made coarser. For N adjacent bins, a new bin is formed which covers the common interval of the merged bins and has their added content. These two operations can be combined and applied to several axes at once. Doing it in one step is much more efficient than doing it in several steps.
The reduce
function does not change the total count if all modified axes in the histogram
have underflow and overflow bins. Counts in removed bins are added to the
corresponding under- and overflow bins. As in case of the project
function,
such a histogram is guaranteed to be identical to one obtained from filling
the original data.
#include <boost/histogram.hpp> #include <cassert> int main() { using namespace boost::histogram; // import reduce commands into local namespace to save typing using algorithm::rebin; using algorithm::shrink; using algorithm::slice; // make a 2d histogram auto h = make_histogram(axis::regular<>(4, 0.0, 4.0), axis::regular<>(4, -2.0, 2.0)); h(0, -0.9); h(1, 0.9); h(2, 0.1); h(3, 0.1); // reduce takes positional commands which are applied to the axes in order // - shrink is applied to the first axis; the new axis range is 0.0 to 3.0 // - rebin is applied to the second axis; pairs of adjacent bins are merged auto h2 = algorithm::reduce(h, shrink(0.0, 3.0), rebin(2)); assert(h2.axis(0) == axis::regular<>(3, 0.0, 3.0)); assert(h2.axis(1) == axis::regular<>(2, -2.0, 2.0)); // reduce does not change the total count if the histogram has underflow/overflow bins assert(algorithm::sum(h) == 4 && algorithm::sum(h2) == 4); // One can also explicitly specify the index of the axis in the histogram on which the // command should act, by using this index as the the first parameter. The position of // the command in the argument list of reduce is then ignored. We use this to slice only // the second axis (axis has index 1 in the histogram) from bin index 2 to 4. auto h3 = algorithm::reduce(h, slice(1, 2, 4)); assert(h3.axis(0) == h.axis(0)); // unchanged assert(h3.axis(1) == axis::regular<>(2, 0.0, 2.0)); }
Simple streaming operators are shipped with the library. They are internally used by the unit tests and give simple text representations of axis and histogram configurations and show the histogram content. One-dimensional histograms are rendered as ASCII drawings. The text representations may be useful for debugging or more, but users may want to use their own implementations. Therefore, the headers with the builtin implementations are not included by any other header of the library. The following example shows the effect of output streaming.
#include <boost/histogram.hpp> #include <boost/histogram/ostream.hpp> #include <cassert> #include <iostream> #include <sstream> #include <string> int main() { using namespace boost::histogram; std::ostringstream os; // width of histogram can be set like this; if it is not set, the library attempts to // determine the terminal width and choses the histogram width accordingly os.width(78); auto h1 = make_histogram(axis::regular<>(5, -1.0, 1.0, "axis 1")); h1.at(0) = 2; h1.at(1) = 4; h1.at(2) = 3; h1.at(4) = 1; // 1D histograms are rendered as an ASCII drawing os << h1; auto h2 = make_histogram(axis::regular<>(2, -1.0, 1.0, "axis 1"), axis::category<std::string>({"red", "blue"}, "axis 2")); // higher dimensional histograms just have their cell counts listed os << h2; std::cout << os.str() << std::endl; assert( os.str() == "histogram(regular(5, -1, 1, metadata=\"axis 1\", options=underflow | overflow))\n" " ┌─────────────────────────────────────────────────────────────┐\n" "[-inf, -1) 0 │ │\n" "[ -1, -0.6) 2 │██████████████████████████████ │\n" "[-0.6, -0.2) 4 │████████████████████████████████████████████████████████████ │\n" "[-0.2, 0.2) 3 │█████████████████████████████████████████████ │\n" "[ 0.2, 0.6) 0 │ │\n" "[ 0.6, 1) 1 │███████████████ │\n" "[ 1, inf) 0 │ │\n" " └─────────────────────────────────────────────────────────────┘\n" "histogram(\n" " regular(2, -1, 1, metadata=\"axis 1\", options=underflow | overflow)\n" " category(\"red\", \"blue\", metadata=\"axis 2\", options=overflow)\n" " (-1 0): 0 ( 0 0): 0 ( 1 0): 0 ( 2 0): 0 (-1 1): 0 ( 0 1): 0\n" " ( 1 1): 0 ( 2 1): 0 (-1 2): 0 ( 0 2): 0 ( 1 2): 0 ( 2 2): 0\n" ")"); }
The library supports serialization via Boost.Serialization.
The serialization code is not included by the super header #include <boost/histogram.hpp>
,
so that the library can be used without Boost.Serialization or with another
compatible serialization library.
#include <boost/archive/text_iarchive.hpp> #include <boost/archive/text_oarchive.hpp> #include <boost/histogram.hpp> #include <boost/histogram/serialization.hpp> // includes serialization code #include <cassert> #include <sstream> int main() { using namespace boost::histogram; auto a = make_histogram(axis::regular<>(3, -1.0, 1.0, "axis 0"), axis::integer<>(0, 2, "axis 1")); a(0.5, 1); std::string buf; // to hold persistent representation // store histogram { std::ostringstream os; boost::archive::text_oarchive oa(os); oa << a; buf = os.str(); } auto b = decltype(a)(); // create a default-constructed second histogram assert(b != a); // b is empty, a is not // load histogram { std::istringstream is(buf); boost::archive::text_iarchive ia(is); ia >> b; } assert(b == a); // now b is equal to a }
Letting the compiler deduce the histogram type is recommended, because the templated type is tedious to write down explicitly. Functions or methods which accept or return histograms should be templated to work with all kinds of histograms. It is also possible to write templated versions which accept only histograms with dynamic axes or only histograms with static axes. The following example demonstrates all this.
#include <boost/histogram.hpp> #include <cassert> // function accepts any histogram and returns a copy template <class Axes, class Storage> boost::histogram::histogram<Axes, Storage> any_histogram( boost::histogram::histogram<Axes, Storage>& h) { return h; } // function only accepts histograms with fixed axis types and returns a copy template <class Storage, class... Axes> boost::histogram::histogram<std::tuple<Axes...>, Storage> only_static_histogram( boost::histogram::histogram<std::tuple<Axes...>, Storage>& h) { return h; } // function only accepts histograms with dynamic axis types and returns a copy template <class Storage, class... Axes> boost::histogram::histogram<std::vector<boost::histogram::axis::variant<Axes...>>, Storage> only_dynamic_histogram( boost::histogram::histogram<std::vector<boost::histogram::axis::variant<Axes...>>, Storage>& h) { return h; } int main() { using namespace boost::histogram; auto histogram_with_static_axes = make_histogram(axis::regular<>(10, 0, 1)); using axis_variant = axis::variant<axis::regular<>, axis::integer<>>; std::vector<axis_variant> axes; axes.emplace_back(axis::regular<>(5, 0, 1)); axes.emplace_back(axis::integer<>(0, 1)); auto histogram_with_dynamic_axes = make_histogram(axes); assert(any_histogram(histogram_with_static_axes) == histogram_with_static_axes); assert(any_histogram(histogram_with_dynamic_axes) == histogram_with_dynamic_axes); assert(only_static_histogram(histogram_with_static_axes) == histogram_with_static_axes); assert(only_dynamic_histogram(histogram_with_dynamic_axes) == histogram_with_dynamic_axes); // does not compile: only_static_histogram(histogram_with_dynamic_axes) // does not compile: only_dynamic_histogram(histogram_with_static_axes) }
If the histogram type has to be written down explicitly, the types are constructed
as follows. In all cases, the default_storage
type argument may be replaced by any other storage type or omitted entirely.
Histogram with fixed axis types:
boost::histogram::histogram< std::tuple<axis_type_1, axis_type_2, ..., axis_type_N> , boost::histogram::default_storage >
Histogram with a variable number of the same axis type:
boost::histogram::histogram< std::vector< axis_type_1 > , boost::histogram::default_storage >
Histogram with variable axis types:
boost::histogram::histogram< std::vector< boost::histogram::axis::variant< axis_type_1, axis_type_2, ..., axis_type_N > > , boost::histogram::default_storage >
The library is customisable and extensible by users. Users can create new axis types and use them with the histogram, or implement a custom storage policy, or use a builtin storage policy with a custom counter type. The library was designed to make this very easy. This section shows how to do this.
It is easy to make custom axis classes that work with the library. The custom axis class must meet the requirements of the Axis concept.
Users can create a custom axis by derive from a builtin axis type and customize
its behavior. The following examples demonstrates a modification of the
integer axis
.
#include <boost/histogram.hpp> #include <cassert> #include <iostream> #include <sstream> int main() { using namespace boost::histogram; // custom axis, which adapts builtin integer axis struct custom_axis : public axis::integer<> { using value_type = const char*; // type that is fed to the axis using integer::integer; // inherit ctors of base // the customization point // - accept const char* and convert to int // - then call index method of base class axis::index_type index(value_type s) const { return integer::index(std::atoi(s)); } }; auto h = make_histogram(custom_axis(3, 6)); h("-10"); h("3"); h("4"); h("9"); std::ostringstream os; for (auto&& b : indexed(h)) { os << "bin " << b.index() << " [" << b.bin() << "] " << *b << "\n"; } std::cout << os.str() << std::endl; assert(os.str() == "bin 0 [3] 1\n" "bin 1 [4] 1\n" "bin 2 [5] 0\n"); }
How to make an axis completely from scratch is shown in the next example.
#include <boost/histogram.hpp> #include <cassert> int main() { using namespace boost::histogram; // stateless axis which returns 1 if the input is even and 0 otherwise struct even_odd_axis { axis::index_type index(int x) const { return x % 2; } axis::index_type size() const { return 2; } }; // threshold axis which returns 1 if the input is above threshold struct threshold_axis { threshold_axis(double x) : thr(x) {} axis::index_type index(double x) const { return x >= thr; } axis::index_type size() const { return 2; } double thr; }; auto h = make_histogram(even_odd_axis(), threshold_axis(3.0)); h(0, 2.0); h(1, 4.0); h(2, 4.0); assert(h.at(0, 0) == 1); // even, below threshold assert(h.at(0, 1) == 1); // even, above threshold assert(h.at(1, 0) == 0); // odd, below threshold assert(h.at(1, 1) == 1); // odd, above threshold }
Such minimal axis types lack many features provided by the builtin axis types, for example, one cannot iterate over them, but this functionality can be added as needed.
Multi-dimensional histograms usually have an orthogonal system of axes. Orthogonal means that each axis takes care of only one value and computes its local index independently of all the other axes and values. A checker-board is such an orthogonal grid in 2D.
There are other interesting grids which are not orthogonal, notably the honeycomb grid. In such a grid, each cell is hexagonal and even though the cells form a perfectly regular pattern, it is not possible to sort values into these cells using two orthogonal axes.
The library supports non-orthogonal grids by allowing axis types to accept
a std::tuple
of values. The axis can compute
an index from the values passed to it in an arbitrary way. The following
example demonstrates this.
#include <boost/histogram.hpp> #include <cassert> int main() { using namespace boost::histogram; // axis which returns 1 if the input falls inside the unit circle and zero otherwise struct circle_axis { // accepts a 2D point in form of a std::tuple axis::index_type index(const std::tuple<double, double>& point) const { const auto x = std::get<0>(point); const auto y = std::get<1>(point); return x * x + y * y <= 1.0; } axis::index_type size() const { return 2; } }; auto h1 = make_histogram(circle_axis()); // fill looks normal for a histogram which has only one Nd-axis h1(0, 0); // in h1(0, -1); // in h1(0, 1); // in h1(-1, 0); // in h1(1, 0); // in h1(1, 1); // out h1(-1, -1); // out // 2D histogram, but only 1D index assert(h1.at(0) == 2); // out assert(h1.at(1) == 5); // in // other axes can be combined with a Nd-axis auto h2 = make_histogram(circle_axis(), axis::category<std::string>({"red", "blue"})); // now we need to pass arguments for Nd-axis explicitly as std::tuple h2(std::make_tuple(0, 0), "red"); h2(std::make_tuple(1, 1), "blue"); // 3D histogram, but only 2D index assert(h2.at(0, 0) == 0); // out, red assert(h2.at(0, 1) == 1); // out, blue assert(h2.at(1, 0) == 1); // in, red assert(h2.at(1, 1) == 0); // in, blue }
Histograms which use a different storage class can easily created with
the factory function make_histogram_with
.
For convenience, this factory function accepts many standard containers
as storage backends: vectors, arrays, and maps. These are automatically
wrapped with a boost::histogram::storage_adaptor
to provide the storage interface needed by the library. Users may also
place custom accumulators in the vector, as described in the next section.
Warning | |
---|---|
The no-overflow-guarantee is only valid if the |
A std::vector
may provide higher performance
than the unlimited_storage
with a carefully chosen counter type. Usually, this would be an integral
or floating point type. A std::vector
-based
storage may be faster for low-dimensional histograms (or not, you need
to measure).
Users who work exclusively with weighted histograms should chose a std::vector<double>
,
it will be faster. If they also want to track the variance of the sum of
weights, a vector-based storage of weighted_sum
accumulators should be used. The factory function make_weighted_histogram
is a convenient way to generate a histogram with this storage.
An interesting alternative to a std::vector
is to use a std::array
. The latter provides a storage
with a fixed maximum capacity (the size of the array). std::array
allocates the memory on the stack. In combination with a static axis configuration
this allows one to create histograms completely on the stack without any
dynamic memory allocation. Small stack-based histograms can be created
and destroyed very fast.
Finally, a std::map
or std::unordered_map
or any other map type that implements the STL interface can be used to
generate a histogram with a sparse storage, where empty cells do not consume
any memory. This sounds attractive, but the memory consumption per cell
in such a data structure is much larger than for a vector or array, so
the number of empty cells must be substantial to gain. Moreover, cell lookup
in a sparse data structure may be less performant. Whether a sparse storage
performs better than a dense storage depends on the use case. The library
makes it easy to switch from dense to sparse storage and back, so users
are invited to test both options.
The following example shows how histograms are constructed which use an alternative storage classes.
#include <algorithm> // std::for_each #include <array> #include <boost/histogram.hpp> #include <boost/histogram/algorithm/sum.hpp> #include <functional> // std::ref #include <unordered_map> #include <vector> int main() { using namespace boost::histogram; const auto axis = axis::regular<>(10, 0.0, 1.0); auto data = {0.1, 0.3, 0.2, 0.7}; // Create static histogram with vector<int> as counter storage, you can use // other arithmetic types as counters, e.g. double. auto h1 = make_histogram_with(std::vector<int>(), axis); std::for_each(data.begin(), data.end(), std::ref(h1)); assert(algorithm::sum(h1) == 4); // Create static histogram with array<int, N> as counter storage which is // allocated completely on the stack (this is very fast). N may be larger than // the actual number of bins used; an exception is raised if N is too small to // hold all bins. auto h2 = make_histogram_with(std::array<int, 12>(), axis); std::for_each(data.begin(), data.end(), std::ref(h2)); assert(algorithm::sum(h2) == 4); // Create static histogram with unordered_map as counter storage; this // generates a sparse histogram where only memory is allocated for bins that // are non-zero. This sounds like a good idea for high-dimensional histograms, // but maps come with a memory and run-time overhead. The default_storage // usually performs better in high dimensions. auto h3 = make_histogram_with(std::unordered_map<std::size_t, int>(), axis); std::for_each(data.begin(), data.end(), std::ref(h3)); assert(algorithm::sum(h3) == 4); }
There are two ways to generate a single histogram using several threads.
1. Each thread has its own copy of the histogram. Each copy is independently
filled. The copies are then added in the main thread. Use this as the default
when you can afford having N
copies of the histogram in memory for N
threads, because it allows each thread to work on its thread-local memory
and utilise the CPU cache without the need to synchronise memory access.
The highest performance gains are obtained in this way.
2. There is only one histogram which is filled concurrently by several
threads. This requires using a thread-safe storage that can handle concurrent
writes. The library provides the boost::histogram::accumulators::count
accumulator with a thread-safe option, which combined with the boost::histogram::dense_storage
provides a thread-safe storage.
Note | |
---|---|
Filling a histogram with growing axes in a multi-threaded environment is safe, but has poor performance since the histogram must be locked on each fill. The locks are required because an axis could grow each time, which changes the number of cells and cell addressing for all other threads. Even without growing axes, there is only a performance gain if the histogram is either very large or when significant time is spend in preparing the value to fill. For small histograms, threads frequently access the same cell, whose state has to be synchronised between the threads. This is slow even with atomic counters and made worse by the effect of false sharing. |
The next example demonstrates option 2 (option 1 is straight-forward to implement).
#include <boost/histogram.hpp> #include <boost/histogram/algorithm/sum.hpp> #include <cassert> #include <functional> #include <thread> #include <vector> // dummy fill function, to be executed in parallel by several threads template <typename Histogram> void fill(Histogram& h) { for (unsigned i = 0; i < 1000; ++i) { h(i % 10); } } int main() { using namespace boost::histogram; /* Create histogram with container of thread-safe counters for parallel filling in several threads. Only filling is thread-safe, other guarantees are not given. */ auto h = make_histogram_with(dense_storage<accumulators::count<unsigned, true>>(), axis::integer<>(0, 10)); /* Run the fill function in parallel from different threads. This is safe when a thread-safe accumulator and a storage with thread-safe cell access are used. */ auto fill_h = [&h]() { fill(h); }; std::thread t1(fill_h); std::thread t2(fill_h); std::thread t3(fill_h); std::thread t4(fill_h); t1.join(); t2.join(); t3.join(); t4.join(); // Without a thread-safe accumulator, this number may be smaller. assert(algorithm::sum(h) == 4000); }
A storage can hold custom accumulators which can accept an arbitrary number
of arguments. The arguments are passed to the accumulator via the sample
call, for example,
sample(1, 2, 3)
for an accumulator which accepts three arguments. Custom accumulators can
be combined with any container supported by boost::histogram::storage_adaptor
.
For convenience, the alias template boost::histogram::dense_storage
is provided to make a standard storage with a custom accumulator type.
The library provides several accumulators:
count
accepts no samples, but accepts a weight. It is an alternative to a
plain arithmetic type as a counter. It has two benefits. Firstly, users
can derive from count to attach arbitrary metadata to each bin. Secondly,
count has a template option to make increments atomic and thus thread-safe,
which then allows one to fill a histogram concurrently without locking,
if the storage also supports this (the dense_storage
does, for example).
sum
accepts no samples, but accepts a weight. It is an alternative to a
plain arithmetic type as a counter. It provides an advantage when histograms
are filled with weights that differ dramatically in magnitude. The
sum of weights is computed incrementally with the Neumaier algorithm.
The algorithm is more accurate, but consumes more CPU and memory (memory
is doubled compared to a normal sum of floating point numbers).
weighted_sum
accepts no samples, but accepts a weight. It computes the sum of weights
and the sum of weights squared, the variance estimate of the sum of
weights. This type is used by the make_weighted_histogram
.
mean
accepts a sample and computes the mean of the samples. make_profile
uses
this accumulator.
weighted_mean
accepts a sample and a weight. It computes the weighted mean of the
samples. make_weighted_profile
uses this accumulator.
fraction
accepts a boolean sample that represents success or failure of a binomial
trial. It computes the fraction of successes. One can access the number
of successes and failures, the fraction, the estimated variance of
the fraction, and a confidence interval. The standard confidence interval
is the Wilson score interval, but more interval computers are implemented
in boost/histogram/utility
. Beware: one cannot pass
std::vector<bool>
to histogram::fill
,
because it is not a contiguous sequence of boolean values, but any
other container of booleans works and any sequence of values convertible
to bool.
Users can easily write their own accumulators and plug them into the histogram, if they adhere to the Accumulator concept. All accumulators from Boost.Accumulators that accept a single argument and no weights work out of the box. Other accumulators from Boost.Accumulators can be made to work by using them inside a wrapper class that implements the concept.
The first example shows how to make and use a histogram that uses one of the the builtin accumulators.
#include <boost/format.hpp> #include <boost/histogram.hpp> #include <cassert> #include <iostream> #include <sstream> int main() { using namespace boost::histogram; using mean = accumulators::mean<>; // Create a 1D-profile, which computes the mean of samples in each bin. auto h = make_histogram_with(dense_storage<mean>(), axis::integer<>(0, 2)); // The factory function `make_profile` is provided as a shorthand for this, so this is // equivalent to the previous line: auto h = make_profile(axis::integer<>(0, 2)); // An argument marked as `sample` is passed to the accumulator. h(0, sample(1)); // sample goes to first cell h(0, sample(2)); // sample goes to first cell h(1, sample(3)); // sample goes to second cell h(1, sample(4)); // sample goes to second cell std::ostringstream os; for (auto&& x : indexed(h)) { // Accumulators usually have methods to access their state. Use the arrow // operator to access them. Here, `count()` gives the number of samples, // `value()` the mean, and `variance()` the variance estimate of the mean. os << boost::format("index %i count %i mean %.1f variance %.1f\n") % x.index() % x->count() % x->value() % x->variance(); } std::cout << os.str() << std::flush; assert(os.str() == "index 0 count 2 mean 1.5 variance 0.5\n" "index 1 count 2 mean 3.5 variance 0.5\n"); }
The simplest way to make a custom accumulator is to inherit from one of the builtin accumulators. The following example shows how to add arbitrary metadata to each histogram cell by inheriting a custom accumulator from a builtin accumulator.
#include <boost/histogram.hpp> #include <cmath> #include <iostream> #include <sstream> #include <string> int main() { using namespace boost::histogram; // derive custom accumulator from one of the builtins struct accumulator_with_metadata : accumulators::count<> { std::string meta; // custom meta data // arbitrary additional data and interface could be added here }; // make 1D histogram with custom accmulator auto h = make_histogram_with(dense_storage<accumulator_with_metadata>(), axis::integer<>(1, 4)); // fill some weighted entries auto x = {1, 0, 2, 1}; h.fill(x); // assigning meta data to two bins h[0].meta = "Foo"; h[2].meta = "Bar"; std::ostringstream os; for (auto&& x : indexed(h)) os << x.bin() << " value " << x->value() << " meta " << x->meta << "\n"; std::cout << os.str() << std::flush; assert(os.str() == "1 value 2 meta Foo\n" "2 value 1 meta \n" "3 value 0 meta Bar\n"); }
The next example shows how to making a custom accumulators completely from scratch. The library was designed to make this as easy as possible.
#include <boost/format.hpp> #include <boost/histogram.hpp> #include <cassert> #include <iostream> #include <sstream> int main() { using namespace boost::histogram; // A custom accumulator which tracks the maximum of the samples. // It must have a call operator that accepts the argument of the `sample` function. struct maximum { // return value is ignored, so we use void void operator()(double x) { if (x > value) value = x; } double value = 0; // value is public and initialized to zero }; // Create 1D histogram that uses the custom accumulator. auto h = make_histogram_with(dense_storage<maximum>(), axis::integer<>(0, 2)); h(0, sample(1.0)); // sample goes to first cell h(0, sample(2.0)); // sample goes to first cell h(1, sample(3.0)); // sample goes to second cell h(1, sample(4.0)); // sample goes to second cell std::ostringstream os; for (auto&& x : indexed(h)) { os << boost::format("index %i maximum %.1f\n") % x.index() % x->value; } std::cout << os.str() << std::flush; assert(os.str() == "index 0 maximum 2.0\n" "index 1 maximum 4.0\n"); }
The next example shows a more complex custom accumulator that accepts two samples at once and an optional weight. It independently computes the mean for each sample. This is more efficient than filling two separate profiles, because the cell lookup has to be done only once.
#include <boost/format.hpp> #include <boost/histogram.hpp> #include <iostream> #include <sstream> int main() { using namespace boost::histogram; // Accumulator accepts two samples and an optional weight and computes the mean of each. struct multi_mean { accumulators::mean<> mx, my; // called when no weight is passed void operator()(double x, double y) { mx(x); my(y); } // called when a weight is passed void operator()(weight_type<double> w, double x, double y) { mx(w, x); my(w, y); } }; // Note: The implementation can be made more efficient by sharing the sum of weights. // Create a 1D histogram that uses the custom accumulator. auto h = make_histogram_with(dense_storage<multi_mean>(), axis::integer<>(0, 2)); h(0, sample(1, 2)); // samples go to first cell h(0, sample(3, 4)); // samples go to first cell h(1, sample(5, 6), weight(2)); // samples go to second cell h(1, sample(7, 8), weight(3)); // samples go to second cell std::ostringstream os; for (auto&& bin : indexed(h)) { os << boost::format("index %i mean-x %.1f mean-y %.1f\n") % bin.index() % bin->mx.value() % bin->my.value(); } std::cout << os.str() << std::flush; assert(os.str() == "index 0 mean-x 2.0 mean-y 3.0\n" "index 1 mean-x 6.2 mean-y 7.2\n"); }
And finally, just for fun, we use a histogram as the accumulator for another histogram.
#include <boost/histogram.hpp> #include <cmath> #include <iostream> #include <sstream> #include <string> int main() { using namespace boost::histogram; // First we define the nested histogram type. using axis_t = axis::category<int, axis::null_type, axis::option::growth_t>; using base_t = histogram<std::tuple<axis_t>>; // Now we make an accumulator out of it by using inheritance. // We only need to implement operator(). A matching version of operator() is actually // present in base_t, but it is templated and this is not allowed by the accumulator // concept. Initialization could also happen here. We don't need to initialize anything // here, because the default constructor of base_t is called automatically and // sufficient for this example. struct hist_t : base_t { void operator()(const double x) { base_t::operator()(x); } }; auto h = make_histogram_with(dense_storage<hist_t>(), axis::integer<>(1, 4)); auto x = {1, 1, 2, 2}; auto s = {1, 2, 3, 3}; // samples are filled into the nested histograms h.fill(x, sample(s)); std::ostringstream os; for (auto&& x : indexed(h)) { os << x.bin() << " "; for (auto&& y : indexed(*x)) { os << "(" << y.bin() << ": " << *y << ") "; } os << "\n"; } std::cout << os.str() << std::flush; assert(os.str() == "1 (1: 1) (2: 1) \n" "2 (3: 2) \n" "3 \n"); }
Note that the axis size of the nested histogram differs from bin to bin. Creating a 2D histogram in this way is not as efficient as the normal way, but it allows one to create a histograms with such a non-rectangular layout of cells.