Boost C++ Libraries


Bivariate Statistics

Synopsis

#include <boost/math/statistics/bivariate_statistics.hpp>

namespace boost{ namespace math{ namespace statistics {

    template<typename ExecutionPolicy, typename Container>
    auto covariance(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto covariance(Container const & u, Container const & v);

    template<typename ExecutionPolicy, typename Container>
    auto means_and_covariance(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto means_and_covariance(Container const & u, Container const & v);

    template<typename ExecutionPolicy, typename Container>
    auto correlation_coefficient(ExecutionPolicy&& exec, Container const & u, Container const & v);

    template<typename Container>
    auto correlation_coefficient(Container const & u, Container const & v);

}}}

Description

This file provides functions for computing bivariate statistics. The functions are compatible with C++11, but the overloads that take an execution policy require C++17. If no execution policy is passed, the functions default to std::execution::seq.

Covariance

Computes the population covariance of two datasets:

std::vector<double> u{1,2,3,4,5};
std::vector<double> v{1,2,3,4,5};
double cov_uv = boost::math::statistics::covariance(u, v);

The implementation follows Bennett et al. The parallel implementation follows Schubert et al. The data is not modified. Inputs must be real-valued; complex-valued inputs are not supported.

Nota bene: If the input is an integer type, the output will be a double precision type.
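To illustrate what a single pass over the data can look like, here is a minimal sketch of a running update of both means and the co-moment. It is written in the spirit of the single-pass algorithms cited above and is not the library's implementation; the names MeansAndCovariance and single_pass_covariance are hypothetical and do not exist in Boost.Math.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not a Boost.Math type).
struct MeansAndCovariance {
    double mean_u;
    double mean_v;
    double covariance;  // population covariance (divides by n, not n - 1)
};

// Hypothetical helper: one pass over both containers, updating the running
// means and the running co-moment sum((u_i - mean_u) * (v_i - mean_v)).
template <typename Container>
MeansAndCovariance single_pass_covariance(Container const& u, Container const& v) {
    assert(u.size() == v.size() && u.size() > 0);
    double mean_u = 0, mean_v = 0, comoment = 0;
    std::size_t n = 0;
    auto it_v = v.begin();
    for (auto it_u = u.begin(); it_u != u.end(); ++it_u, ++it_v) {
        ++n;
        // Integer inputs are converted to double before any arithmetic.
        double const x = static_cast<double>(*it_u);
        double const y = static_cast<double>(*it_v);
        double const du = x - mean_u;   // deviation from the OLD mean of u
        mean_u += du / n;
        mean_v += (y - mean_v) / n;
        comoment += du * (y - mean_v);  // old-mean deviation of u times new-mean deviation of v
    }
    return {mean_u, mean_v, comoment / n};
}
```

Because the means are maintained as part of the update, they are available at no extra cost once the loop finishes. The input containers are only read, never modified.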

The algorithm used herein simultaneously generates the mean values of the input data u and v. For certain applications, it might be useful to get them in a single pass through the data. As such, we provide means_and_covariance:

std::vector<double> u{1,2,3,4,5};
std::vector<double> v{1,2,3,4,5};
auto [mu_u, mu_v, cov_uv] = boost::math::statistics::means_and_covariance(u, v);
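Carrying the means alongside the co-moment is also what makes a parallel computation possible: each chunk of the data can be summarized independently, and the summaries combined afterwards. The sketch below shows the standard pairwise combination formula for two disjoint chunks. It is an illustration under assumed names (PartialStats, merge_stats, and population_covariance are hypothetical), not the library's parallel code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of one chunk of paired data (not a Boost.Math type).
struct PartialStats {
    std::size_t n;    // number of (u, v) pairs in the chunk
    double mean_u;
    double mean_v;
    double comoment;  // sum of (u_i - mean_u) * (v_i - mean_v) over the chunk
};

// Combine the summaries of two disjoint chunks into a summary of their union.
inline PartialStats merge_stats(PartialStats const& a, PartialStats const& b) {
    assert(a.n + b.n > 0);
    double const n = static_cast<double>(a.n + b.n);
    double const du = b.mean_u - a.mean_u;
    double const dv = b.mean_v - a.mean_v;
    PartialStats out;
    out.n = a.n + b.n;
    out.mean_u = a.mean_u + du * b.n / n;
    out.mean_v = a.mean_v + dv * b.n / n;
    // The cross term corrects for the two chunks having different means.
    out.comoment = a.comoment + b.comoment + du * dv * (a.n * b.n / n);
    return out;
}

inline double population_covariance(PartialStats const& s) {
    return s.comoment / static_cast<double>(s.n);
}
```

For example, splitting u = v = {1,2,3,4,5} into {1,2} and {3,4,5} gives summaries (n=2, means 1.5, co-moment 0.5) and (n=3, means 4, co-moment 2); merging them yields means of 3 and a population covariance of 2, the same values a sequential pass over the whole dataset produces.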

Correlation Coefficient

Computes the Pearson correlation coefficient of two datasets u and v:

std::vector<double> u{1,2,3,4,5};
std::vector<double> v{1,2,3,4,5};
double rho_uv = boost::math::statistics::correlation_coefficient(u, v);
// rho_uv = 1.

Inputs must be real-valued; complex-valued inputs are not supported.

Nota bene: If the input is an integer type, the output will be a double precision type.

If one or both of the datasets are constant, the correlation coefficient is an indeterminate form (0/0), and a convention is needed to assign it a value. We use the following: if both datasets are constant, the correlation coefficient is 1. If one dataset is constant and the other is not, the correlation coefficient is 0.
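The convention for constant datasets can be made concrete with a small sketch. The function below, hypothetically named correlation_sketch, is a plain two-pass Pearson correlation that applies the rules stated above. It is an illustration only and does not reproduce the library's single-pass implementation; in particular, the exact comparison against zero is a simplification that is only reliable when the mean of a constant dataset is computed without rounding error.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost.Math): two-pass Pearson correlation
// with the constant-dataset convention described in the text.
double correlation_sketch(std::vector<double> const& u, std::vector<double> const& v) {
    assert(u.size() == v.size() && !u.empty());
    std::size_t const n = u.size();

    // First pass: means of both datasets.
    double mean_u = 0, mean_v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_u += u[i];
        mean_v += v[i];
    }
    mean_u /= n;
    mean_v /= n;

    // Second pass: sums of squared deviations and of cross deviations.
    double suu = 0, svv = 0, suv = 0;
    for (std::size_t i = 0; i < n; ++i) {
        double const du = u[i] - mean_u;
        double const dv = v[i] - mean_v;
        suu += du * du;
        svv += dv * dv;
        suv += du * dv;
    }

    // Both datasets constant: 0/0, defined here as 1.
    if (suu == 0 && svv == 0) { return 1.0; }
    // Exactly one dataset constant: 0/0, defined here as 0.
    if (suu == 0 || svv == 0) { return 0.0; }
    return suv / std::sqrt(suu * svv);
}
```

With u = {1,2,3,4,5}, the sketch returns 1 for (u, u), 1 for two constant datasets such as {2,2,2,2,2} and {7,7,7,7,7}, and 0 for u paired with a constant dataset.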

References

  • Bennett, Janine, et al. Numerically stable, single-pass, parallel statistics algorithms. Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009.
  • Schubert, Erich, and Michael Gertz. Numerically stable parallel computation of (co-)variance. Proceedings of the 30th International Conference on Scientific and Statistical Database Management, 2018.
