...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
#include <boost/math/statistics/signal_statistics.hpp> namespace boost::math::statistics { template<class Container> auto absolute_gini_coefficient(Container & c); template<class ForwardIterator> auto absolute_gini_coefficient(ForwardIterator first, ForwardIterator last); template<class Container> auto sample_absolute_gini_coefficient(Container & c); template<class ForwardIterator> auto sample_absolute_gini_coefficient(ForwardIterator first, ForwardIterator last); template<class Container> auto hoyer_sparsity(Container const & c); template<class ForwardIterator> auto hoyer_sparsity(ForwardIterator first, ForwardIterator last); template<class Container> auto oracle_snr(Container const & signal, Container const & noisy_signal); template<class Container> auto oracle_snr_db(Container const & signal, Container const & noisy_signal); template<class ForwardIterator> auto m2m4_snr_estimator(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3); template<class Container> auto m2m4_snr_estimator(Container const & noisy_signal, typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3); template<class ForwardIterator> auto m2m4_snr_estimator_db(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3); template<class Container> auto m2m4_snr_estimator_db(Container const & noisy_signal,typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3); }
The file boost/math/statistics/signal_statistics.hpp
is a
set of facilities for computing quantities commonly used in signal analysis.
Our examples use std::vector<double>
to
hold the data, but this not required. In general, you can store your data in
an Eigen array, and Armadillo vector, std::array
,
and for many of the routines, a std::forward_list
.
These routines are usable in float, double, long double, and Boost.Multiprecision
precision, as well as their complex extensions whenever the computation is
well-defined.
The Gini coefficient, first used to measure wealth inequality, is also one
of the best measures of the sparsity of an expansion in a basis. A sparse expansion
has most of its norm concentrated in just a few coefficients, making the connection
with wealth inequality obvious. See Hurley
and Rickard for details. However, for measuring sparsity, the phase
of the numbers is irrelevant, so we provide the absolute_gini_coefficient
:
using boost::math::statistics::sample_absolute_gini_coefficient; using boost::math::statistics::absolute_gini_coefficient; std::vector<std::complex<double>> v{{0,1}, {0,0}, {0,0}, {0,0}}; double abs_gini = sample_absolute_gini_coefficient(v); // now abs_gini = 1; maximally unequal std::vector<std::complex<double>> w{{0,1}, {1,0}, {0,-1}, {-1,0}}; abs_gini = absolute_gini_coefficient(w); // now abs_gini = 0; every element of the vector has equal magnitude std::vector<double> u{-1, 1, -1}; abs_gini = absolute_gini_coefficient(u); // now abs_gini = 0 // Alternative call useful for computing over subset of the input: abs_gini = absolute_gini_coefficient(u.begin(), u.begin() + 1);
The sample Gini coefficient returns unity for a vector which has only one nonzero coefficient. The population Gini coefficient of a vector with one non-zero element is dependent on the length of the input.
The sample Gini coefficient lacks one desirable property of the population Gini coefficient, namely that "cloning" a vector has the same Gini coefficient; though cloning holds to very high accuracy with the sample Gini coefficient and can easily be recovered by a rescaling.
If sorting the input data is too much expense for a sparsity measure (is it going to be perfect anyway?), consider calculating the Hoyer sparsity instead.
The Hoyer sparsity measures a normalized ratio of the ℓ1 and ℓ2 norms. As the name suggests, it is used to measure the sparsity of an expansion in some basis.
The Hoyer sparsity computes (√N - ℓ1(v)/ℓ2(v))/(√N -1). For details, see Hoyer as well as Hurley and Rickard.
A few special cases will serve to clarify the intended use: If v has only one nonzero coefficient, the Hoyer sparsity attains its maxima of 1. If the coefficients of v all have the same magnitude, then the Hoyer sparsity attains its minima of zero. If the elements of v are uniformly distributed on an interval [0, b], then the Hoyer sparsity is approximately 0.133.
Usage:
std::vector<Real> v{1,0,0}; Real hs = boost::math::statistics::hoyer_sparsity(v); // hs = 1 std::vector<Real> v{1,-1,1}; Real hs = boost::math::statistics::hoyer_sparsity(v.begin(), v.end()); // hs = 0
The container must be forward iterable and the contents are not modified. Accepts real, complex, and integer inputs. If the input is an integral type, the output is a double precision float.
The function oracle_snr
computes
the ratio ‖ s ‖22 / ‖ s
- x ‖22, where s is signal
and x is a noisy signal. The function oracle_snr_db
computes 10log
10(‖
s ‖2 / ‖ s - x
‖2). The functions are so named because in general, one does not know
how to decompose a real signal x into s
+ w and as such s is regarded as
oracle information. Hence this function is mainly useful for unit testing other
SNR estimators.
Usage:
std::vector<double> signal(500, 3.2); std::vector<double> noisy_signal(500); // fill 'noisy_signal' signal + noise double snr_db = boost::math::statistics::oracle_snr_db(signal, noisy_signal); double snr = boost::math::statistics::oracle_snr(signal, noisy_signal);
The input can be real, complex, or integral. Integral inputs produce double
precision floating point outputs. The input data is not modified and must satisfy
the requirements of a RandomAccessContainer
.
Estimates the SNR of a noisy signal via the M2M4 method. See Pauluzzi and N.C. Beaulieu and Matzner and Englberger for details.
std::vector<double> noisy_signal(512); // fill noisy_signal with data contaminated by Gaussian white noise: double est_snr_db = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal);
The M2M4 SNR estimator is an "in-service" estimator, meaning that the estimate is made using the noisy, data-bearing signal, and does not require a background estimate. This estimator has been found to be work best between roughly -3 and 15db, tending to overestimate the noise below -3db, and underestimate the noise above 15db. See Xue et al for details.
The M2M4 SNR estimator, by default, assumes that the kurtosis of the signal is 1 and the kurtosis of the noise is 3, the latter corresponding to Gaussian noise. These parameters, however, can be overridden:
std::vector<double> noisy_signal(512); // fill noisy_signal with the data: double signal_kurtosis = 1.5; // Noise is assumed to follow Laplace distribution, which has kurtosis of 6: double noise_kurtosis = 6; double est_snr = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal, signal_kurtosis, noise_kurtosis);
Now, technically the method is a "blind SNR estimator", meaning that the no a-priori information about the signal is required to use the method. However, the performance of the method is vastly better if you can come up with a better estimate of the signal and noise kurtosis. How can we do this? Suppose we know that the SNR is much greater than 1. Then we can estimate the signal kurtosis simply by using the noisy signal kurtosis. If the SNR is much less than one, this method breaks down as the noisy signal kurtosis will tend to the noise kurtosis-though in this limit we have an excellent estimator of the noise kurtosis! In addition, if you have a model of what your signal should look like, you can precompute the signal kurtosis. For example, sinusoids have a kurtosis of 1.5. See here for a study which uses estimates of this sort to improve the performance of the M2M4 estimator.
Nota bene: The traditional definition of SNR is not mean invariant. By this we mean that if a constant is added to every sample of a signal, the SNR is changed. For example, adding DC bias to a signal changes its SNR. For most use cases, this is really not what you intend; for example a signal consisting of zeros plus Gaussian noise has an SNR of zero, whereas a signal with a constant DC bias and random Gaussian noise might have a very large SNR.
The M2M4 SNR estimator is computed from mean-invariant quantities, and hence it should really be compared to the mean-invariant SNR.
Nota bene: This computation requires the solution of a system of quadratic equations involving the noise kurtosis, the signal kurtosis, and the second and fourth moments of the data. There is no guarantee that a solution of this system exists for all value of these parameters, in fact nonexistence can easily be demonstrated for certain data. If there is no solution to the system, then failure is communicated by returning NaNs. This happens distressingly often; if a user is aware of any blind SNR estimators which do not suffer from this drawback, please open a github ticket and let us know.
The author has not managed to fully characterize the conditions under which a real solution with S > 0 and N >0 exists. However, a very intuitive example demonstrates why nonexistence can occur. Suppose the signal and noise kurtosis are equal. Then the method has no way to distinguish between the signal and the noise, and the solution is non-unique.