    #include <boost/math/statistics/ljung_box.hpp>

    namespace boost::math::statistics {

    template<class RandomAccessIterator>
    std::pair<Real, Real> ljung_box(RandomAccessIterator begin, RandomAccessIterator end,
                                    int64_t lags = -1, int64_t fit_dof = 0);

    template<class RandomAccessContainer>
    auto ljung_box(RandomAccessContainer const & v, int64_t lags = -1, int64_t fit_dof = 0);

    }
The Ljung-Box test is used to test if residuals from a fitted model have unwanted autocorrelation. If autocorrelation exists in the residuals, then presumably a model with more parameters can be fitted to the original data and explain more of the structure it contains.
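The test statistic is

$$ Q = n(n+2) \sum_{k=1}^{\ell} \frac{r_k^2}{n-k} $$

where n is the length of v, ℓ is the number of lags, and r_k is the sample autocorrelation of v at lag k.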
Under the null hypothesis of no autocorrelation, Q approximately follows a chi-squared distribution. The variance of the statistic slightly exceeds that of the chi-squared distribution, but the test is still fairly good and has reasonable computational cost.
An example use is given below:
    #include <vector>
    #include <random>
    #include <iostream>
    #include <boost/math/statistics/ljung_box.hpp>

    using boost::math::statistics::ljung_box;

    std::random_device rd;
    std::normal_distribution<double> dis(0, 1);
    std::vector<double> v(8192);
    for (auto & x : v) {
        x = dis(rd);
    }
    auto [Q, p] = ljung_box(v);
    // Possible output: Q = 5.94734, p = 0.819668
Now suppose the data are clearly autocorrelated:
    for (size_t i = 0; i < v.size(); ++i) {
        v[i] = i;
    }
    auto [Q, p] = ljung_box(v);
    // Possible output: Q = 81665.1, p = 0
By default, the number of lags is taken to be the logarithm of the number of samples, so that the default complexity is O(n ln n). To use a specific number of lags, pass it as the lags argument:
    int64_t lags = 10;
    auto [Q, p] = ljung_box(v, lags);
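As a rough illustration of the default described above, the lag count for n samples can be sketched as the natural logarithm of n converted to an integer. The helper name `default_lags` is hypothetical, and the rounding direction is an assumption, since the text does not state it. Rounding up is used here.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a Boost.Math function): a default lag count of
// ln(n), rounded up to an integer. The rounding direction is an assumption.
std::int64_t default_lags(std::int64_t n) {
    return static_cast<std::int64_t>(std::ceil(std::log(static_cast<double>(n))));
}
```

Under this assumption, 8192 samples give 10 lags, since ln(8192) is about 9.01. With ℓ of order ln n, summing n terms per lag gives the O(n ln n) cost quoted above.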
Finally, it is sometimes relevant to specify how many degrees of freedom were used in creating the model from which the residuals were computed. This does not affect the test statistic Q, but only the p-value. If you need to specify the number of degrees of freedom, use
    int64_t fit_dof = 2;
    auto [Q, p] = ljung_box(v, -1, fit_dof);
For example, if you fit your data with an ARMA(p, q) model (here p and q are the autoregressive and moving-average orders, not the p-value), then fit_dof = p + q.
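The following sketch shows why fit_dof changes only the p-value. It assumes the usual convention that the p-value is the upper tail of a chi-squared distribution with lags − fit_dof degrees of freedom. Both function names are hypothetical and not part of Boost.Math. To stay dependency-free, the tail probability uses a closed form that is valid only for an even number of degrees of freedom; a general implementation would use a full chi-squared distribution.

```cpp
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not a Boost.Math function): upper tail P(X > q) of a
// chi-squared distribution with an EVEN number of degrees of freedom, via
// exp(-q/2) * sum_{k=0}^{dof/2 - 1} (q/2)^k / k!.
double chi_squared_upper_tail_even(double q, std::int64_t dof) {
    if (q < 0 || dof <= 0 || dof % 2 != 0) {
        throw std::domain_error("sketch requires q >= 0 and even dof > 0");
    }
    const double half = q / 2.0;
    double term = 1.0;  // (q/2)^k / k! for k = 0
    double sum = 1.0;
    for (std::int64_t k = 1; k < dof / 2; ++k) {
        term *= half / static_cast<double>(k);
        sum += term;
    }
    return std::exp(-half) * sum;
}

// Assumption: the p-value uses (lags - fit_dof) degrees of freedom.
// Q itself is passed in unchanged; only the reference distribution moves.
double ljung_box_p_value(double q, std::int64_t lags, std::int64_t fit_dof) {
    return chi_squared_upper_tail_even(q, lags - fit_dof);
}
```

With Q = 5.94734 and 10 lags, this gives a p-value of about 0.82 for fit_dof = 0 and about 0.65 for fit_dof = 2. The same statistic is judged against a distribution with fewer degrees of freedom, so the p-value drops.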