#include <boost/math/distributions/students_t.hpp>
namespace boost{ namespace math{

template <class RealType = double,
          class Policy   = policies::policy<> >
class students_t_distribution;

typedef students_t_distribution<> students_t;

template <class RealType, class Policy>
class students_t_distribution
{
   typedef RealType value_type;
   typedef Policy   policy_type;

   // Constructor:
   students_t_distribution(const RealType& v);

   // Accessor:
   RealType degrees_of_freedom()const;

   // degrees of freedom estimation:
   static RealType find_degrees_of_freedom(
      RealType difference_from_mean,
      RealType alpha,
      RealType beta,
      RealType sd,
      RealType hint = 100);
};

}} // namespaces
Student's t-distribution is a statistical distribution published by William Gosset in 1908. His employer, Guinness Breweries, required him to publish under a pseudonym (possibly to hide that they were using statistics to improve beer quality), so he chose "Student".
Given N independent measurements, let

$$ t = \frac{\mu - M}{\sqrt{s / N}} $$

where M is the population mean, μ is the sample mean, and s is the sample variance.
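As an illustration of the definition above, the following sketch computes the statistic from raw measurements using only the standard library. The helper name `t_statistic` is hypothetical and not part of Boost.Math. It assumes the usual one-sample form t = (sample mean - M) / sqrt(s / N), with s the unbiased sample variance, and at least two measurements.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost.Math): one-sample t statistic.
// Assumes samples.size() >= 2.
double t_statistic(const std::vector<double>& samples, double population_mean)
{
    const std::size_t n = samples.size();

    // Sample mean.
    double sum = 0.0;
    for (double x : samples) sum += x;
    const double sample_mean = sum / n;

    // Unbiased sample variance (divides by n - 1).
    double sum_sq = 0.0;
    for (double x : samples) {
        const double d = x - sample_mean;
        sum_sq += d * d;
    }
    const double sample_variance = sum_sq / (n - 1);

    // t = (sample mean - population mean) / sqrt(variance / N)
    return (sample_mean - population_mean) / std::sqrt(sample_variance / n);
}
```

The resulting value would be compared against a Student's t-distribution with N - 1 degrees of freedom.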
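Student's t-distribution is defined as the distribution of the random variable t, which is, very loosely, the "best" that we can do without knowing the true standard deviation of the sample. With ν degrees of freedom, it has the PDF:

$$ f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}} $$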
The Student's t-distribution takes a single parameter: the number of degrees of freedom of the sample. When the number of degrees of freedom is one, this distribution is the same as the Cauchy distribution. As the number of degrees of freedom tends towards infinity, this distribution approaches the normal distribution.

[Graph: PDF of the Student's t-distribution for several values of the degrees of freedom ν.]
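The two limiting cases can be checked numerically. The sketch below is a standard-library-only illustration, not the library's implementation: it evaluates the textbook density Γ((ν+1)/2) / (sqrt(νπ) Γ(ν/2)) * (1 + t²/ν)^(-(ν+1)/2) in log space via `std::lgamma`. All three function names are hypothetical.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Hypothetical helper: Student's t density with nu degrees of freedom,
// computed in log space to avoid overflow of the gamma functions.
double students_t_pdf(double t, double nu)
{
    const double half = (nu + 1) / 2;
    const double log_norm =
        std::lgamma(half) - std::lgamma(nu / 2) - 0.5 * std::log(nu * kPi);
    return std::exp(log_norm - half * std::log1p(t * t / nu));
}

// Reference densities for the two limiting cases.
double cauchy_pdf(double x) { return 1 / (kPi * (1 + x * x)); }
double normal_pdf(double x) { return std::exp(-0.5 * x * x) / std::sqrt(2 * kPi); }
```

With ν = 1 the values should agree with `cauchy_pdf` to rounding error, and with a large ν (say 1e6) they should be close to `normal_pdf`.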
students_t_distribution(const RealType& v);
Constructs a Student's t-distribution with v degrees of freedom.
Requires v > 0, including infinity (if RealType permits), otherwise calls domain_error. Note that non-integral degrees of freedom are supported, and are meaningful under certain circumstances.
RealType degrees_of_freedom()const;
Returns the number of degrees of freedom of this distribution.
static RealType find_degrees_of_freedom( RealType difference_from_mean, RealType alpha, RealType beta, RealType sd, RealType hint = 100);
Returns the number of degrees of freedom required to observe a significant result in the Student's t test when the mean differs from the "true" mean by difference_from_mean.
difference_from_mean: The difference between the true mean and the sample mean that we wish to show is significant.
alpha: The maximum acceptable probability of rejecting the null hypothesis when it is in fact true.
beta: The maximum acceptable probability of failing to reject the null hypothesis when it is in fact false.
sd: The sample standard deviation.
hint: A hint for the location to start looking for the result. A good choice is the sample size of a previous borderline Student's t test.
Note: Remember that for a two-sided test, you must divide alpha by two before calling this function.
For more information on this function see the NIST Engineering Statistics Handbook.
All the usual non-member accessor functions that are generic to all distributions are supported: Cumulative Distribution Function, Probability Density Function, Quantile, Hazard Function, Cumulative Hazard Function, mean, median, mode, variance, standard deviation, skewness, kurtosis, kurtosis_excess, range and support.
The domain of the random variable is [-∞, +∞].
Various worked examples are available illustrating the use of the Student's t distribution.
The Student's t-distribution is implemented in terms of the incomplete beta function and its inverses. Refer to the accuracy data on those functions for more information.
In the following table v is the degrees of freedom of the distribution, t is the random variate, p is the probability and q = 1-p.
Function | Implementation Notes |
---|---|
pdf | Using the relation: pdf = (v / (v + t^2))^((1+v)/2) / (sqrt(v) * beta(v/2, 0.5)) |
cdf | Using the relations: p = 1 - z if t > 0, p = z otherwise, where z is given by: ibeta(v / 2, 0.5, v / (v + t^2)) / 2 if v < 2t^2; ibetac(0.5, v / 2, t^2 / (v + t^2)) / 2 otherwise |
cdf complement | Using the relation: q = cdf(-t) |
quantile | Using the relation: t = sign(p - 0.5) * sqrt(v * y / x), where x = ibeta_inv(v / 2, 0.5, 2 * min(p, q)) and y = 1 - x. The quantities x and y are both returned by ibeta_inv without the subtraction implied above. |
quantile from the complement | Using the relation: t = -quantile(q) |
mode | 0 |
mean | 0 |
variance | if (v > 2) v / (v - 2) else NaN |
skewness | if (v > 3) 0 else NaN |
kurtosis | if (v > 4) 3 * (v - 2) / (v - 4) else NaN |
kurtosis excess | if (v > 4) 6 / (v - 4) else NaN |
If the moment index k is not less than v (that is, k ≥ v), then the moment is undefined. Evaluating such a moment will throw a domain_error unless the error is ignored by a policy, in which case it will return std::numeric_limits<>::quiet_NaN();
(By popular demand, we now support an infinite argument and random deviate. However, we have not implemented the return of infinity suggested by the Wikipedia article on Student's t; instead we throw a domain error or return NaN. See also https://svn.boost.org/trac/boost/ticket/7177.)