# Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world.

This is the documentation for an old version of boost. Click here for the latest Boost documentation.
###### Calculating Confidence Limits on the Frequency of Occurrence for a Binomial Distribution

Imagine you have a process that follows a binomial distribution: for each trial conducted, an event either occurs or does it does not, referred to as "successes" and "failures". If, by experiment, you want to measure the frequency with which successes occur, the best estimate is given simply by k / N, for k successes out of N trials. However our confidence in that estimate will be shaped by how many trials were conducted, and how many successes were observed. The static member functions `binomial_distribution<>::find_lower_bound_on_p` and `binomial_distribution<>::find_upper_bound_on_p` allow you to calculate the confidence intervals for your estimate of the occurrence frequency.

The sample program binomial_confidence_limits.cpp illustrates their use. It begins by defining a procedure that will print a table of confidence limits for various degrees of certainty:

```#include <iostream>
#include <iomanip>
#include <boost/math/distributions/binomial.hpp>

void confidence_limits_on_frequency(unsigned trials, unsigned successes)
{
//
// trials = Total number of trials.
// successes = Total number of observed successes.
//
// Calculate confidence limits for an observed
// frequency of occurrence that follows a binomial
// distribution.
//
using namespace std;
using namespace boost::math;

// Print out general info:
cout <<
"___________________________________________\n"
"2-Sided Confidence Limits For Success Ratio\n"
"___________________________________________\n\n";
cout << setprecision(7);
cout << setw(40) << left << "Number of Observations" << "=  " << trials << "\n";
cout << setw(40) << left << "Number of successes" << "=  " << successes << "\n";
cout << setw(40) << left << "Sample frequency of occurrence" << "=  " << double(successes) / trials << "\n";
```

The procedure now defines a table of significance levels: these are the probabilities that the true occurrence frequency lies outside the calculated interval:

```double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
```

Some pretty printing of the table header follows:

```cout << "\n\n"
"_______________________________________________________________________\n"
"Confidence        Lower CP       Upper CP       Lower JP       Upper JP\n"
" Value (%)        Limit          Limit          Limit          Limit\n"
"_______________________________________________________________________\n";
```

And now for the important part - the intervals themselves - for each value of alpha, we call `find_lower_bound_on_p` and `find_lower_upper_on_p` to obtain lower and upper bounds respectively. Note that since we are calculating a two-sided interval, we must divide the value of alpha in two.

Please note that calculating two separate single sided bounds, each with risk level αis not the same thing as calculating a two sided interval. Had we calculate two single-sided intervals each with a risk that the true value is outside the interval of α, then:

• The risk that it is less than the lower bound is α.

and

• The risk that it is greater than the upper bound is also α.

So the risk it is outside upper or lower bound, is twice alpha, and the probability that it is inside the bounds is therefore not nearly as high as one might have thought. This is why α/2 must be used in the calculations below.

In contrast, had we been calculating a single-sided interval, for example: "Calculate a lower bound so that we are P% sure that the true occurrence frequency is greater than some value" then we would not have divided by two.

Finally note that `binomial_distribution` provides a choice of two methods for the calculation, we print out the results from both methods in this example:

```   for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence value:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
// Calculate Clopper Pearson bounds:
double l = binomial_distribution<>::find_lower_bound_on_p(
trials, successes, alpha[i]/2);
double u = binomial_distribution<>::find_upper_bound_on_p(
trials, successes, alpha[i]/2);
// Print Clopper Pearson Limits:
cout << fixed << setprecision(5) << setw(15) << right << l;
cout << fixed << setprecision(5) << setw(15) << right << u;
// Calculate Jeffreys Prior Bounds:
l = binomial_distribution<>::find_lower_bound_on_p(
trials, successes, alpha[i]/2,
binomial_distribution<>::jeffreys_prior_interval);
u = binomial_distribution<>::find_upper_bound_on_p(
trials, successes, alpha[i]/2,
binomial_distribution<>::jeffreys_prior_interval);
// Print Jeffreys Prior Limits:
cout << fixed << setprecision(5) << setw(15) << right << l;
cout << fixed << setprecision(5) << setw(15) << right << u << std::endl;
}
cout << endl;
}
```

And that's all there is to it. Let's see some sample output for a 2 in 10 success ratio, first for 20 trials:

```___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________

Number of Observations                  =  20
Number of successes                     =  4
Sample frequency of occurrence          =  0.2

_______________________________________________________________________
Confidence        Lower CP       Upper CP       Lower JP       Upper JP
Value (%)        Limit          Limit          Limit          Limit
_______________________________________________________________________
50.000        0.12840        0.29588        0.14974        0.26916
75.000        0.09775        0.34633        0.11653        0.31861
90.000        0.07135        0.40103        0.08734        0.37274
95.000        0.05733        0.43661        0.07152        0.40823
99.000        0.03576        0.50661        0.04655        0.47859
99.900        0.01905        0.58632        0.02634        0.55960
99.990        0.01042        0.64997        0.01530        0.62495
99.999        0.00577        0.70216        0.00901        0.67897
```

As you can see, even at the 95% confidence level the bounds are really quite wide (this example is chosen to be easily compared to the one in the NIST/SEMATECH e-Handbook of Statistical Methods. here). Note also that the Clopper-Pearson calculation method (CP above) produces quite noticeably more pessimistic estimates than the Jeffreys Prior method (JP above).

Compare that with the program output for 2000 trials:

```___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________

Number of Observations                  =  2000
Number of successes                     =  400
Sample frequency of occurrence          =  0.2000000

_______________________________________________________________________
Confidence        Lower CP       Upper CP       Lower JP       Upper JP
Value (%)        Limit          Limit          Limit          Limit
_______________________________________________________________________
50.000        0.19382        0.20638        0.19406        0.20613
75.000        0.18965        0.21072        0.18990        0.21047
90.000        0.18537        0.21528        0.18561        0.21503
95.000        0.18267        0.21821        0.18291        0.21796
99.000        0.17745        0.22400        0.17769        0.22374
99.900        0.17150        0.23079        0.17173        0.23053
99.990        0.16658        0.23657        0.16681        0.23631
99.999        0.16233        0.24169        0.16256        0.24143
```

Now even when the confidence level is very high, the limits are really quite close to the experimentally calculated value of 0.2. Furthermore the difference between the two calculation methods is now really quite small.

 Copyright © 2006 , 2007, 2008 John Maddock, Paul A. Bristow, Hubert Holin, Xiaogang Zhang, Bruno Lalande and Johan Råde Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)