Imagine you have an event (let's call it a "failure" - though
we could equally well call it a success if we felt it was a 'good' event)
that you know will occur in 1 in N trials. You may want to know how many
trials you need to conduct to be P% sure of observing at least k such
failures. If the failure events follow a negative binomial distribution
(each trial either succeeds or fails) then the static member function
negative_binomial_distribution<>::find_minimum_number_of_trials
can be used to estimate the minimum number of trials required to be P%
sure of observing the desired number of failures.
The example program neg_binomial_sample_sizes.cpp demonstrates its usage.
It centres around a routine that prints out a table of minimum sample sizes (number of trials) for various probability thresholds:
void find_number_of_trials(double failures, double p);
First define a table of significance levels: each one is the maximum acceptable probability that failures or fewer events will be observed.
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
The confidence value as a percentage is (1 - alpha) * 100, so alpha = 0.05 corresponds to 95% confidence that the desired number of failures will be observed. The values range from a very low 0.5 (50% confidence) up to an extremely high confidence of 99.999%.
Much of the rest of the program is pretty-printing. The important part is the calculation of the minimum number of trials required for each value of alpha, using:
(int)ceil(negative_binomial::find_minimum_number_of_trials(failures, p, alpha[i]));
find_minimum_number_of_trials returns a double, so ceil
rounds this up to ensure we have an integral minimum number of trials.
    void find_number_of_trials(double failures, double p)
    {
      // trials = number of trials
      // failures = number of failures before achieving required success(es).
      // p = success fraction (0 <= p <= 1.).
      //
      // Calculate how many trials we need to ensure the
      // required number of failures DOES exceed "failures".
      cout << "\n""Target number of failures = " << (int)failures;
      cout << ", Success fraction = " << fixed << setprecision(1) << 100 * p << "%" << endl;
      // Print table header:
      cout << "____________________________\n"
              "Confidence        Min Number\n"
              " Value (%)        Of Trials \n"
              "____________________________\n";
      // Now print out the data for the alpha table values.
      for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
      {
        // Confidence values %:
        cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]) << "      "
          // find_minimum_number_of_trials
          << setw(6) << right
          << (int)ceil(negative_binomial::find_minimum_number_of_trials(failures, p, alpha[i]))
          << endl;
      }
      cout << endl;
    } // void find_number_of_trials(double failures, double p)
Finally, we can produce some tables of minimum trials for the chosen confidence levels:

    int main()
    {
      find_number_of_trials(5, 0.5);
      find_number_of_trials(50, 0.5);
      find_number_of_trials(500, 0.5);
      find_number_of_trials(50, 0.1);
      find_number_of_trials(500, 0.1);
      find_number_of_trials(5, 0.9);
      return 0;
    } // int main()
Note: Since we're calculating the minimum number of trials required, we err on the safe side and take the ceiling of the result. Had we been calculating the maximum number of trials permitted to observe fewer than a certain number of failures, we would have taken the floor instead. We would also have called the companion function, floor(negative_binomial::find_maximum_number_of_trials(failures, p, alpha[i])), which gives the largest number of trials we could conduct and still be P% sure of observing failures or fewer failure events when the probability of success is p.
We'll finish off by looking at some sample output. First, suppose we wish to observe at least 5 "failures" with a 50/50 (0.5) chance of success or failure:

    Target number of failures = 5, Success fraction = 50%
    ____________________________
    Confidence        Min Number
     Value (%)        Of Trials
    ____________________________
        50.000          11
        75.000          14
        90.000          17
        95.000          18
        99.000          22
        99.900          27
        99.990          31
        99.999          36
So 18 trials or more would yield a 95% chance that at least our 5 required failures would be observed.
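As a hand check of the 95% row, assume the threshold is read as the probability of seeing 5 or fewer failures in n independent trials with a success fraction of 0.5, in line with the definition of the significance levels given earlier. The symbol X (the number of failures observed in n trials) is introduced here only for this check:

```latex
\begin{align*}
P(X \le 5 \mid n = 17) &= \frac{1}{2^{17}} \sum_{i=0}^{5} \binom{17}{i}
  = \frac{9402}{131072} \approx 0.0717 > 0.05, \\
P(X \le 5 \mid n = 18) &= \frac{1}{2^{18}} \sum_{i=0}^{5} \binom{18}{i}
  = \frac{12616}{262144} \approx 0.0481 \le 0.05.
\end{align*}
```

Under that reading, 18 is the first whole number of trials at which the chance of seeing 5 or fewer failures falls to 5% or below, which agrees with the table.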
Compare that to what happens if the success ratio is 90%:
    Target number of failures = 5.000, Success fraction = 90.000%
    ____________________________
    Confidence        Min Number
     Value (%)        Of Trials
    ____________________________
        50.000          57
        75.000          73
        90.000          91
        95.000         103
        99.000         127
        99.900         159
        99.990         189
        99.999         217
So now 103 trials are required to observe at least 5 failures with 95% certainty.