Estimating Sample Sizes for the Negative Binomial.

Imagine you have an event (let's call it a "failure", though we could equally well call it a success if we felt it was a 'good' event) that you know occurs, on average, once in every N trials. You may want to know how many trials you need to conduct to be P% sure of observing at least k such failures. If the failure events follow a negative binomial distribution (each trial either succeeds or fails) then the static member function negative_binomial_distribution<>::find_minimum_number_of_trials can be used to estimate the minimum number of trials required to be P% sure of observing the desired number of failures.

The example program neg_binomial_sample_sizes.cpp demonstrates its usage.

It centres around a routine that prints out a table of minimum sample sizes (number of trials) for various probability thresholds:

void find_number_of_trials(double failures, double p);

First define a table of significance levels: each is the maximum acceptable probability that failures or fewer events will be observed.

double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

The confidence value as a percentage is (1 - alpha) * 100, so alpha == 0.05 corresponds to 95% confidence that the desired number of failures will be observed. The values range from a very low alpha of 0.5 (50% confidence) up to an extremely high confidence of 99.999%.
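The conversion from alpha to a confidence percentage can be sketched as below. This is only an illustration: confidence_percent and alpha_count are names invented here and are not part of the example program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// The table of significance levels from the text.
const double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
const std::size_t alpha_count = sizeof(alpha) / sizeof(alpha[0]);

// Hypothetical helper: convert a significance level into a confidence percentage.
double confidence_percent(double a)
{
    assert(a >= 0.0 && a <= 1.0); // alpha is a probability
    return 100.0 * (1.0 - a);
}
```

Calling confidence_percent(alpha[3]) gives the 95% level used in the discussion of the sample output.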

Much of the rest of the program is pretty-printing. The important part is the calculation of the minimum number of trials required for each value of alpha, using:

(int)ceil(negative_binomial::find_minimum_number_of_trials(failures, p, alpha[i]));

find_minimum_number_of_trials returns a double, so ceil rounds this up to ensure we have an integral minimum number of trials.

void find_number_of_trials(double failures, double p)
{
   // trials = number of trials
   // failures = number of failures before achieving required success(es).
   // p        = success fraction (0 <= p <= 1.).
   //
   // Calculate how many trials we need to ensure the
   // required number of failures DOES exceed "failures".

   cout << "\n""Target number of failures = " << (int)failures;
   cout << ",   Success fraction = " << fixed << setprecision(1) << 100 * p << "%" << endl;
   // Print table header:
   cout << "____________________________\n"
           "Confidence        Min Number\n"
           " Value (%)        Of Trials \n"
           "____________________________\n";
   // Now print out the data for the alpha table values.
   for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
   { // Confidence values %:
      cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]) << "      "
      // find_minimum_number_of_trials
      << setw(6) << right
      << (int)ceil(negative_binomial::find_minimum_number_of_trials(failures, p, alpha[i]))
      << endl;
   }
   cout << endl;
} // void find_number_of_trials(double failures, double p)
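As an independent cross-check on what this calculation means, the sketch below searches for the smallest whole number of trials directly from binomial probabilities, using only the standard library. It is not how the library computes its result, and the names prob_at_most_k_failures and min_trials_brute_force are invented for this illustration. It assumes, following the definition of alpha given earlier, that the target is the smallest n for which the probability of seeing failures or fewer failure events is at most alpha, and that 0 < p < 1 and 0 < alpha < 1.

```cpp
#include <cassert>
#include <cmath>

// Probability of at most k failures in n independent trials, where each
// trial fails with probability q. Terms are built in log space so that
// large n does not underflow.
double prob_at_most_k_failures(int k, int n, double q)
{
    assert(q > 0.0 && q < 1.0);
    double total = 0.0;
    for (int i = 0; i <= k && i <= n; ++i)
    {
        const double log_pmf = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                             - std::lgamma(n - i + 1.0)
                             + i * std::log(q) + (n - i) * std::log(1.0 - q);
        total += std::exp(log_pmf);
    }
    return total;
}

// Hypothetical helper: smallest whole number of trials for which the
// probability of `failures` or fewer failure events is at most alpha.
// p is the success fraction, so each trial fails with probability 1 - p.
int min_trials_brute_force(int failures, double p, double alpha)
{
    assert(failures >= 0 && alpha > 0.0 && alpha < 1.0);
    const double q = 1.0 - p;
    int n = failures + 1; // with n <= failures the probability is 1
    while (prob_at_most_k_failures(failures, n, q) > alpha)
        ++n;
    return n;
}
```

For example, min_trials_brute_force(5, 0.5, 0.05) returns 18. A linear search like this is only sensible for modest inputs; it is meant to make the definition concrete, not to replace the library call.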

Finally we can produce some tables of minimum trials for the chosen confidence levels:

int main()
{
    find_number_of_trials(5, 0.5);
    find_number_of_trials(50, 0.5);
    find_number_of_trials(500, 0.5);
    find_number_of_trials(50, 0.1);
    find_number_of_trials(500, 0.1);
    find_number_of_trials(5, 0.9);

    return 0;
} // int main()

Note


Since we're calculating the minimum number of trials required, we'll err on the safe side and take the ceiling of the result. Had we been calculating the maximum number of trials permitted to observe fewer than a certain number of failures then we would have taken the floor instead. We would also have called find_maximum_number_of_trials rather than find_minimum_number_of_trials, like this:

floor(negative_binomial::find_maximum_number_of_trials(failures, p, alpha[i]))

which would give us the largest number of trials we could conduct and still be P% sure of observing failures or fewer failure events, when the probability of success is p.
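To make the "largest number of trials" reading of this note concrete, here is a standard-library-only sketch that searches for it directly. It is an assumption-laden illustration rather than the library's method: max_trials_brute_force and prob_at_most_k_failures are names invented here, and the search simply takes the note at its word, looking for the largest n for which the probability of failures or fewer failure events is still at least 1 - alpha. It assumes 0 < p < 1 and 0 < alpha < 1.

```cpp
#include <cassert>
#include <cmath>

// Probability of at most k failures in n independent trials, where each
// trial fails with probability q (binomial sum, terms built in log space).
double prob_at_most_k_failures(int k, int n, double q)
{
    assert(q > 0.0 && q < 1.0);
    double total = 0.0;
    for (int i = 0; i <= k && i <= n; ++i)
    {
        const double log_pmf = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                             - std::lgamma(n - i + 1.0)
                             + i * std::log(q) + (n - i) * std::log(1.0 - q);
        total += std::exp(log_pmf);
    }
    return total;
}

// Hypothetical helper: largest whole number of trials for which we are
// still at least (1 - alpha) sure of seeing `failures` or fewer failures.
int max_trials_brute_force(int failures, double p, double alpha)
{
    assert(failures >= 0 && alpha > 0.0 && alpha < 1.0);
    const double q = 1.0 - p;
    int n = failures; // with n <= failures trials the condition holds trivially
    while (prob_at_most_k_failures(failures, n + 1, q) >= 1.0 - alpha)
        ++n;
    return n;
}
```

With 5 failures and a 50/50 success fraction, this search gives 6 trials at alpha = 0.05 and 7 trials at alpha = 0.1. Note that the search counts down in confidence as n grows, which is why the floor, not the ceiling, is the safe rounding direction here.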

We'll finish off by looking at some sample output. First, suppose we wish to observe at least 5 "failures" with a 50/50 (0.5) chance of success or failure:

Target number of failures = 5,   Success fraction = 50%

____________________________
Confidence        Min Number
 Value (%)        Of Trials
____________________________
    50.000          11
    75.000          14
    90.000          17
    95.000          18
    99.000          22
    99.900          27
    99.990          31
    99.999          36

So 18 trials or more would yield a 95% chance that at least our 5 required failures would be observed.
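The 95% row can be checked by hand for the 50/50 case, because every sequence of n fair trials is equally likely and the probability reduces to counting. The sketch below does that count with exact integer arithmetic. The helper names choose and prob_at_most_fair are invented for this illustration, and it assumes n is below 64 so that the counts fit in a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, i) by the multiplicative formula; the running
// value is itself a binomial coefficient after every step, so it stays integral.
std::uint64_t choose(unsigned n, unsigned i)
{
    assert(i <= n && n < 64);
    std::uint64_t c = 1;
    for (unsigned j = 1; j <= i; ++j)
        c = c * (n - i + j) / j;
    return c;
}

// Probability of at most k failures in n fair (50/50) trials:
// (number of outcomes with at most k failures) / 2^n.
double prob_at_most_fair(unsigned k, unsigned n)
{
    assert(n < 64);
    std::uint64_t count = 0;
    for (unsigned i = 0; i <= k && i <= n; ++i)
        count += choose(n, i);
    return static_cast<double>(count)
         / static_cast<double>(std::uint64_t(1) << n);
}
```

With 18 trials, prob_at_most_fair(5, 18) is 12616/262144, about 4.8%, which is below the 5% threshold. With 17 trials it is 9402/131072, about 7.2%, which is above it, so 18 is the first whole number of trials that meets the 95% level.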

Compare that to what happens if the success ratio is 90%:

Target number of failures = 5.000,   Success fraction = 90.000%

____________________________
Confidence        Min Number
 Value (%)        Of Trials
____________________________
    50.000          57
    75.000          73
    90.000          91
    95.000         103
    99.000         127
    99.900         159
    99.990         189
    99.999         217

So now 103 trials are required to observe at least 5 failures with 95% certainty.
