Estimating the Required Sample Sizes for a Chi-Square Test for the Standard Deviation

Suppose we conduct a Chi Squared test for standard deviation and the result is borderline. A legitimate question to ask is "How large would the sample size have to be in order to produce a definitive result?"

The class template chi_squared_distribution has a static method `find_degrees_of_freedom` that will calculate this value for some acceptable risk of type I failure alpha, acceptable risk of type II failure beta, and difference to detect diff. Please note that the method works on variance, and not standard deviation as is usual for the Chi Squared Test, so diff is a difference from the true variance.

The code for this example is located in chi_square_std_dev_test.cpp.

We begin by defining a procedure to print out the sample sizes required for various risk levels:

```void chi_squared_sample_sized(
   double diff,      // difference from variance to detect
   double variance)  // true variance
{
```

The procedure begins by printing out the input data:

```using namespace std;
using namespace boost::math;

// Print out general info:
cout <<
"_____________________________________________________________\n"
"Estimated sample sizes required for various confidence levels\n"
"_____________________________________________________________\n\n";
cout << setprecision(5);
cout << setw(40) << left << "True Variance" << "=  " << variance << "\n";
cout << setw(40) << left << "Difference to detect" << "=  " << diff << "\n";
```

It then defines a table of significance levels for which we'll calculate sample sizes:

```double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
```

For each value of alpha we can calculate two sample sizes: one where the sample variance is less than the true value by diff, and one where it is greater than the true value by diff. Because the Chi Squared distribution is asymmetric, these two values will not be the same. The two calculations differ only in the sign of diff that's passed to `find_degrees_of_freedom`. Finally, in this example we'll simplify things and let the risk level beta be the same as alpha:

```cout << "\n\n"
"_______________________________________________________________\n"
"Confidence       Estimated          Estimated\n"
" Value (%)      Sample Size        Sample Size\n"
"                (lower one         (upper one\n"
"                 sided test)        sided test)\n"
"_______________________________________________________________\n";
//
// Now print out the data for the table rows.
//
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
   // Confidence value:
   cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
   // calculate df for a lower single sided test:
   double df = chi_squared::find_degrees_of_freedom(
      -diff, alpha[i], alpha[i], variance);
   // convert to sample size:
   double size = ceil(df) + 1;
   // Print size:
   cout << fixed << setprecision(0) << setw(16) << right << size;
   // calculate df for an upper single sided test:
   df = chi_squared::find_degrees_of_freedom(
      diff, alpha[i], alpha[i], variance);
   // convert to sample size:
   size = ceil(df) + 1;
   // Print size:
   cout << fixed << setprecision(0) << setw(16) << right << size << endl;
}
cout << endl;
```
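The two calls in the loop can be read through the usual power argument for a one sided variance test. The block below is a sketch of that argument, not a statement of the library's internal formulation, which this page does not show. The notation is an assumption introduced here: F_nu is the Chi Squared CDF with nu degrees of freedom, sigma_0^2 is the true variance, delta is the signed difference to detect, and r is the ratio of the alternative variance to the true variance. The upper test rejects when the statistic exceeds the upper alpha critical value; under the alternative the statistic divided by r is Chi Squared distributed, so the type II risk is the CDF evaluated at the critical value divided by r. The lower test is the mirror image.

```latex
% Assumed notation (not from the library documentation):
%   F_\nu       CDF of the chi-squared distribution with \nu degrees of freedom
%   \sigma_0^2  true variance (the `variance` argument)
%   \delta      signed difference to detect (the `diff` argument)
\begin{align*}
r &= 1 + \frac{\delta}{\sigma_0^2} \\
\text{upper test } (\delta > 0):\quad
  F_\nu\!\left(\frac{F_\nu^{-1}(1-\alpha)}{r}\right) &= \beta \\
\text{lower test } (\delta < 0):\quad
  1 - F_\nu\!\left(\frac{F_\nu^{-1}(\alpha)}{r}\right) &= \beta \\
\text{sample size}:\quad
  N &= \lceil \nu \rceil + 1
\end{align*}
```

The last line corresponds to the `ceil(df) + 1` step in the loop: the degrees of freedom are rounded up to a whole number, and a sample of N observations provides N - 1 degrees of freedom.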

For some example output, consider the silicon wafer data from the NIST/SEMATECH e-Handbook of Statistical Methods. In this scenario a supplier of 100 ohm.cm silicon wafers claims that his fabrication process can produce wafers with sufficient consistency that the standard deviation of resistivity for the lot does not exceed 10 ohm.cm. A sample of N = 10 wafers taken from the lot has a standard deviation of 13.97 ohm.cm, and the question we ask ourselves is "How large would our sample have to be to reliably detect this difference?"

To use our procedure above, we have to convert the standard deviations to variances by squaring them: the true variance is 10 * 10 = 100, and the difference to detect is 13.97 * 13.97 - 100 = 95.1609. The program output then looks like this:

```_____________________________________________________________
Estimated sample sizes required for various confidence levels
_____________________________________________________________

True Variance                           =  100.00000
Difference to detect                    =  95.16090

_______________________________________________________________
Confidence       Estimated          Estimated
 Value (%)      Sample Size        Sample Size
                (lower one         (upper one
                 sided test)        sided test)
_______________________________________________________________
    50.000               2               2
    75.000               2              10
    90.000               4              32
    95.000               5              51
    99.000               7              99
    99.900              11             174
    99.990              15             251
    99.999              20             330
```
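The conversion from standard deviations to the two variance inputs printed at the top of this output can be sketched in a few lines of standard C++. The type `VarianceInputs` and the functions `to_variance_inputs` and `sample_size_from_df` are hypothetical helpers introduced for illustration. They are not part of Boost.Math or of chi_square_std_dev_test.cpp.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical holder for the two arguments the procedure above expects.
struct VarianceInputs
{
   double variance;  // true (hypothesised) variance
   double diff;      // difference from that variance to detect
};

// Square both standard deviations and take the difference of the squares.
VarianceInputs to_variance_inputs(double claimed_sd, double observed_sd)
{
   assert(claimed_sd > 0.0 && observed_sd > 0.0);
   const double variance = claimed_sd * claimed_sd;
   const double diff = observed_sd * observed_sd - variance;
   return {variance, diff};
}

// Mirrors the `ceil(df) + 1` step used in the table loop:
// round the degrees of freedom up, then add one observation.
double sample_size_from_df(double df)
{
   return std::ceil(df) + 1.0;
}
```

For the wafer data, `to_variance_inputs(10.0, 13.97)` yields a variance of 100 and a difference of 95.1609, which are the two values echoed in the output header.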

In this case we are interested in an upper single sided test. So, for example, if the maximum acceptable risk of falsely rejecting the null-hypothesis is 0.05 (Type I error), and the maximum acceptable risk of failing to reject the null-hypothesis is also 0.05 (Type II error), we estimate that we would need a sample size of 51 (the 95% row of the table).
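As a rough cross-check of that figure that does not depend on Boost, the sketch below searches whole degrees of freedom for the smallest value at which the upper one sided test meets the type II risk target. It rests on several assumptions that are not from the original example: all function names are hypothetical, the Chi Squared CDF is obtained by Simpson integration of the density, the quantile by bisection, and the search starts at 3 degrees of freedom because the crude integrator is only intended for densities that are finite at zero. The condition tested is the textbook one: with r = 1 + diff / variance, the type II risk is the CDF evaluated at the upper alpha critical value divided by r.

```cpp
#include <cassert>
#include <cmath>

// Chi-squared density (hypothetical helper, intended for df > 2).
double chi2_pdf(double x, double df)
{
   assert(df > 2.0);
   if (x <= 0.0) return 0.0;
   const double k = df / 2.0;
   return std::exp((k - 1.0) * std::log(x) - x / 2.0
                   - k * std::log(2.0) - std::lgamma(k));
}

// CDF by composite Simpson integration of the density over [0, x].
double chi2_cdf(double x, double df)
{
   if (x <= 0.0) return 0.0;
   const int n = 1000;  // number of intervals, must be even
   const double h = x / n;
   double sum = chi2_pdf(0.0, df) + chi2_pdf(x, df);
   for (int i = 1; i < n; ++i)
      sum += (i % 2 == 1 ? 4.0 : 2.0) * chi2_pdf(i * h, df);
   return sum * h / 3.0;
}

// Quantile by bisection on the CDF; the upper bracket lies far in the tail.
double chi2_quantile(double p, double df)
{
   double lo = 0.0;
   double hi = df + 10.0 * std::sqrt(2.0 * df) + 10.0;
   for (int i = 0; i < 50; ++i)
   {
      const double mid = 0.5 * (lo + hi);
      if (chi2_cdf(mid, df) < p) lo = mid; else hi = mid;
   }
   return 0.5 * (lo + hi);
}

// Type II risk of the upper one-sided test at a given df.
double upper_test_beta(int df, double alpha, double variance, double diff)
{
   const double r = 1.0 + diff / variance;
   return chi2_cdf(chi2_quantile(1.0 - alpha, df) / r, df);
}

// Smallest sample size (df + 1) whose type II risk does not exceed beta.
// Returns -1 if no df up to 1000 satisfies the condition.
int upper_test_sample_size(double alpha, double beta,
                           double variance, double diff)
{
   for (int df = 3; df <= 1000; ++df)
      if (upper_test_beta(df, alpha, variance, diff) <= beta)
         return df + 1;
   return -1;
}
```

Calling `upper_test_sample_size(0.05, 0.05, 100.0, 95.1609)` should agree with the sample size of 51 quoted above. The integer search is a deliberate simplification: `find_degrees_of_freedom` returns a fractional value that the example then rounds up, so the two approaches can only be expected to agree after rounding.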

 Copyright © 2006-2010, 2012-2014 Nikhar Agrawal, Anton Bikineev, Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno Lalande, John Maddock, Johan Råde, Gautam Sewani, Benjamin Sobotta, Thijs van den Berg, Daryle Walker and Xiaogang Zhang Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)