Quantitative research questions usually deal with categorical variables that take discrete values, such as yes/no answers or ratings on a 5-point scale. These variables are characterized by the binomial distribution (provided the universe is adequately large).

Interest, however, lies mainly in the proportions. For instance: What proportion of the population is aware of brand X? What proportion of consumers claim they will buy brand X? What proportion of consumers prefer formulation X over formulation Y?

Variables used to estimate a proportion of the population take one of two outcomes ("success" = 1, "fail" = 0).
If the probability of success is *p*, then the population mean is *p* and the population variance (S²) is:

$$ S^2 = p(1-p) $$

Based on the Central Limit Theorem, for a sample size of *n*, the
variance (σ²) of the distribution of the sample proportion of successes is:

$$ \sigma^2 = \frac {p(1-p)}{n} $$

The upper bound (maximum value) of sample variance (σ²) is 0.25/n and the upper bound for σ (standard error) is 0.5/√n. This occurs when p=0.5.
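As a quick numerical check of this upper bound, the following minimal sketch scans a grid of values of *p* and confirms that the variance p(1−p)/n peaks at p = 0.5. The function name `proportion_variance` and the sample size of 100 are illustrative assumptions, not part of the original text.

```python
import math


def proportion_variance(p: float, n: int) -> float:
    """Variance of the sample proportion: p(1 - p) / n."""
    return p * (1 - p) / n


n = 100  # illustrative sample size (assumption)
grid = [i / 100 for i in range(101)]  # p = 0.00, 0.01, ..., 1.00

# Find the value of p on the grid that gives the largest variance.
p_max = max(grid, key=lambda p: proportion_variance(p, n))
max_var = proportion_variance(p_max, n)

print(f"variance is largest at p = {p_max}")
print(f"max variance = {max_var:.4f} (0.25/n = {0.25 / n:.4f})")
print(f"max standard error = {math.sqrt(max_var):.4f} (0.5/sqrt(n) = {0.5 / math.sqrt(n):.4f})")
```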

Substituting the population variance (S²) in the
sample size equation, the required sample size (*n*)
for estimating proportion p̄ is:

$$ n=\frac {Z^2 \times p(1-p)}{e^2},\; e = Z \sqrt {\frac {p(1-p)}{n}} $$

Where:

p: the probability of a given response, which varies from 0 to 1. It reflects the variability in the data. Note that when p = 0 or p = 1 (i.e. variance = 0), no sample is required.

Z: the standardized value associated with the level of confidence.

e: the margin of error.
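The two relationships above can be sketched in Python as follows. This is a minimal illustration; the function names `sample_size` and `margin_of_error` are hypothetical, and the default of Z = 1.96 assumes a 95% confidence level. The sample size is returned unrounded, so in practice it would be rounded to a whole number of respondents.

```python
import math


def sample_size(p: float, e: float, z: float = 1.96) -> float:
    """Required sample size n = Z^2 * p(1 - p) / e^2 (unrounded)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if e <= 0:
        raise ValueError("margin of error e must be positive")
    return z ** 2 * p * (1 - p) / e ** 2


def margin_of_error(p: float, n: float, z: float = 1.96) -> float:
    """Margin of error e = Z * sqrt(p(1 - p) / n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Most conservative case (p = 0.5) with a 5% margin of error.
print(round(sample_size(p=0.5, e=0.05), 2))
# Margin of error for a sample of 400 at p = 0.5.
print(round(margin_of_error(p=0.5, n=400), 4))
# No variability (p = 1) means no sample is needed.
print(sample_size(p=1.0, e=0.05))
```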

The probability for which we require the largest sample size
is *p = 0.5*. For this "most conservative" value of *p*, and for a
confidence level of 95% (Z = 1.96), we can simplify the above equation as follows:

Z = 1.96 ≈ 2 for a confidence level of 95%, and p = 0.5:

$$ n = \frac{Z^2 p(1-p)}{e^2} \approx \frac{2^2 \times 0.5 \times (1-0.5)}{e^2} $$

$$ n \approx \frac{1}{e^2},\; e \approx \frac{1}{\sqrt n} $$

If Z = 1.96 is not rounded off:

$$ n = \frac{0.96}{e^2},\; e = \frac{0.98}{\sqrt n} $$

Based on this conservative projection, Exhibit 33.3 shows how sample
size *n* varies with the margin of error *e*. For example, if *e = ± 5%*, then for a confidence level
of 95%, we require a sample of n = 400, or 384 to be precise if *Z (= 1.96)*
is not rounded off.

Note how the margin of error initially decreases sharply as the sample size,
*n*, increases, and then more gradually. The improvement in accuracy beyond *n = 400* is
relatively small compared to the increase in sample size.
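The diminishing return can be illustrated with a short sketch that tabulates the conservative margin of error (p = 0.5, Z = 1.96) for a few sample sizes. The function name `conservative_margin` and the chosen sample sizes are assumptions made for illustration; they are not taken from the exhibit.

```python
import math


def conservative_margin(n: int, z: float = 1.96) -> float:
    """Margin of error at the most conservative p = 0.5: Z * 0.5 / sqrt(n)."""
    return z * 0.5 / math.sqrt(n)


# Each step quadruples the sample size but only halves the margin of error.
for n in (100, 400, 1600, 6400):
    print(f"n = {n:>5}: e = ±{conservative_margin(n):.2%}")
```

Because the margin of error shrinks with the square root of *n*, moving from 100 to 400 respondents removes about 4.9 percentage points of error, whereas moving from 400 to 1,600 removes only about 2.5.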

**Example:** If 80 respondents out of a sample of 400
consumed a cola drink, we can conclude that 95 times out of 100 (confidence level of 95%), the estimate of
the proportion of cola drinkers, over the specified time period, would lie between 16.1% and 23.9% (0.2 ± 0.039):

$$ e = Z \sqrt {\frac {p(1-p)}{n}} = 1.96 \sqrt {\frac {0.2 \times 0.8}{400}} = 0.039 $$
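The cola example can be reproduced with a few lines of Python. This is a minimal sketch using the normal approximation described in this section; the function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Confidence interval for a proportion using the normal approximation."""
    p_hat = successes / n                         # sample proportion
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - e, p_hat + e


low, high = proportion_ci(successes=80, n=400)
print(f"{low:.1%} to {high:.1%}")  # 16.1% to 23.9%
```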

**Example:** Exhibit 33.4 depicts the confidence intervals for confidence
levels of 95% (shaded dark) and 99%. When the parameter value is 39.0%, the 95% confidence interval, taking Z = 1.96 and p = 0.39,
is 34.2% to 43.8%.

Importantly, the shaded areas in Exhibit 33.4 reveal that many of the changes between consecutive time periods are not statistically significant at the specified confidence levels.

In the interest of saving costs, in cases where the universe size is not large, the sample size may be adjusted downwards. For medium size populations, the sample requirement is adjusted downwards using this formula:

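$$ n_{adj} = \frac {n}{1+n/N} $$

For example, if *N = 2000* and *e = 5%*, then starting from the unadjusted requirement of n = 384:

$$ n_{adj} = \frac {384}{1+384/2000} \approx 322 $$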

Additionally, if it is known that the proportions are skewed (i.e. p is not close to 0.5), the sample may be further reduced.
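Both adjustments can be sketched together in Python. This is an illustrative example only: the function names `sample_size` and `adjusted_sample_size` are hypothetical, Z = 1.96 assumes a 95% confidence level, and the skewed value p = 0.2 is an assumption chosen to show the effect.

```python
def sample_size(p: float, e: float, z: float = 1.96) -> float:
    """Unadjusted sample size n = Z^2 * p(1 - p) / e^2 (unrounded)."""
    return z ** 2 * p * (1 - p) / e ** 2


def adjusted_sample_size(n: float, population: int) -> float:
    """Adjustment for a medium-size universe: n / (1 + n / N)."""
    if population <= 0:
        raise ValueError("population size N must be positive")
    return n / (1 + n / population)


N = 2000   # universe size
e = 0.05   # margin of error

# Most conservative case, p = 0.5.
n_conservative = sample_size(p=0.5, e=e)
n_adj = adjusted_sample_size(n_conservative, N)
print(f"p = 0.5: n = {n_conservative:.0f}, adjusted n = {n_adj:.0f}")

# Skewed proportion (assumed p = 0.2) reduces the requirement further.
n_skewed = sample_size(p=0.2, e=e)
n_adj_skewed = adjusted_sample_size(n_skewed, N)
print(f"p = 0.2: n = {n_skewed:.0f}, adjusted n = {n_adj_skewed:.0f}")
```

The adjusted figure can never exceed the universe size *N*, and the adjustment becomes negligible as *N* grows large relative to *n*.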

For small universe populations, 100 for instance, it is advisable to take a census instead of a sample.
