The statistically trained analyst tries to understand these variations by testing several replicates to obtain an average
(mean, x̄) and a standard deviation (s). The standard deviation is a measure of the variation of the test results and
may be expressed as a percentage of the mean (100 × s/x̄); this is called the coefficient of variation (CV). A test with a small CV is considered more precise than a test with a large CV. The usual procedure is
to create specifications that allow for test results to vary in a range of two standard deviations about the mean. Statistically,
this creates a range that captures 95% of expected test results. The problem lies with the remaining 5%.
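The statistics described above can be sketched in a few lines of code; the replicate values below are hypothetical assay results (% of label claim), chosen only for illustration:

```python
# A minimal sketch of the mean, standard deviation, CV, and the
# two-standard-deviation specification range. Values are hypothetical.
import statistics

replicates = [99.2, 100.1, 98.7, 100.4, 99.6, 100.0]

mean = statistics.mean(replicates)
s = statistics.stdev(replicates)      # sample standard deviation
cv = 100 * s / mean                   # coefficient of variation, in %

# A range of two standard deviations about the mean captures ~95% of
# expected test results (assuming approximate normality).
low, high = mean - 2 * s, mean + 2 * s

print(f"mean = {mean:.2f}, s = {s:.3f}, CV = {cv:.2f}%")
print(f"~95% range: {low:.2f} to {high:.2f}")
```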
Approximately 5% of the time, a confluence of random events can occur, resulting in a test result that is outside the 95%
range of the specification. This situation could occur as a result of what is known as extreme statistical variation, even if nothing is wrong with the product lot or process. In rare instances, extreme statistical variation could produce
test results indicating that an entire product lot is OOS; more commonly, however, the OOS result is associated with a single sample.
To lessen the effect of extreme statistical variation, most QC analysts make their measurements using replicates. The averaging
of replicates is a method for controlling the effects of variability.1 The procedures for calculating the number of replicates required for a given level of risk and test method variation have
been known for a long time.2
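One common form of such a calculation (a sketch of the textbook sample-size formula, not necessarily the exact procedure of reference 2) finds the smallest n for which the half-width z·s/√n of the confidence interval on the mean stays within a chosen allowable error:

```python
import math

def replicates_needed(s, max_error, z=1.96):
    """Smallest number of replicates n such that the ~95% half-width
    z * s / sqrt(n) of the mean does not exceed max_error."""
    return math.ceil((z * s / max_error) ** 2)

# Hypothetical example: method standard deviation of 1.2 units,
# mean wanted to within 1.0 unit at ~95% confidence.
n = replicates_needed(1.2, 1.0)
print(n)  # → 6
```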
It is also well known that when using a 95% interval for a specification range, the 5% possibility of obtaining an OOS result
(even on an acceptable product, because of extreme statistical variation) can be dealt with by repeating the test on the same
sample or set of samples. The rationale behind this approach is that if an OOS test result was caused by extreme statistical variation,
which occurs 5% of the time, then there is a 95% probability that the retest will not show the effects of this extreme
variation. To a certain extent, this procedure led to what FDA has called "reflexive retesting." But this method, when used
properly, is a valid approach. Only when abused does it truly become "reflexive retesting."
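The arithmetic behind this rationale can be checked with a small simulation (a sketch; the 5% false-OOS rate is the figure from the text): if a chance OOS result occurs 5% of the time on an acceptable lot, the original test and a single retest both fall OOS by chance alone only about 0.25% of the time.

```python
import random

random.seed(0)
TRIALS = 100_000
FALSE_OOS = 0.05   # chance a good lot gives an OOS result by chance alone

# Count trials in which the original test AND one retest are both OOS
# purely by chance; expected near 0.05 * 0.05 = 0.0025.
both_oos = sum(
    1 for _ in range(TRIALS)
    if random.random() < FALSE_OOS and random.random() < FALSE_OOS
)
rate = both_oos / TRIALS
print(f"both-OOS rate: {rate:.4f}")
```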
THE FALLACY OF TESTING INTO COMPLIANCE AND THE OOS PROBLEM
The OOS problem did not arise from reflexive retesting, however, but rather from an incorrect extension of the procedure,
which led to "testing into compliance," driven largely by management's reluctance to reject a batch.
The process of "testing into compliance" resulted from a reversal of the thinking that originally led to reflexive retesting.
In "testing into compliance," an unethical manufacturer hopes that even a bad lot will produce a passing test result, as a
result of extreme statistical variation. Consequently, failing test results are ignored and retests are ordered until extreme
variation produces a passing test result. The passing result is accepted and the lot is released based on that result. Instances
are known in which seven or eight retests were ordered in an attempt to obtain a passing result.
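The arithmetic of this abuse is simple. If a truly bad lot passes any single test with probability p, the chance that at least one of k repeated tests passes is 1 − (1 − p)^k. The numbers below are illustrative, not drawn from the text:

```python
def chance_of_false_pass(p_pass_single, num_tests):
    """Probability that at least one of num_tests independent tests
    passes, given that each passes with probability p_pass_single."""
    return 1 - (1 - p_pass_single) ** num_tests

# A bad lot that would pass a single test only 20% of the time still
# yields at least one "passing" result in 8 tests about 83% of the time.
p = chance_of_false_pass(0.2, 8)
print(f"{p:.3f}")
```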
If a company actually believes that such a passing result shows that the product is of good quality, it is engaging in fallacious
reasoning. There is no reason to believe that the results obtained from the retests are really different from the original
test result. In fact, a result from a retest should be nothing more than another member of the population of test results
that are generated by random variation. The fact that a passing test result is pleasing to management does not make it more
valid than a result that indicates a failure to meet a specification. From a statistical point of view, as long as these results
arise from properly performed tests on the same sample, they are all members of a population of test results, each of which
represents a legitimate estimate of the property specified.
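A simulation makes the bias concrete (all numbers hypothetical: a true value of 94 against a lower specification limit of 95, with a method standard deviation of 1.5). Pooling every test result gives an unbiased estimate near the true value, while reporting only the first passing result is biased upward past the specification limit:

```python
import random

random.seed(1)
TRUE_VALUE, SPEC_LOW, S = 94.0, 95.0, 1.5   # hypothetical values
EPISODES = 5_000

all_draws, reported = [], []
for _ in range(EPISODES):
    # Retest until a "passing" result appears, as in testing into compliance.
    while True:
        r = random.gauss(TRUE_VALUE, S)
        all_draws.append(r)
        if r >= SPEC_LOW:
            reported.append(r)   # the cherry-picked passing result
            break

pooled_mean = sum(all_draws) / len(all_draws)    # unbiased: near 94
avg_reported = sum(reported) / len(reported)     # biased: above 95
print(f"pooled mean = {pooled_mean:.2f}, reported mean = {avg_reported:.2f}")
```

The pooled mean treats every result as what it is: a member of the same population of estimates, which is the point made above.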
The Proper Use of Retesting
There is a legitimate and proper way to use retesting, however. If a test shows that a batch is OOS, the trained QC analyst
considers three possibilities:
1. Has process variation created a whole lot that is OOS?
2. Has process variation created a single sample that is OOS?
3. Did the OOS test result occur because the test was performed incorrectly?