The judge in the Barr Decision stated that "averaging conceals variation." That statement was true, as far as
it went, but the purpose of averaging is to estimate the central value of a normal distribution, and variation
should be measured using the standard deviation. This point appears to have been misunderstood by the regulators who prepared
the OOS draft guidance.
The only concession to the use of standard deviations seems to be two statements in the draft guidance document.6
In the section on averaging, the draft guidance notes that when measuring content uniformity, the analyst should also report
the standard deviation of the results. The same section also says, "Unexpected variation in replicate determinations should
trigger investigation and documentation requirements." This statement suggests that standard deviations should be monitored
and a specification set so that the expected level of variation will be known, and actions taken in the face of an unusual
level of variation.
This idea that a standard deviation might be used to detect excessive variation was not mentioned in the Barr Decision. However,
the requirement that all individual test results used to calculate an average must also meet the specification individually
was retained from the Barr Decision. This requirement clearly showed that the distinction between averages and single test
results was not understood, despite the fact that it is one of the most basic ideas of statistics. The document also failed
to consider that a mean, together with its standard error, can lie within a specification even when some of the
individual results that make up the mean lie outside the specification range.
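The point about means and individual results can be illustrated with a short calculation. The following is a minimal sketch, not taken from the draft guidance: the six assay values, the 90.0 to 110.0 specification range, and the function name summarize are all hypothetical and chosen only for illustration.

```python
import math
import statistics


def summarize(results, lower, upper):
    """Compare a set of replicate results with a specification range.

    Returns the mean, the sample standard deviation, the standard error
    of the mean, and the individual results that fall outside the range.
    """
    mean = statistics.mean(results)
    sd = statistics.stdev(results)          # sample standard deviation (n - 1)
    sem = sd / math.sqrt(len(results))      # standard error of the mean
    outside = [x for x in results if not lower <= x <= upper]
    return {
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "mean_in_spec": lower <= mean <= upper,
        "mean_pm_sem_in_spec": lower <= mean - sem and mean + sem <= upper,
        "outside": outside,
    }


# Hypothetical assay results (% of label claim) and specification limits.
results = [89.5, 96.0, 98.0, 99.0, 101.0, 103.0]
summary = summarize(results, lower=90.0, upper=110.0)

print(f"mean = {summary['mean']:.2f}, sd = {summary['sd']:.2f}, sem = {summary['sem']:.2f}")
print("mean within specification:", summary["mean_in_spec"])
print("mean +/- sem within specification:", summary["mean_pm_sem_in_spec"])
print("individual results outside specification:", summary["outside"])
```

With these invented numbers the mean and the interval of one standard error around it sit inside the range, while one individual result (89.5) does not.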
When Averaging is Justified
Many of the problems from OOS results appear to arise from confusion between Barr's practices regarding content and blend
uniformity tests and situations in which averaging is justified. Averaging is justified when the analyst has a good reason
to believe that all test results should be identical. For instance, aliquots taken from a large, well-mixed solution may be
assumed to be identical. In content and blend uniformity testing, on the other hand, the working assumption must be that
test results will not be uniform, so lot uniformity has to be proven. In such cases, averaging is not justified unless there are also tight
limits on the standard deviation. When test procedures are developed, the test developer must state the reasons for believing
that the test aliquots will be uniform to justify averaging the results. Otherwise, averaging should not be conducted.
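The rule that averaging should be paired with a tight limit on the standard deviation can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name reportable_average, the limit of 0.5, and both data sets are hypothetical, and a real limit would have to come from the validated test procedure.

```python
import statistics


def reportable_average(results, sd_limit):
    """Return (mean, sd) if the replicates agree closely enough to average.

    If the sample standard deviation exceeds sd_limit, the mean is withheld
    (None is returned in its place) so that the excess variation is
    investigated rather than hidden by the average.
    """
    sd = statistics.stdev(results)
    if sd > sd_limit:
        return None, sd
    return statistics.mean(results), sd


# Hypothetical aliquots from a large, well-mixed solution: averaging is reasonable.
solution_aliquots = [99.8, 100.1, 100.0, 99.9]
avg_solution, sd_solution = reportable_average(solution_aliquots, sd_limit=0.5)

# Hypothetical uniformity-type results: same centre, but far too much spread.
uniformity_results = [94.0, 100.0, 106.0, 100.0]
avg_uniformity, sd_uniformity = reportable_average(uniformity_results, sd_limit=0.5)

print("solution aliquots:", avg_solution, round(sd_solution, 3))
print("uniformity results:", avg_uniformity, round(sd_uniformity, 3))
```

Both data sets have nearly the same mean, which is the sense in which averaging conceals variation; only the standard deviation check separates them.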
The Barr Decision and subsequent FDA rules about handling OOS test results also had an impact on the use of outlier tests.
The judge in the Barr case ruled that because the United States Pharmacopoeia (USP) mentioned the outlier test in conjunction with biological tests but not with chemical tests, it could be applied
only to biological tests. Given that outlier tests are well established in the theory of statistics,
which is a branch of applied mathematics, this was tantamount to stating that because the USP does not specifically mention geometry, one could not use geometric considerations in pharmaceutical calculations.7 The judge apparently believed that the application of mathematics and natural laws is subject to judicial restrictions.
The USP quickly acted to bring chemical tests within the scope of the outlier test, but the FDA's acceptance of the
judge's ruling showed a remarkable level of prejudice against the outlier test procedure. Outlier testing is widely used and
accepted in diverse fields of science and technology. The formulas used for outlier testing have a firm foundation in the
mathematical framework of statistics, provided that the underlying hypothesis of the test is affirmed. This hypothesis is
that the outlier is a member of a second population of test results that contaminates the set of observations that are supposedly
from a first population.
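As an illustration of how such a test operates, the sketch below computes the Grubbs statistic for the most extreme value in a set of replicates. The text above does not name a particular outlier test, so the choice of Grubbs' test is an assumption made here, as are the function names and the data. The test also assumes that the uncontaminated results are approximately normally distributed. The critical value is supplied by the caller; 1.887 is the figure commonly tabulated for six results at a two-sided 5% significance level and should be confirmed against a published table before use.

```python
import statistics


def grubbs_statistic(results):
    """Return (G, suspect) for the result farthest from the sample mean.

    G = |suspect - mean| / s, where s is the sample standard deviation.
    """
    mean = statistics.mean(results)
    sd = statistics.stdev(results)
    suspect = max(results, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / sd, suspect


def flag_outlier(results, g_crit):
    """Return (suspect or None, G).

    A flagged value is a trigger for investigation into its cause,
    not permission to discard the result.
    """
    g, suspect = grubbs_statistic(results)
    return (suspect if g > g_crit else None), g


# Assumed critical value for n = 6, two-sided 5% level; verify against a table.
G_CRIT_N6 = 1.887

# Hypothetical replicate results, one of which looks discordant.
with_suspect = [99.8, 100.1, 100.0, 99.9, 100.2, 104.0]
flagged, g = flag_outlier(with_suspect, G_CRIT_N6)

# Hypothetical replicate results with ordinary scatter.
ordinary = [99.8, 100.1, 100.0, 99.9, 100.2, 100.3]
flagged_ordinary, g_ordinary = flag_outlier(ordinary, G_CRIT_N6)

print("first set:", flagged, round(g, 3))
print("second set:", flagged_ordinary, round(g_ordinary, 3))
```

In the first set the value 104.0 exceeds the assumed critical value and is flagged; in the second set nothing is flagged.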
It appeared that the FDA felt that the use of outlier testing would make it too easy to discard an OOS test result. This was a
misrepresentation and misinterpretation of the outlier test. Although in some industries outlier testing is used to discard
unusual individual observations, in the case of pharmaceutical test results, the detection of an outlier must result in an
investigation into its cause. Also, statistical theory says that finding a true outlier must be an infrequent event. The frequent
finding of outliers must cast doubt upon the specificity of a testing procedure or raise questions about the appropriate use
of the test, as this suggests that contamination of test results with results from a second population is a frequent event.