Avoiding the Pain of Out-of-Specification Results

Published in: BioPharm International, June 1, 2008, Volume 21, Issue 6
Pages: 40–45

Improving your quality operations by using sound statistical principles.


The pharmaceutical industry does not have a good track record of applying sound statistical principles to the investigation of out-of-specification (OOS) results. Recently, Steven Kuwahara presented an article on the history of the OOS problem highlighting some statistical deficiencies.1 In this article, we present some additional statistical principles. Failure to apply these principles causes unnecessary investigations, nearly guarantees recurrence, and wastes valuable time and resources. Control charts can help scientists and managers avoid these pitfalls.

In many factories and laboratories, the prevalent mindset is that if the test result under examination remains within specifications, not only does nothing more need to be said about that result, but also that nothing more should be said about it. This fundamental error causes statistically significant signals, which ought to warn of impending trouble, to be ignored. In turn, these overlooked or disregarded signals and disturbances eventually result in out-of-specification (OOS) results that should have been avoided. They also lead to an unnecessary increase in variation that, in accordance with Little's Law, damages not only quality, but also productivity.2 Any approach that blinds people to statistically valid signals and warnings is poor science. Control charts, a method of separating random from non-random variation, can help scientists and managers avoid these pitfalls.


When a product batch is tested and is found to be out of specifications, the prevailing practice in the pharmaceutical industry is to study the faulty batch—independent of previous batches—to determine the cause of the OOS result. This can often lead to erroneous causal relationships, an incorrect determination of the root cause of the OOS result, and corrective action that almost guarantees future recurrences. The result is an increase in operational and compliance costs.

Driving this approach to OOS testing is the "go, no-go" mindset so prevalent in the industry. One real problem with the "go, no-go" mindset is that an OOS result must be recorded before any action is taken. Scientists tend to ignore the statistical signals.

The top chart in Figure 1 shows the batch results against specifications and the lower chart is a Shewhart control chart (the moving ranges chart is omitted for clarity).

Figure 1. Two views of batch release results: batch release charts with specification and control limits. The top chart is the batch release chart versus specifications. The bottom chart is the same data as a Shewhart control chart.

On two occasions, batches have tested as OOS. Because the first OOS result (batch 18) is an isolated event, that batch and the circumstances leading to its production should be studied to determine the root cause of the problem and to take corrective action.

However, to take similar action on the second OOS batch (batch 61) would be a mistake. Certainly, it is OOS, and an investigation must take place. Although batch 61 reported as OOS, the actual change to the process occurred at batch 41. The control chart shows two systems with a shift upwards in the process mean at batch 41. The second OOS result is at a random point in a stable system. Studying that batch alone will not reveal the true cause of the failure.

To understand what changed and when, it is better to use a control chart. In this case, after the rise in the process mean at batch 41, it was only a matter of time until a random point fell beyond specifications. There are no statistically significant differences for any of the data after batch 41, including batch 61, which failed release testing.


Figure 2 shows data from a chemical cleave step in a biologic process.2 The top chart shows the data with limits set at plus or minus three standard deviations. No points are observed beyond these 3 σ limits. Assuming the data are reasonably continuous, less than 1% of any data set can be expected to fall beyond 3 σ limits, regardless of how unstable the system may be, and irrespective of the functional form of the distribution involved. This is a central problem associated with the use of 3 σ limits. As the process is upset by special causes of variation or by drifts in the process mean, the 3 σ limits will grow without bound, making the process under examination look stable even when it is not.3

Figure 2. Two views of chemical cleave data: batch release charts set at three sigma and using Shewhart limits. The top chart is a run chart with specifications set at plus or minus three sigma. The bottom charts are the same data with Shewhart control limits set and process shifts incorporated (moving range chart is omitted from the Shewhart chart).


The lower charts in Figure 2 show control limits calculated and placed according to Shewhart's methods, and much more information is available. It is important to note that in an individual point and moving range control chart, all the control limits are developed from the average of the moving ranges, rather than from the individual data themselves. It is this special relationship between the moving ranges chart and the control limits that allows control limits to be developed that are not significantly widened by upsets to the system, as so often happens with 3 σ limits.2 The estimate for σ using Shewhart's methods is nearly half of that calculated by the more usual root mean square deviation function found in calculators and spreadsheets.
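The difference between the two ways of setting limits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (they are not the chemical cleave data in Figure 2), the function names are hypothetical, and the constants 2.66 and 1.128 are the standard individuals-chart factors for moving ranges of two consecutive points.

```python
from statistics import mean, stdev

# Synthetic, hypothetical data: a process whose mean shifts upward halfway through.
values = [10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.2, 9.8, 10.0,
          12.1, 11.9, 12.2, 11.8, 12.0, 12.1, 11.9, 12.2, 11.8, 12.0]


def xmr_limits(data):
    """Limits for an individuals chart, built from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = mean(moving_ranges)
    sigma_hat = mr_bar / 1.128          # sigma estimated from the moving ranges
    centre = mean(data)
    return centre - 2.66 * mr_bar, centre + 2.66 * mr_bar, sigma_hat


def global_limits(data):
    """Mean plus or minus three sample standard deviations of all the data."""
    centre = mean(data)
    s = stdev(data)                     # inflated by any shift in the mean
    return centre - 3 * s, centre + 3 * s, s


def signals(data, lower, upper):
    """1-based positions of points that fall outside the limits."""
    return [i for i, x in enumerate(data, start=1) if x < lower or x > upper]


xmr_lower, xmr_upper, sigma_mr = xmr_limits(values)
glob_lower, glob_upper, sigma_global = global_limits(values)

xmr_signals = signals(values, xmr_lower, xmr_upper)
global_signals = signals(values, glob_lower, glob_upper)

print(f"sigma from moving ranges: {sigma_mr:.3f}")
print(f"sigma from all data:      {sigma_global:.3f}")
print(f"points outside moving-range limits: {len(xmr_signals)}")
print(f"points outside global 3-sigma limits: {len(global_signals)}")
```

In this synthetic series only one moving range spans the shift, so the moving-range estimate of σ stays small, while the global standard deviation absorbs the shift and widens the limits until no point falls outside them.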

The process in Figure 2 is unstable. It is not behaving in a predictable manner and sooner or later will record an OOS event. The chart created using Shewhart's methods exposes several special causes and changes to the process mean. These signals provide clues for detecting changes to both manufacturing and analytical processes in an attempt to reduce variation and improve quality and productivity. In this case, the Shewhart chart has identified where something changed the process average, and where upsets have introduced special cause events. Investigating these signals will lead to a better understanding of the process, and to proper identification of causality as well as the appropriate corrective action.

The more traditional "average plus or minus three sigma" chart found no signals in the data. The Shewhart control chart found seven.

The approach pioneered by Shewhart suggests that key variables be tracked in real time on a control chart.3 In this way, disturbances or changes to the process can be identified and corrected, often before an OOS result is recorded.


One common analytical issue concerns effective comparison of the data against specifications. This becomes tricky when replication is used to combat analytical variability. According to FDA's guidance on Investigating Out-of-Specification (OOS) Test Results for Pharmaceutical Production, all OOS results must be investigated.4 FDA defines OOS results as, "all test results that fall outside the specifications or acceptance criteria established in drug applications, drug master files, official compendia, or by the manufacturer."4 It is acceptable to average multiple results when the sample is homogeneous or when the assay has a large amount of variability. It is not acceptable to average multiple results when testing for content uniformity or when the averaging will hide the variability between individual results in an investigation. Replicates must not be compared to the specification; only the final reportable result, generated by averaging all replicates that meet the acceptable limit for variability, should be compared to the specification.

Consider the following set of analytical data for a representative assay method. This is a high-performance liquid chromatography (HPLC)-based method that determines the concentration of a therapeutic protein. As part of a variability reduction effort, it was determined that weighing was a significant source of variability (because of material hygroscopicity) along with the HPLC columns. Because the resultant concentration was used as part of the formulation process, it was important to reduce the variability to ensure a robust formulation process. Therefore, the analytical control strategy was:

  • Three replicate weighings; each was solubilized individually to make a set of three solutions.

  • Each solution was injected once on two separate HPLC units.

  • All six replicates (three from each HPLC unit) were averaged together to get a single result.

The specification was applied to the final reportable result, but not to the individual replicates. (Note: the assay was not used for content uniformity.) This was the correct application of statistical principles.


Analytical variability is a fact. It is impossible for two things to be identical. Ultimately, they may be similar enough so that the differences between them are rendered meaningless, but they are still different. No two HPLCs ever generate the same results; no two analysts ever prepare a sample the same way. In this world of variation, replication is used to reduce the analytical variability in the data generated to assist the production unit with generating quality product.

For the case studies listed below, let us apply a specification of not less than (NLT) 25.0; OOS results will be indicated in bold. In each case study, a set of replicates for the assay will be shown. The more replicates generated, the closer the approximation should be to the true value.

Case #1. "True" average concentration is 25.1. Using the random number generator in Microsoft Excel, the following six replicates were generated:

Replicate #1 (HPLC #1) → 25.040

Replicate #2 (HPLC #1) → 25.188

Replicate #3 (HPLC #1) → 24.937

Replicate #4 (HPLC #2) → 25.129

Replicate #5 (HPLC #2) → 25.086

Replicate #6 (HPLC #2) → 25.056

Final value (rounded to the spec) → 25.1

%RSD → 0.34%

As seen in the data, Replicate #3 on HPLC #1 was OOS, despite a low relative standard deviation (%RSD) of 0.34%. (The actual %RSD of the assay was closer to 0.68% as determined by re-evaluation data.) An OOS investigation into this result would be a waste of time because the OOS data were actually just a random event for a lot with a potency value close to the specification. Based on the analytical variability, it was likely that one of the replicates would be outside of the specification. Further, the point of the assay control strategy was to reduce the analytical variability by averaging across the critical sources of variability.
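The arithmetic behind Case #1 can be reproduced with a short Python sketch. It assumes that values are rounded half-up to the one decimal place of the specification before comparison; the article does not state its rounding rule, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean, stdev

SPEC_NLT = Decimal("25.0")  # specification: not less than 25.0


def round_to_spec(x, places="0.1"):
    """Round half-up to the precision of the specification (an assumed rule)."""
    return Decimal(str(x)).quantize(Decimal(places), rounding=ROUND_HALF_UP)


def summarize(replicates):
    """Return the reportable result, the %RSD, and whether the result meets spec."""
    avg = mean(replicates)
    rsd = 100 * stdev(replicates) / avg
    reportable = round_to_spec(avg)
    return reportable, rsd, reportable >= SPEC_NLT


case_1 = [25.040, 25.188, 24.937, 25.129, 25.086, 25.056]

reportable, rsd, passes = summarize(case_1)
oos_replicates = [r for r in case_1 if round_to_spec(r) < SPEC_NLT]

print(f"reportable result: {reportable}, %RSD: {rsd:.2f}, meets spec: {passes}")
print(f"replicates that would be OOS on their own: {oos_replicates}")
```

Only the reportable result is compared to the specification here; the list of individual replicates below the limit is computed purely to show how one of them can fall short while the averaged result does not.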

Case #2. "True" average concentration is 24.9. Using the random number generator in Microsoft Excel, the following six replicates were generated:

Replicate #1 (HPLC #1) → 24.806

Replicate #2 (HPLC #1) → 24.964

Replicate #3 (HPLC #1) → 25.032

Replicate #4 (HPLC #2) → 24.757

Replicate #5 (HPLC #2) → 24.702

Replicate #6 (HPLC #2) → 25.151

Final value (rounded to the spec) → 24.9

%RSD → 0.70%

In this case, the "true" value is OOS. However, only three of the replicates (Replicate #1 on HPLC #1 and Replicates #4 and #5, which were run on HPLC #2) are OOS, despite the "true" value and the average being OOS. Replicate #2 on HPLC #1 (24.964) meets the specification only once it is rounded to the precision of the specification.

Case #3. An outlier has been substituted into the data set. The following six replicates were generated (except for the first data point):

Replicate #1 (HPLC #1) → 24.800

Replicate #2 (HPLC #1) → 25.286

Replicate #3 (HPLC #1) → 25.377

Replicate #4 (HPLC #2) → 25.218

Replicate #5 (HPLC #2) → 25.477

Replicate #6 (HPLC #2) → 25.306

Final value (rounded to the spec) → 25.2

%RSD → 0.91%

Although not obvious, Replicate #1 on HPLC #1 is an outlier as shown by the Grubbs' test (p < 0.05). An investigation into this data point may be warranted in this instance. The statistical evidence suggests that something is different about this data point. However, the investigation is only initiated if there is a %RSD criterion (or another criterion such as a confidence or tolerance interval) in the system suitability to address these occurrences. If no %RSD criterion existed, the justification to investigate should be based on the fact that the data point was different from the others in the population, not that the data point was OOS. The root cause investigation would need to determine and address the source of the aberrant data and probably implement a %RSD criterion.
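A Grubbs' test on the Case #3 replicates can be sketched as follows. The critical value of 1.887 is taken from standard two-sided tables for six observations at the 0.05 level and should be treated as an assumption to check against the table used in your own procedures; the function name is hypothetical.

```python
from statistics import mean, stdev

case_3 = [24.800, 25.286, 25.377, 25.218, 25.477, 25.306]

# Assumed: tabulated two-sided Grubbs critical value for n = 6, alpha = 0.05.
G_CRIT_N6 = 1.887


def grubbs_statistic(data):
    """Return the most extreme point and its distance from the mean in SD units."""
    centre = mean(data)
    s = stdev(data)
    suspect = max(data, key=lambda x: abs(x - centre))
    return suspect, abs(suspect - centre) / s


suspect, g = grubbs_statistic(case_3)
is_outlier = g > G_CRIT_N6

print(f"suspect value: {suspect}, G = {g:.3f}, flagged as outlier: {is_outlier}")
```

With these six values the statistic exceeds the assumed critical value only narrowly, which fits the remark that the outlier is not obvious from inspection.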


In cases where the "true" result approaches the specification limit, the possibility of generating a replicate result that is OOS increases significantly. Even with precision measurement, individual replicates can be OOS. Conversely, for a "true" OOS result, individual replicates can be within specification. Replication enables the analyst to get closer to the estimation of truth. Therefore, it makes sense to apply the specifications once all of the replicate results have been averaged for a final result.
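To give a rough sense of scale, the sketch below estimates how often a single replicate, at least one of six replicates, and the six-replicate average would fall below 25.0 for the Case #1 lot. It assumes independent, normally distributed replicates with a "true" mean of 25.1 and the 0.68% assay %RSD quoted earlier, and it ignores rounding; these are simplifying assumptions, not the article's own calculation.

```python
from statistics import NormalDist

SPEC_NLT = 25.0      # specification: not less than 25.0
true_mean = 25.1     # "true" concentration from Case #1
assay_rsd = 0.68     # assay %RSD quoted for the method
n_replicates = 6

# Standard deviation of one replicate, in concentration units.
sigma = true_mean * assay_rsd / 100

# Chance that one replicate falls below the specification.
p_single = NormalDist(true_mean, sigma).cdf(SPEC_NLT)

# Chance that at least one of the six replicates falls below it.
p_any = 1 - (1 - p_single) ** n_replicates

# Averaging n replicates shrinks the standard deviation by sqrt(n).
sigma_mean = sigma / n_replicates ** 0.5
p_mean = NormalDist(true_mean, sigma_mean).cdf(SPEC_NLT)

print(f"P(single replicate < 25.0):      {p_single:.2f}")
print(f"P(at least one of six < 25.0):   {p_any:.2f}")
print(f"P(average of six < 25.0):        {p_mean:.2f}")
```

Under these assumptions an individual replicate below the limit is the expected outcome for a lot this close to the specification, while the averaged result is far less likely to fall short.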

Release limits can be used to deter these instances. Release limits are internal limits designed to ensure a lot will meet specifications through expiry by taking into account the analytical variability (and its associated uncertainty) and the stability of the molecule (and its associated uncertainty) by calculating acceptable criteria.

By focusing investigations in the right places—process shifts and other statistically valid signals versus single point OOS results—better corrective and preventative actions can be implemented to improve quality and allow for better use of resources.

Brian K. Nunnally, PhD, is an associate director and Deedra F. Nunnally is a laboratory manager, both at Wyeth, Sanford, NC, nunnalb@wyeth.com 919.566.4772. John S. McConnell is a consultant at Wysowl Pty Ltd, Warner, Australia.


1. Kuwahara SS. A history of the OOS problem. BioPharm Int. 2007; 20(11):42–52.

2. Nunnally BK, McConnell JS. Six Sigma in the pharmaceutical industry. Boca Raton: CRC Press; 2007.

3. Shewhart WA. Statistical method from the viewpoint of quality control. The Graduate School of Agriculture: Washington, DC; 1939.

4. USFDA. Guidance for industry: Investigating out-of-specification (OOS) test results for pharmaceutical production. Rockville, MD; 2006 Oct.