
Process Validation: Using Tolerance Intervals for Setting Process Validation Acceptance Criteria
ABSTRACT One goal of process characterization is establishing representative performance parameter ranges that can be used to set validation acceptance criteria (VAC). Characterization studies yield varying numbers of data points from multiple experiments, and may also include data generated at different scales (e.g., bench, pilot, and commercial), which add complexity to the analysis. Many statistical approaches can be used to set ranges from large data sets. As an example, we present the statistical considerations and techniques for setting validation acceptance ranges for a chromatography step used in purifying a recombinant protein. Performance parameter data from a combined data set consisting of 67 bench, six pilot, and three full-scale runs were analyzed using the statistical analysis software JMP (SAS Institute). The combined data set was used to compute tolerance intervals, so that sources such as scale and column feed material could be properly modeled. The resulting ranges were used to establish validation acceptance criteria.
Establishing appropriate validation acceptance criteria (VAC) is one of the greatest challenges in the development of a commercial biopharmaceutical manufacturing process. Setting VAC that are too broad will not enable demonstration of adequate process control. VAC that are too narrow can result in failed validation runs, even though the process may be performing adequately. If there are no representative bench-scale data from process characterization studies, the data set used for a statistical analysis to establish acceptance criteria may be quite small. Yet, if both process characterization data from bench-scale studies and data from large-scale runs are available, it may not be obvious how to combine these data sets in an appropriate way. In this article, we describe statistical methods in which bench-scale process characterization data are combined with a smaller, large-scale data set to establish validation acceptance criteria that are indicative of process consistency, yet are not unduly restrictive.
PROCESS CHARACTERIZATION
TOLERANCE INTERVALS AS PROCESS VALIDATION CRITERIA A two-sided tolerance interval is an interval that contains at least 100p% of a population with 100(1 – α)% confidence. For example, if p = 0.99 and α = 0.05, then a two-sided tolerance interval will contain 99% of the population with 95% confidence. This means that the reported range is expected to include 99% of the performance parameter (PP) values that will be generated by the process under consideration. Tolerance intervals are particularly useful for setting VAC because they describe the expected long-range behavior of the process. Tolerance intervals can be computed and used to set VAC under any of the following scenarios:
a. when only a limited data set, such as data from large-scale runs alone, is available;
b. when bench-scale characterization data and large-scale data are combined and the interval is computed at setpoint conditions; or
c. as the operating parameters (OPs) vary across their operating ranges (ORs).
Examples of tolerance intervals computed for each of these scenarios appear in the three scenarios that follow.

Scenario 1: The tolerance intervals described in this section can be used when only a limited data set, such as data from large-scale runs alone, is available for setting VAC. Wald and Wolfowitz^{4} introduced the notion of two-sided tolerance intervals in the case of a random sample selected from a single population. They provided approximate formulas that were later modified by Howe.^{5} This interval contains 100p% of the population with 100(1 – α)% confidence and is defined as

$$\bar{Y} \pm k_2 S \qquad (1)$$

where

$$k_2 = \sqrt{\frac{\left(1 + \frac{1}{c}\right) Z_{(p+1)/2}^{2}\, r}{\chi^{2}_{r,\alpha}}}$$

in which S is the sample standard deviation, Ȳ is the sample mean, r is the error degrees of freedom, c is the number of observations used to compute the center Ȳ, Z_{(p+1)/2} is the standard normal percentile with area (p + 1)/2 to the left, and χ^{2}_{r,α} is the chi-squared percentile with r degrees of freedom and area α to the left. If Equation (1) is used to compute a tolerance interval for a simple random sample of n observations, then r = n – 1 and c = n. Equation (1) has previously been recommended for setting VAC in this scenario.^{6} Tabulated values for tolerance intervals are also available.^{7}

Scenario 2: In this scenario, data from both bench-scale process characterization and large-scale runs are available. By combining process characterization data with large-scale data, the sample sizes on which tolerance intervals are based can be increased. Additionally, the modeled regression relationships between PPs and OPs provide valuable information that yields more realistic VAC limits.
In this example, as the coded value of OP shifts from –1 to +1 (where zero is the setpoint condition), the range that contains 99% of the population PP values shifts up due to the positive linear relationship between PP and OP. Note that although the centers of the intervals that include the middle 99% of the PP values differ as the OP changes, the lengths of the intervals are constant. This is because the regression model assumes the spread (standard deviation) of the PP values is constant across the examined range of the OP. (One must verify this assumption during data analysis.)
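The constant interval length can be illustrated with a short worked equation. This is a sketch only: it assumes a single coded OP value x and a straight-line model with intercept β_0, slope β_1 > 0, and normally distributed error with constant standard deviation σ. These symbols are introduced here for illustration and are not taken from the fitted model.

$$PP = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^{2})$$

$$\text{middle } 100p\% \text{ of PP values at } x: \quad (\beta_0 + \beta_1 x) \pm Z_{(p+1)/2}\,\sigma$$

Under these assumptions the center β_0 + β_1 x moves up as x goes from –1 to +1, while the length 2 Z_{(p+1)/2} σ does not depend on x.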
Alternative centering rules may also be considered when a different lot of a key raw material was used for each of the large-scale runs, but a single lot (different from any of the large-scale lots) was used for all of the bench-scale runs. Here it might be best to center the interval on a linear combination of the large-scale and bench-scale means.

Scenario 3: In this scenario, tolerance intervals are calculated accounting for OPs that vary across the operating range (OR). Typically, OPs will vary around the setpoint value due to instrument and equipment tolerances and other factors. Thus, a tolerance interval that describes the behavior of the PP must adequately account for this variation in the OP. The formula in Equation (1) will not adequately account for the propagation of error that results from movement in the OPs. To compute the tolerance interval in this situation, a simulation-based approach is necessary. Briefly, one simulates a set of values for the OPs consistent with the expected movement of the OPs within the OR. A regression model based on characterization data is then used to predict the value of the PP for the simulated OP values. This process is repeated many times to construct an empirical distribution of the PP values. From this simulated distribution, one selects the range that covers the desired proportion of the population. A more detailed algorithm for this process is presented in the example at the end of the paper.

OTHER CONSIDERATIONS IN COMPUTING TOLERANCE INTERVALS One issue of interest in any computation of a tolerance interval is the proportion of the population contained in the interval and the level of confidence that the reported interval is correct. We have found that two-sided intervals containing 99% (p = 0.99) of the population with an individual confidence level of 95% (α = 0.05) provide reasonable VAC limits.
The decision to include 99% of the population is based on the desire to have limits similar conceptually to those used in process control, but not so wide as to be uninformative. In process control, limits are established to include approximately 99.7% of the data. However, tolerance intervals that cover the middle 99.7% are extremely wide for data sets of the size typically available from process characterization. The 99% coverage used in the tolerance interval represents a good compromise that provides meaningful intervals. If there are many critical and key PPs, one may choose to adjust the individual confidence levels in order to obtain a desired overall confidence level on the entire set of PPs. A simple method for handling this "multiplicity" problem is to use the Bonferroni inequality.^{8} For example, assume it is required to have VAC for 10 key and critical PPs. In order to achieve an overall confidence of at least 95% on the set of 10 PPs, individual tolerance intervals must be calculated with a confidence coefficient of: 100(1 – (0.05/10)) = 99.5%.
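The Bonferroni adjustment above is simple enough to check in a few lines. The following is a minimal sketch; the function name `individual_confidence` is made up for this illustration and does not come from JMP or the article.

```python
def individual_confidence(overall_alpha: float, n_parameters: int) -> float:
    """Per-interval confidence level (as a fraction) under the Bonferroni inequality.

    Splitting the overall alpha equally across the intervals guarantees an
    overall confidence of at least 1 - overall_alpha for the whole set.
    """
    if n_parameters < 1:
        raise ValueError("n_parameters must be at least 1")
    return 1.0 - overall_alpha / n_parameters


# Ten key and critical PPs with a desired overall confidence of at least 95%.
level = individual_confidence(overall_alpha=0.05, n_parameters=10)
print(f"{100 * level:.1f}%")  # prints 99.5%
```

Because the adjustment only divides α by the number of intervals, it is conservative: the actual overall confidence is at least, and usually somewhat above, the stated 95%.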
AN EXAMPLE APPLICATION: CO-ANALYZING BENCH AND SCALE DATA To demonstrate the approach shown in Figure 3, we present an example for a chromatography step used in purifying a recombinant protein.
Values for the OPs are unknown for the large-scale GMP and non-GMP runs, but known for the bench studies. Suppose we want to construct a tolerance interval to establish a VAC for a purity PP using only the three large-scale GMP runs. Equation (1) can be used to compute a tolerance interval that contains 99% of the population with 95% confidence using the three GMP values in the data set. From the data, the sample mean of the three observations is Ȳ = 89.5, the standard deviation is S = 1.68, c = 3, r = 2, α = 0.05, and p = 0.99. This provides Z_{(p+1)/2} = Z_{(0.99+1)/2} = Z_{0.995} = 2.576 and χ^{2}_{r,α} = χ^{2}_{2,0.05} = 0.1026, so that the computed value of k_{2} is

$$k_2 = \sqrt{\frac{\left(1 + \frac{1}{3}\right)(2.576)^{2}(2)}{0.1026}} = 13.1$$

The resulting tolerance interval is the following: 89.5 – 13.1(1.68) = 67.4 (lower limit) to 89.5 + 13.1(1.68) = 112 (upper limit). This is a relatively wide interval because of the large value for k_{2} due to the small sample size. Thus, bench-scale process characterization data are useful for shortening this interval to a more informative width.
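The calculation above can be reproduced with a short script. This is a sketch under stated assumptions, not the JMP procedure used for the article: it uses only the Python standard library, and the helper names `chi2_cdf`, `chi2_quantile`, and `howe_k2` are invented here. The chi-squared percentile is found by bisection on a series expansion of the distribution function, which is adequate for illustration but is not a substitute for a validated statistics package.

```python
import math
from statistics import NormalDist


def chi2_cdf(x: float, df: int) -> float:
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))


def chi2_quantile(prob: float, df: int) -> float:
    """Value with area `prob` to the left, located by bisection on chi2_cdf."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def howe_k2(p: float, alpha: float, c: int, r: int) -> float:
    """Two-sided tolerance factor k2 from Equation (1)."""
    z = NormalDist().inv_cdf((p + 1.0) / 2.0)
    return math.sqrt((1.0 + 1.0 / c) * z * z * r / chi2_quantile(alpha, r))


# Three large-scale GMP runs: mean 89.5, S = 1.68, c = 3, r = 2.
k2 = howe_k2(p=0.99, alpha=0.05, c=3, r=2)
lower, upper = 89.5 - k2 * 1.68, 89.5 + k2 * 1.68
print(f"k2 = {k2:.1f}, interval = {lower:.1f} to {upper:.1f}")
```

With these inputs the script gives k_{2} of about 13.1 and limits that agree with the 67.4 to 112 interval after rounding. The same function accepts other values of c and r, so it can be reused when the degrees of freedom change.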
The root mean squared error (RMSE) in Table 3 is 1.64, which is relatively close to the standard deviation of the GMP runs (1.68). This suggests that combining the GMP data with the rest of the data set is reasonable and will provide a better estimate of the standard deviation. By combining the data, the value of r increases from 2 to 72 and the tolerance interval is shortened accordingly. Because it is desired to center at the GMP average in this example, the center of the interval is 89.5. This center estimate involves only three GMP lots, so c = 3. With χ^{2}_{72,0.05} ≈ 53.46, the value of k_{2} using Equation (1) with c = 3, r = 72, α = 0.05, and p = 0.99 is

$$k_2 = \sqrt{\frac{\left(1 + \frac{1}{3}\right)(2.576)^{2}(72)}{53.46}} = 3.45$$
The computed tolerance interval with the center of 89.5 and RMSE = 1.64 is from 83.8 (lower limit) to 95.2 (upper limit). Note that this interval is much tighter than the previously computed interval from 67.4 to 112. This is largely because k_{2} has decreased from 13.1 to 3.45. By making use of all the available data, a more meaningful interval has been obtained.

As noted previously, it is often expected that OPs will vary around the setpoint value. Using the simulator tool in JMP 6.0, one can model this behavior and use it to construct a tolerance interval. To demonstrate this process, assume that in our example we are confident that OP1 will be fixed at setpoint, but that OP2 and OP3 will randomly drift around their setpoints, within their respective ORs, in accordance with some specified probability distribution. The following algorithm can be used to simulate a tolerance interval based on these assumptions and the assumed regression model:
1. Simulate values of OP2 and OP3 from appropriate probability distributions.
2. Compute the predicted value of the PP using the fitted regression model for the simulated values of OP2 and OP3 and the fixed value of OP1.
3. Add a suitably chosen error term to account for uncertainty in the model fit.
4. Perform steps 1–3 a large number of times, say 100,000 times.
The resulting set of 100,000 observations is an empirically derived set of PP values. Take as the tolerance interval the range that includes the middle 99% of these values. (This is the range bounded by the 0.5 and 99.5 percentiles.)
Note the distribution of the PP in Figure 5 is centered at 88.6 instead of at the desired large-scale GMP mean of 89.5. Recalling that the spread of a tolerance interval is not affected by shifts in location, the interval is adjusted to the desired GMP center by taking as the lower bound 82.7 – (88.6 – 89.5) = 83.6 and as the upper bound 94.4 – (88.6 – 89.5) = 95.3.
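The simulation and recentering steps can be sketched outside JMP as follows. Everything marked hypothetical is an assumption made for illustration: the regression coefficients are invented (the fitted model is not reproduced in this text), OP2 and OP3 are assumed to be uniformly distributed over a coded range of –1 to +1, and the error term is assumed to be normal with standard deviation equal to the RMSE of 1.64. The output will therefore not match Figure 5.

```python
import random
from statistics import fmean

# Hypothetical coded-unit regression coefficients (not from the article).
INTERCEPT = 88.6
B_OP1, B_OP2, B_OP3 = 0.5, 0.8, -0.6
RMSE = 1.64        # root mean squared error quoted in the text
GMP_MEAN = 89.5    # large-scale GMP mean quoted in the text


def simulate_pp(n_draws: int = 100_000, op1: float = 0.0, seed: int = 1) -> list:
    """Steps 1-4: draw OP2 and OP3, predict the PP, add an error term, repeat."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_draws):
        op2 = rng.uniform(-1.0, 1.0)  # assumed distribution within the coded OR
        op3 = rng.uniform(-1.0, 1.0)  # assumed distribution within the coded OR
        predicted = INTERCEPT + B_OP1 * op1 + B_OP2 * op2 + B_OP3 * op3
        values.append(predicted + rng.gauss(0.0, RMSE))  # assumed normal error
    return values


def middle_range(values: list, coverage: float = 0.99) -> tuple:
    """Range bounded by the lower and upper tail percentiles of the simulated values."""
    ordered = sorted(values)
    tail = (1.0 - coverage) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int(round((1.0 - tail) * last))]


def recenter(lower: float, upper: float, simulated_center: float, target_center: float) -> tuple:
    """Shift the interval so that it is centered on the target mean."""
    shift = simulated_center - target_center
    return lower - shift, upper - shift


pp_values = simulate_pp()
sim_lower, sim_upper = middle_range(pp_values)
adj_lower, adj_upper = recenter(sim_lower, sim_upper, fmean(pp_values), GMP_MEAN)
print(f"simulated: {sim_lower:.1f} to {sim_upper:.1f}")
print(f"recentered on GMP mean: {adj_lower:.1f} to {adj_upper:.1f}")
```

Calling `recenter(82.7, 94.4, 88.6, 89.5)` reproduces the adjustment in the text, giving 83.6 and 95.3. A fixed seed is used only so that repeated runs of the sketch give the same numbers.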
CONCLUSION The procedure described in this paper is general enough to apply to more complex situations. In particular, it is often the case that random events such as differences in column feed material will increase the variability in a PP. The regression model can be modified to appropriately incorporate random effects, and the JMP simulator used to produce a tolerance interval under these conditions. Quadratic effects and interaction effects among the OPs are also easily incorporated into the regression model. In conclusion, we have presented approaches that yield appropriate VAC. The most appropriate technique for establishing these ranges depends on the available data. For many processes, movement by an OP within the OR is expected. Combining bench- and large-scale data sets, analyzed using the simulation approach presented in this paper, results in VAC that are indicative of process control, yet are not unnecessarily restrictive.
Rick Burdick is a principal quality engineer in the Quality Engineering and Improvement department at Amgen; Tom Gleason is a senior associate scientist in the Manufacturing Science and Technology department at Amgen, 303.041.1432, tgleason@amgen.com

REFERENCES
1. Seely JE, Seely RJ. A rational, stepwise approach to process characterization. BioPharm Int. 2003;Aug(16):24–34.
2. Kieffer R, Bureau S, Borgmann A. Applications of failure modes and effects analysis to the pharmaceutical industry. Pharm Tech Eur. 1997;Sept(9):36–49.
3. Stamatis DH. Failure modes and effects analysis: FMEA from theory to execution. 2nd ed. Milwaukee (WI): ASQ Quality Press; 2003.
4. Wald A, Wolfowitz J. Tolerance limits for a normal distribution. Ann Math Stat. 1946;17:208–15.
5. Howe WG. Two-sided tolerance limits for normal populations, some improvements. J Am Stat Assoc. 1969;64:610–20.
6. Orchard T. Setting acceptance criteria from statistics of the data. BioPharm Int. 2006;Nov(19):22–9.
7. Hahn GJ, Meeker WQ. Statistical intervals: a guide for practitioners. New York (NY): Wiley; 1991.
8. Neter J, Kutner MH, Nachtsheim CJ, Wasserman W. Applied linear statistical models. 4th ed. Scarborough, Ontario (Can): Irwin; 1996.

