Specification Setting: Setting Acceptance Criteria from Statistics of the Data
 Nov 1, 2006 BioPharm International Volume 19, Issue 11

The Poisson percentages can be calculated in Excel using the formula "=POISSON(E118,F134,FALSE)." The cell reference E118 contains an integer μg/g value and F134 the parameter, μ. Excel truncates any values of X that are not integers.
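For readers working outside Excel, the same Poisson probability can be sketched in Python. The function name `poisson_pmf` is our own choice for illustration, and the truncation step simply mimics the Excel behavior described above.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability of the value x, like =POISSON(x, mu, FALSE).

    Non-integer x is truncated to an integer first, as Excel does.
    """
    if x < 0 or mu < 0:
        raise ValueError("x and mu must be non-negative")
    k = int(x)  # truncate toward zero
    return math.exp(-mu) * mu ** k / math.factorial(k)
```

Multiplying the returned probability by the number of measurements gives the expected count for that μg/g value.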

The P-value of the chi-square test can be calculated in Excel by "=CHIDIST(M15,M16)". Cell M15 contains the sum of chi-squares and M16 contains the degrees of freedom. The degrees of freedom are two less than the number of chi-square values in the sum: one is lost because the expected counts must add up to the observed total, and one because μ is estimated from the data. If the P-value is larger than 0.05, the Poisson distribution is considered an acceptable fit to the data.
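The upper-tail chi-square probability that CHIDIST returns can also be computed without a statistics package. The sketch below uses the standard closed-form series for whole-number degrees of freedom. The function name `chi_square_p_value` is hypothetical, and the code is an illustration rather than a reproduction of Excel's internal method.

```python
import math

def chi_square_p_value(chi_sq_sum, dof):
    """Upper-tail chi-square probability for a positive integer dof."""
    if dof < 1 or dof != int(dof):
        raise ValueError("dof must be a positive integer")
    if chi_sq_sum <= 0:
        return 1.0
    half = chi_sq_sum / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum of (x/2)^i / i! for i < dof/2
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: normal-tail part plus a finite series
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * chi_sq_sum / math.pi) * math.exp(-half)
    total = 0.0
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= chi_sq_sum / (2 * j + 1)
    return tail + total
```

A sum of 8 chi-squares would be passed with `dof=6`, following the rule that the degrees of freedom are two less than the number of chi-square values.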

If the distribution is not a good fit, estimating the value of μ that minimizes the sum of the chi-squares can produce a better-fitting distribution. This can be done iteratively, by increasing and decreasing μ in progressively smaller steps until we find the value that minimizes the sum of chi-squares. Alternatively, the Solver function of Excel can find the minimum value of the cell containing the sum of chi-squares by changing the value of the cell containing μ.
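The iterative search described above might look like the following in Python. This is a minimal sketch under our own assumptions: the function names, the starting step of 0.1, and the stopping tolerance are illustrative choices, and the search assumes the sum of chi-squares has a single minimum near the starting value. It is not a description of how Solver works.

```python
import math

def poisson_chi_square_sum(mu, observed):
    """Sum of (observed - expected)^2 / expected for values 0, 1, 2, ..."""
    if mu <= 0:
        return math.inf  # keeps the search away from invalid values of mu
    n = sum(observed)
    total = 0.0
    for k, obs in enumerate(observed):
        expected = n * math.exp(-mu) * mu ** k / math.factorial(k)
        total += (obs - expected) ** 2 / expected
    return total

def minimize_by_step_halving(f, start, step=0.1, tol=1e-6):
    """Move up or down by `step` while f improves; otherwise halve the step."""
    x, fx = start, f(start)
    while step > tol:
        moved = False
        for candidate in (x + step, x - step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                moved = True
                break
        if not moved:
            step /= 2.0
    return x, fx
```

With the observed counts in a list `counts`, the call would be `minimize_by_step_halving(lambda m: poisson_chi_square_sum(m, counts), start=sample_mean)`, where `sample_mean` is the mean of the measurements.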

Table 3 shows the counts, probabilities, and chi-squares for 443 residual phenanthrene measurements.

The mean, calculated as (292*0 + 116*1 + 25*2 + 8*3 + 2*4) / 443, is 0.447. This gave a sum of chi-squares of 9.95. The P-value for a sum of 8 chi-squares (6 degrees of freedom) was 0.127, indicating an acceptable fit. Although this fit is acceptable, Solver was used to find the minimum chi-square estimate of μ. The resulting value of μ was 0.470, the sum of 8 chi-squares was 9.25, and the P-value was 0.160.
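These figures can be checked with a short script. The counts for 0 to 4 μg/g come from the mean calculation above. The zero counts for 5 to 7 μg/g are our assumption, made so that the sum contains 8 chi-squares. The P-value formula used here is the closed form that applies only to 6 degrees of freedom.

```python
import math

# Observed counts at 0, 1, ..., 7 µg/g. The zeros for 5 to 7 µg/g are
# assumed so that the sum contains 8 chi-squares.
counts = [292, 116, 25, 8, 2, 0, 0, 0]
n = sum(counts)
mean = sum(k * c for k, c in enumerate(counts)) / n

def chi_square_sum(mu):
    """Sum of chi-squares between observed and Poisson expected counts."""
    total = 0.0
    for k, obs in enumerate(counts):
        expected = n * math.exp(-mu) * mu ** k / math.factorial(k)
        total += (obs - expected) ** 2 / expected
    return total

def p_value_6_dof(x):
    """Upper-tail chi-square probability for exactly 6 degrees of freedom."""
    h = x / 2.0
    return math.exp(-h) * (1.0 + h + h * h / 2.0)

# Compare the fit at the sample mean with the fit at the Solver estimate.
for label, mu in (("sample mean", mean), ("Solver estimate", 0.470)):
    s = chi_square_sum(mu)
    print(f"{label}: mu={mu:.3f}, sum={s:.2f}, P={p_value_6_dof(s):.3f}")
```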

Since the chi-square calculation divides the squared difference by the expected frequency, there may be one or two very large chi-squares in the tail where the frequencies are small. This could lead to a rejection of the goodness of fit even though the distribution fits well for most of the measurements. If the fit looks acceptable for the majority, we can combine the last few observed and expected counts into a single group and calculate one chi-square for that group. Since we lose degrees of freedom when we group, the smaller sum of chi-squares that results could still have a P-value that is less than 0.05. For the phenanthrene measurements, we can see large chi-squares for 3 and 4 μg/g. Grouping the observed and expected counts for 2 to 7 μg/g produced a chi-square of 0.12. The sum of the resulting three chi-squares was 1.28 and the P-value was 0.26.
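The grouping step can be sketched as follows. We assume here that the expected counts are based on the sample mean of 0.447 rather than the Solver estimate, and that the counts for 5 to 7 μg/g are zero. With three chi-squares there is one degree of freedom, for which the upper-tail probability reduces to a complementary error function.

```python
import math

# Observed counts at 0..7 µg/g; zeros for 5 to 7 µg/g are assumed.
counts = [292, 116, 25, 8, 2, 0, 0, 0]
n = sum(counts)
mu = sum(k * c for k, c in enumerate(counts)) / n  # sample mean (assumed basis)

expected = [n * math.exp(-mu) * mu ** k / math.factorial(k)
            for k in range(len(counts))]

# Keep 0 and 1 µg/g as they are; pool everything from 2 to 7 µg/g.
grouped_observed = counts[:2] + [sum(counts[2:])]
grouped_expected = expected[:2] + [sum(expected[2:])]

chi_squares = [(o - e) ** 2 / e
               for o, e in zip(grouped_observed, grouped_expected)]
chi_sum = sum(chi_squares)

dof = len(chi_squares) - 2  # three groups leave 1 degree of freedom
p_value = math.erfc(math.sqrt(chi_sum / 2.0))  # valid for 1 dof only

print(f"grouped chi-square={chi_squares[-1]:.2f}, "
      f"sum={chi_sum:.2f}, dof={dof}, P={p_value:.2f}")
```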

Once a distribution has been fitted, the estimated upper specification limit is the value associated with a cumulative percentage of 99.9%. This is roughly equivalent to a μ + 3σ upper limit for a Normal distribution, below which about 99.87% of values fall.

Based on these 443 residual phenanthrene measurements, the upper specification would be set at 4 μg/g.
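For a fitted Poisson distribution, the 99.9% rule amounts to finding the smallest integer whose cumulative probability reaches 0.999. The sketch below shows one way to do this. The function name `poisson_upper_limit` and its default coverage argument are our own illustrative choices.

```python
import math

def poisson_upper_limit(mu, coverage=0.999):
    """Smallest integer x with cumulative Poisson probability >= coverage."""
    if mu <= 0 or not 0.0 < coverage < 1.0:
        raise ValueError("mu must be positive and coverage between 0 and 1")
    x = 0
    term = math.exp(-mu)  # probability of the value 0
    cumulative = term
    while cumulative < coverage:
        x += 1
        term *= mu / x    # probability of x from the probability of x - 1
        cumulative += term
    return x
```

Both estimates of μ reported for the phenanthrene data, 0.447 and 0.470, lead to the same limit with this function.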

When the histogram of measurements shows a long tail to the right with a few values as high as 10 or 12 μg/g, an exponential distribution may be an appropriate alternative to a Poisson.

The exponential distribution is given by:

In this equation, x denotes the observed measurement values, which must be greater than 0. The parameter, λ, is estimated by the mean of the measurements.

The exponential distribution is typically used to estimate the expected units of time that occur between consecutive occurrences of an event, for example, the number of days between breakdowns of a machine. In our application of the distribution, we replace units of time with units of concentration in μg/g. An exponential distribution is fitted to the data in a similar way to the Poisson. In Excel, the function EXPONDIST(D87, L96, FALSE) is used to calculate the percentage in the interval centered on the value in cell D87 when the value of λ is in cell L96. Since the exponential is fitted to the mid-points of intervals (for example, 1 is the center of 0.51 to 1.50), the value "<1" is replaced by 0.3.
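A Python sketch of the same calculation is given below, with one caution about parameterization. As documented by Microsoft, the second argument of EXPONDIST is a rate, so the function returns rate * exp(-rate * x), and the mean of that distribution is 1 / rate. If λ in the text denotes the mean of the measurements, the value passed as the rate would therefore be its reciprocal. The function names are hypothetical. The second function gives the exact probability for an interval, which the density at the mid-point only approximates when the interval is one unit wide.

```python
import math

def expon_density(x, rate):
    """Exponential density, the quantity =EXPONDIST(x, rate, FALSE) returns."""
    if x < 0 or rate <= 0:
        raise ValueError("x must be non-negative and rate positive")
    return rate * math.exp(-rate * x)

def expon_interval_probability(lower, upper, rate):
    """Exact probability that a value falls between lower and upper."""
    if lower < 0 or upper < lower or rate <= 0:
        raise ValueError("need 0 <= lower <= upper and rate positive")
    return math.exp(-rate * lower) - math.exp(-rate * upper)
```

For an interval such as 0.5 to 1.5, `expon_density(1, rate)` and `expon_interval_probability(0.5, 1.5, rate)` can be compared directly to judge how well the mid-point approximation holds for a given rate.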