The Poisson percentages can be calculated in Excel using the formula "=POISSON(E118,F134,FALSE)". Cell E118
contains an integer μg/g value and cell F134 contains the parameter, μ. Excel truncates any values of X that are not integers.
The P-value of the chi-square test can be calculated in Excel using "=CHIDIST(M15,M16)". Cell M15 contains the sum of chi-squares
and M16 contains the degrees of freedom. The degrees of freedom are two less than the number of chi-square values in the sum.
If the P-value is larger than 0.05, the Poisson distribution is not rejected and is taken to be an acceptable fit to the data.
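For readers working outside Excel, the two calculations above can be sketched in Python using only the standard library. This is a minimal sketch, not the authors' spreadsheet: the function names `poisson_pmf`, `chi2_sf`, and `degrees_of_freedom` are hypothetical, and the chi-square tail probability uses the standard closed-form series for integer degrees of freedom.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution, like =POISSON(x, mu, FALSE).
    Non-integer x is truncated, as Excel does."""
    k = int(x)
    return math.exp(-mu) * mu ** k / math.factorial(k)

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df, like =CHIDIST(x, df)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^r / r! for r = 0 .. df/2 - 1
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def degrees_of_freedom(n_chi_squares):
    """Two less than the number of chi-square values in the sum."""
    return n_chi_squares - 2

# Illustrative values only (not taken from the data in this paper)
print(poisson_pmf(1, 0.5))
print(chi2_sf(3.841, 1))
```

The first call corresponds to the Excel cell formula with X = 1 and μ = 0.5; the second evaluates the tail probability at the familiar 5% critical value for one degree of freedom.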
If the distribution is not a good fit, a better-fitting distribution can be obtained by estimating the value of μ that
minimizes the sum of the chi-squares. This can be done iteratively, by increasing and decreasing μ in progressively smaller
steps until the sum of chi-squares reaches its minimum. Alternatively, the Solver function of Excel can be used to find the
minimum value of the cell containing the sum of chi-squares by changing the value of the cell containing μ.
Table 3 shows the counts, probabilities, and chi-squares for 443 residual phenanthrene measurements.
The mean, calculated as (292*0 + 116*1 + 25*2 + 8*3 + 2*4) / 443, is 0.447. With μ set to this mean, the sum of chi-squares was
9.95. The P-value for a sum of 8 chi-squares (6 degrees of freedom) was 0.127, indicating an acceptable fit. Although this fit
is acceptable, Solver was used to find the minimum chi-square estimate of μ. This gave μ = 0.470, a sum of 8 chi-squares of
9.25, and a P-value of 0.160.
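The figures quoted above can be checked with a short Python sketch that also illustrates the step-halving search described earlier. Two assumptions are made here: the counts for 5 to 7 μg/g are taken to be zero (the five quoted counts already sum to 443), and the simple search in `minimize_mu` is a stand-in for Solver, not a description of Solver's own algorithm. All function names are hypothetical.

```python
import math

# Counts for 0..7 ug/g; zeros for 5 to 7 are an assumption (see lead-in)
OBSERVED = [292, 116, 25, 8, 2, 0, 0, 0]

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def chi_square_sum(observed, mu):
    """Sum over cells of (observed - expected)^2 / expected."""
    n = sum(observed)
    total = 0.0
    for k, obs in enumerate(observed):
        expected = n * poisson_pmf(k, mu)
        total += (obs - expected) ** 2 / expected
    return total

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2
    term = total = 1.0
    for r in range(1, df // 2):
        term *= half / r
        total += term
    return math.exp(-half) * total

def minimize_mu(observed, start, step=0.1, tol=1e-7):
    """Step mu up or down while the chi-square sum improves; otherwise halve the step."""
    mu, best = start, chi_square_sum(observed, start)
    while step > tol:
        for candidate in (mu + step, mu - step):
            if candidate > 0 and chi_square_sum(observed, candidate) < best:
                mu, best = candidate, chi_square_sum(observed, candidate)
                break
        else:
            step /= 2  # neither direction improved the fit
    return mu, best

n = sum(OBSERVED)
mean = sum(k * obs for k, obs in enumerate(OBSERVED)) / n
df = len(OBSERVED) - 2  # 8 chi-squares, 6 degrees of freedom
chi_mean = chi_square_sum(OBSERVED, mean)
mu_hat, chi_min = minimize_mu(OBSERVED, mean)
p_mean = chi2_sf_even(chi_mean, df)
p_min = chi2_sf_even(chi_min, df)

print(f"mean = {mean:.3f}, sum of chi-squares = {chi_mean:.2f}, P = {p_mean:.3f}")
print(f"mu   = {mu_hat:.3f}, sum of chi-squares = {chi_min:.2f}, P = {p_min:.3f}")
```

Starting the search at the sample mean keeps it close to the minimum, so only a few step reductions are needed. Small differences in the last digit relative to the quoted values may arise from rounding μ in the spreadsheet.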
Since the chi-square calculation divides the squared difference by the expected frequency, there may be one or two very large
chi-squares in the tail where the expected frequencies are small. This could lead to a rejection of the goodness of fit even
though the distribution fits well for most of the measurements. If the fit looks acceptable for the majority, we can pool the
last few observed and expected counts into a single group and calculate one chi-square for that group. Since we lose degrees
of freedom when we group to achieve a better fit, the smaller sum of chi-squares that results could still have a P-value that
is less than 0.05. For the phenanthrene measurements we can see large chi-squares for 3 and 4 μg/g. Grouping the observed and
expected counts for 2 to 7 μg/g produced a chi-square of 0.12. The sum of the resulting three chi-squares was 1.28 and the
P-value, with one degree of freedom, was 0.26.
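The grouping step can be sketched in Python as follows. This is an illustration under stated assumptions: μ is taken to be the sample mean (198/443), which reproduces the quoted figures but is not stated explicitly for this step, and the counts for 5 to 7 μg/g are assumed to be zero. The helper `group_tail` is a hypothetical name.

```python
import math

OBSERVED = [292, 116, 25, 8, 2, 0, 0, 0]  # zeros for 5 to 7 ug/g are assumed
MU = 198 / 443                            # sample mean (assumed for this step)

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def group_tail(counts, first):
    """Keep the cells below index `first` and pool the rest into one cell."""
    return counts[:first] + [sum(counts[first:])]

n = sum(OBSERVED)
expected = [n * poisson_pmf(k, MU) for k in range(len(OBSERVED))]

# Pool the cells for 2 to 7 ug/g, leaving three cells: 0, 1, and 2-7
obs_grouped = group_tail(OBSERVED, 2)
exp_grouped = group_tail(expected, 2)

chi_squares = [(o - e) ** 2 / e for o, e in zip(obs_grouped, exp_grouped)]
total = sum(chi_squares)
df = len(chi_squares) - 2                  # 3 cells - 2 = 1 degree of freedom
p_value = math.erfc(math.sqrt(total / 2))  # chi-square upper tail, valid for df = 1

print(f"grouped chi-square = {chi_squares[-1]:.2f}")
print(f"sum = {total:.2f}, df = {df}, P = {p_value:.2f}")
```

Pooling both the observed and the expected counts before computing the chi-square is what removes the large tail contributions; computing the tail chi-squares first and then adding them would not.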
Once a distribution has been fitted, the upper specification limit is estimated as the value associated with a cumulative
percentage of 99.9%. This is roughly equivalent to a μ + 3σ upper limit for a Normal distribution.
Based on these 443 residual phenanthrene measurements, the upper specification limit would be set at 4 μg/g.
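A minimal Python sketch of this step is shown below. It finds the smallest integer value whose cumulative Poisson probability reaches 99.9%, and also evaluates the Normal coverage at three standard deviations above the mean for comparison. The function name `poisson_upper_limit` is hypothetical; the two values of μ are the ones quoted for the phenanthrene data.

```python
import math

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def poisson_upper_limit(mu, coverage=0.999):
    """Smallest integer x with P(X <= x) >= coverage."""
    x = 0
    cumulative = poisson_pmf(0, mu)
    while cumulative < coverage:
        x += 1
        cumulative += poisson_pmf(x, mu)
    return x

# Cumulative probability below mean + 3 standard deviations for a Normal distribution
normal_coverage = 0.5 * (1 + math.erf(3 / math.sqrt(2)))

print(poisson_upper_limit(0.447))  # mu estimated by the mean
print(poisson_upper_limit(0.470))  # minimum chi-square estimate of mu
print(round(normal_coverage, 5))
```

Both estimates of μ lead to the same limit here, so the choice between the mean and the minimum chi-square estimate does not affect the specification for these data.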
When the histogram of measurements shows a long tail to the right with a few values as high as 10 or 12 μg/g, an exponential
distribution may be an appropriate alternative to a Poisson.
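The exponential distribution is given by:
$$f(x) = \lambda e^{-\lambda x}$$
In this equation, x denotes the observed measurement values, which must be greater than 0. With this form, which is the one
used by Excel's EXPONDIST function, the mean of the distribution is 1/λ, so the parameter, λ, is estimated by the reciprocal
of the mean of the measurements.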
The exponential distribution is typically used to model the time that elapses between consecutive occurrences
of an event, for example the number of days between breakdowns of a machine. In our application of the distribution, we replace
units of time with units of concentration in μg/g. An exponential distribution is fitted to the data in a similar way to the
Poisson. In Excel, the function EXPONDIST(D87, L96, FALSE) returns the density at the value in cell D87, which is used as the
percentage in the interval centered on that value, when the value of λ is in cell L96. Since the exponential is fitted to the
mid-points of intervals (for example, 1 is the center of the interval 0.51 to 1.50), the value <1 is replaced by 0.3.