The Poisson percentages can be calculated in Excel using the formula "=POISSON(E118,F134,FALSE)". Cell E118
contains an integer μg/g value, X, and cell F134 contains the parameter, μ. Excel truncates any value of X that is not an integer.
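For readers working outside Excel, the same probability can be sketched in Python. The function name poisson_pmf is illustrative and not part of the original workbook; it is meant to mirror POISSON(x, μ, FALSE), including the truncation of a non-integer x. The value 0.447 is the mean from the phenanthrene example later in this section.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x events for a Poisson distribution with mean mu."""
    k = int(x)  # mirror Excel, which truncates a non-integer x
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Percentage of results expected at 0 μg/g when mu = 0.447
p0 = poisson_pmf(0, 0.447)
print(f"{100 * p0:.1f}%")
```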
The P-value of the chi-square test can be calculated in Excel by "=CHIDIST(M15,M16)". Cell M15 contains the sum of chi-squares
and M16 contains the degrees of freedom. The degrees of freedom are two less than the number of chi-square values in the sum:
one is lost because the expected counts are scaled to the observed total and one because μ is estimated from the data.
If the P-value is larger than 0.05, the Poisson fit is not rejected at the 5% level and is treated as acceptable.
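As a cross-check on the Excel formula, the upper-tail chi-square probability can be sketched with the Python standard library alone. The function name chi_square_p_value is an illustrative choice. It relies on the closed forms for 1 and 2 degrees of freedom and a standard recurrence for the incomplete gamma function, so it assumes a whole number of degrees of freedom. The inputs in the usage lines are the sum of chi-squares and the count of 8 chi-squares reported for the phenanthrene data later in this section.

```python
import math

def chi_square_p_value(chi_sq_sum, dof):
    """Upper-tail probability of a chi-square distribution (what CHIDIST returns)."""
    if chi_sq_sum < 0 or dof < 1 or dof != int(dof):
        raise ValueError("chi_sq_sum must be >= 0 and dof a positive integer")
    y = chi_sq_sum / 2.0
    # Closed-form starting points for 1 and 2 degrees of freedom.
    if dof % 2 == 1:
        p, a = math.erfc(math.sqrt(y)), 0.5
    else:
        p, a = math.exp(-y), 1.0
    # Step up two degrees of freedom at a time using
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while 2 * a < dof:
        p += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return p

n_chi_squares = 8
dof = n_chi_squares - 2  # two less than the number of chi-squares
p = chi_square_p_value(9.95, dof)
print(f"P-value = {p:.3f}")
```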
If the distribution is not a good fit, estimating the value of μ that minimizes the sum of the chi-squares can produce a
better-fitting distribution. This can be done iteratively, increasing and decreasing μ in progressively smaller steps until
we find the value that minimizes the sum of chi-squares. Alternatively, the Solver function of Excel can find the
minimum value of the cell containing the sum of chi-squares by changing the value of the cell containing μ.
Table 3 shows the counts, probabilities, and chi-squares for 443 residual phenanthrene measurements.
The mean, calculated by (292*0 + 116*1 + 25*2 + 8*3 + 2*4) / 443, is 0.447. This gave a sum of chi-squares of 9.95. The P-value
for a sum of 8 chi-squares (6 degrees of freedom) was 0.127, indicating an acceptable fit. Although this is acceptable, Solver
was used to find the minimum chi-square estimate of μ. The value of μ was 0.470, the sum of 8 chi-squares was 9.25, and the
P-value was 0.160.
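The step-halving search described earlier can be sketched in Python and applied to these counts. This is a minimal sketch and not the Solver algorithm itself. The counts for 0 to 4 μg/g come from the mean calculation above; the zero counts for 5 to 7 μg/g are an assumption, made because the five stated counts already total 443 and the text sums 8 chi-squares. The names chi_square_sum and minimize_mu are illustrative. Under these assumptions the results should land close to the reported 0.447 and 9.95 for the mean and 0.470 and 9.25 for the minimum chi-square estimate.

```python
import math

# Observed counts for 0..7 μg/g; the zeros for 5..7 are an assumption.
OBSERVED = [292, 116, 25, 8, 2, 0, 0, 0]

def chi_square_sum(observed, mu):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed)
    total = 0.0
    for k, obs in enumerate(observed):
        expected = n * math.exp(-mu) * mu ** k / math.factorial(k)
        total += (obs - expected) ** 2 / expected
    return total

def minimize_mu(observed, mu_start, step=0.1, tol=1e-6):
    """Step-halving search for the mu that minimizes the chi-square sum."""
    mu, best = mu_start, chi_square_sum(observed, mu_start)
    while step > tol:
        for candidate in (mu + step, mu - step):
            if candidate > 0 and chi_square_sum(observed, candidate) < best:
                mu, best = candidate, chi_square_sum(observed, candidate)
                break
        else:
            step /= 2  # neither direction improved, so refine the step
    return mu, best

mean = sum(k * obs for k, obs in enumerate(OBSERVED)) / sum(OBSERVED)
chi_at_mean = chi_square_sum(OBSERVED, mean)
mu_hat, chi_min = minimize_mu(OBSERVED, mean)
print(f"mean = {mean:.3f}, sum of chi-squares = {chi_at_mean:.2f}")
print(f"minimum chi-square mu = {mu_hat:.3f}, sum of chi-squares = {chi_min:.2f}")
```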
Since the chi-square calculation divides the squared difference by the expected frequency, there may be one or two very large
chi-squares in the tail where the frequencies are small. This could lead to a rejection of the goodness of fit even though
the distribution fits well for most of the measurements. If the fit looks acceptable for the majority, we can group the
last few observed and expected counts to produce a single chi-square for the group. Because grouping reduces the degrees of
freedom, the smaller sum of chi-squares that results could still have a P-value that is less than 0.05. For
the phenanthrene measurements we can see large chi-squares for 3 and 4 μg/g. Grouping the observed and expected counts for
2 to 7 μg/g produced a chi-square of 0.12. The sum of the resulting three chi-squares was 1.28 and the P-value
(1 degree of freedom) was 0.26.
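The grouping step can be sketched in Python as follows. Two things here are assumptions rather than statements from the text: the zero counts for 5 to 7 μg/g, and the use of the mean (198/443) as μ for the grouped fit. With both assumptions the figures come out close to the reported 0.12, 1.28, and 0.26.

```python
import math

OBSERVED = [292, 116, 25, 8, 2, 0, 0, 0]  # zeros for 5..7 are an assumption
MU = 198 / 443  # the mean; assumed to be the mu used for the grouped fit

n = sum(OBSERVED)
expected = [n * math.exp(-MU) * MU ** k / math.factorial(k)
            for k in range(len(OBSERVED))]

# Keep 0 and 1 μg/g as they are; pool 2..7 μg/g into a single category.
grouped_obs = OBSERVED[:2] + [sum(OBSERVED[2:])]
grouped_exp = expected[:2] + [sum(expected[2:])]

chi_squares = [(o - e) ** 2 / e for o, e in zip(grouped_obs, grouped_exp)]
total = sum(chi_squares)

# Three chi-squares leave 3 - 2 = 1 degree of freedom; for 1 degree of
# freedom the upper-tail probability is erfc(sqrt(total / 2)).
p_value = math.erfc(math.sqrt(total / 2))
print(f"group chi-square = {chi_squares[2]:.2f}, "
      f"sum = {total:.2f}, P-value = {p_value:.2f}")
```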
Having fitted a distribution, the estimated upper specification limit is the value associated with a cumulative percentage
of 99.9%. This is roughly equivalent to a μ + 3σ upper limit for a Normal distribution.
Based on these 443 residual phenanthrene measurements, the upper specification limit would be set at 4 μg/g.
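The 99.9% limit can be found by accumulating Poisson probabilities until the target is reached. A minimal Python sketch follows, using the minimum chi-square estimate μ = 0.470; the function name poisson_upper_limit is illustrative.

```python
import math

def poisson_upper_limit(mu, target=0.999):
    """Smallest integer k whose cumulative Poisson probability reaches target."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    k = 0
    term = math.exp(-mu)  # P(X = 0)
    cumulative = term
    while cumulative < target:
        k += 1
        term *= mu / k  # P(X = k) obtained from P(X = k - 1)
        cumulative += term
    return k

limit = poisson_upper_limit(0.470)
print(f"upper specification limit = {limit} μg/g")
```

The same limit results if the mean, 0.447, is used in place of 0.470.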
When the histogram of measurements shows a long tail to the right with a few values as high as 10 or 12 μg/g, an exponential
distribution may be an appropriate alternative to a Poisson.
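The exponential distribution, in the rate form used by Excel, is given by:

$$f(x) = \lambda e^{-\lambda x}$$

In this equation, x denotes the observed measurement values, which must be greater than 0. With this form the mean of the
distribution is 1/λ, so the parameter, λ, is estimated by the reciprocal of the mean of the measurements.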
The exponential distribution is typically used to model the time that elapses between consecutive occurrences
of an event, for example the number of days between breakdowns of a machine. In our application of the distribution, we replace
units of time with units of concentration in μg/g. An exponential distribution is fitted to the data in a similar way to the
Poisson. In Excel, the function EXPONDIST(D87, L96, FALSE) returns the density at the value in cell D87 when the value of λ
is in cell L96; for intervals one unit wide, this approximates the percentage in the interval centered on that value. Since
the exponential is fitted to the midpoints of intervals (for example, 1 is the center of 0.51 to 1.50), the value <1 is
replaced by 0.3.
