A Comparative Study of Statistical Methods to Assess Dilutional Similarity

October 1, 2005
BioPharm International, Volume 18, Issue 10
Pages: 40–45


Bioassays are an important component of an effective quality control program in biopharmaceutical development and manufacture. Proper statistical analysis can optimize interpretation of the information they provide. Two statistics are currently used to assess dilutional similarity between the test and reference samples: the dilution effect (DE) and F-statistics. These two tests do not always agree. In this paper, we compare the two statistics in a parallel logistic assay, assess the results (F-statistics are superior), and suggest implications for the pharmaceutical industry.

Biological assays used to determine biopharmaceutical potency have come under increasing regulatory scrutiny. Potency is the ability of a material to exert its intended activity. During assay development, scientists rely on potency as the single most important parameter to confirm lot-to-lot consistency. Proper statistical assessment of the dilutional similarity between the test and reference samples can provide reliable estimates of bioassay potency.

Eloi P. Kpamegan, Ph.D., MSF


Often, the dilution effect measure confirms dilutional similarity of a test and reference sample even though the four-parameter logistic curves are not parallel (they intersect each other or do not share the same asymptotes). Recently, a full-curve analysis was introduced to test dilutional similarity between the test sample and the reference standard using F-statistics.1

Emmens proposed that a logistic (semi-logarithmic) function might provide a good fit to the regression of quantitative response when the range of response is too great for a simple linear regression.2 Finney showed that under many but not all circumstances a four- or five-parameter logistic will fit data well over a wide range of doses.3 The four-parameter logistic equation is extensively used in bioassays due to its similarity to equations used in various types of bioassays; it facilitates a uniform approach to problems of similar logical content. In addition to quality control of biopharmaceuticals, nonlinear logistic models such as the four-parameter logistic model are becoming more common in clinical serology and pre-clinical evaluation of immunogenicity (including animal potency tests). According to Finney, the logistic equation has no theoretical pretensions but is simple to estimate.4

The four-parameter logistic model of response Y values (e.g., counts/min, delta, optical density) versus the assay concentration (conc) is a standard model used in many immunological and biological assays. This model has several equivalent forms. The form used in this article is Equation (1):

where E(Y) is the expected response; a and d are the asymptotes, that is to say, E(Y) → d as conc → ∞ and E(Y) → a as conc → 0; and b is the "slope" or shape parameter. The variable xmid (also called EC50 or IC50 in the literature) is the dose at which 50% of the maximal response is observed. All symbols are also listed in a separate box.
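One common way of writing a four-parameter logistic that satisfies the limits just described, assuming b > 0, is sketched below. It is offered as an illustration consistent with these symbol definitions rather than as a verbatim copy of Equation (1), since the model has several equivalent forms.

```latex
E(Y) = d + \frac{a - d}{1 + \left(\dfrac{\mathrm{conc}}{x_{\mathrm{mid}}}\right)^{b}}
```

In this form, setting conc = xmid makes the denominator equal to 2, so E(Y) = (a + d)/2, the response halfway between the two asymptotes.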


The statistics used to compare sample and reference curves are not uniform across the industry. A statistic called the dilution effect was introduced in the industry to assess dilutional similarity.5,6 The dilution effect is a measure of the percent bias per 2-fold dilution in a test sample's value relative to that of the reference standard. It is the apparent change in a sample's dilution-adjusted concentration when it is diluted 2-fold.2,5 The dilution effect is calculated by Equation (2):

The estimated slopes of the test sample and reference standard respectively are obtained by fitting Equation (3):

Z = 1 for the reference sample and Z = 0 for the test sample; xmidR is the EC50 of the reference sample and xmidS is the EC50 of the test sample. Perfect parallelism corresponds to a 0% dilution effect. An absolute dilution effect of less than 20% has been used in the industry as the criterion for concluding dilutional similarity (parallelism) between the test sample and the reference standard.
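The verbal definition of the dilution effect can be turned into a small calculation. The sketch below is one reading of that definition, not necessarily the article's Equation (2): it assumes the two curves share asymptotes, that the comparison is made in the near-linear region around xmid, and that the statistic is expressed as 100 × (2^(1 − bS/bR) − 1). Under those assumptions, a 2-fold dilution of the sample shifts its response by an amount proportional to bS; reading that response off the reference curve (slope bR) gives an apparent concentration ratio of 2^(−bS/bR), and multiplying by the dilution factor of 2 gives the dilution-adjusted ratio. The function names and the example slopes are hypothetical, and the published formula may use a different sign convention.

```python
def dilution_effect(b_sample, b_reference):
    """Percent change in dilution-adjusted concentration per 2-fold dilution.

    Assumed form (not taken from the article): 100 * (2**(1 - bS/bR) - 1).
    Equal slopes give 0%, matching "perfect parallelism corresponds to 0%".
    """
    if b_reference == 0:
        raise ValueError("reference slope must be nonzero")
    return 100.0 * (2.0 ** (1.0 - b_sample / b_reference) - 1.0)


def passes_de_criterion(de_percent, limit=20.0):
    """Apply the industry rule of thumb: |DE| less than 20% is accepted."""
    return abs(de_percent) < limit


# Hypothetical slope estimates, for illustration only.
example_de = dilution_effect(b_sample=1.1, b_reference=1.0)
print(f"DE = {example_de:.2f}%, accepted: {passes_de_criterion(example_de)}")
```

Because the statistic depends only on the ratio of the two slopes, differences in the asymptotes do not affect it.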


Dilutional similarity means that the reference and the test samples have common a, d, and b parameters. Thus, failure to share common a, d, and b parameters implies a failure of dilutional similarity. The DE statistic checks this at only one point (xmid), while the F-statistic checks it across the whole concentration range.

A standard F statistic for testing for parallelism (dilutional similarity) is obtained in the following way:

• Under the null hypothesis of parallel assays, use Equation (4):

• Fit Equation (4) to the data to obtain the sum of squared errors (SSE), denoted as SSE(Parallel).

• Under the alternative hypothesis that the curves are not parallel, i.e., asymptotes and slopes are not the same, use Equation (5):

where the subscripts R and S denote parameters for the reference and sample logistic models, respectively, and Z = 0 or 1 as shown earlier. Obtain SSE(Nonparallel) by fitting Equation (5) to the data.
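The two models can be written in several equivalent ways. The sketch below is one form consistent with the symbol definitions in this article, assuming a four-parameter logistic of the form d + (a − d)/(1 + (conc/xmid)^b) and using the indicator Z to select the reference or sample parameters; it is not a verbatim copy of Equations (4) and (5).

```latex
% Parallel (null) model: common a, d, b; separate EC50 values
E(Y) = d + \frac{a - d}
  {1 + \left(\dfrac{\mathrm{conc}}{x_{\mathrm{mid}R}^{\,Z}\; x_{\mathrm{mid}S}^{\,1-Z}}\right)^{b}}

% Nonparallel (alternative) model: every parameter specific to R or S
E(Y) = Z\left[d_R + \frac{a_R - d_R}
  {1 + \left(\mathrm{conc}/x_{\mathrm{mid}R}\right)^{b_R}}\right]
  + (1 - Z)\left[d_S + \frac{a_S - d_S}
  {1 + \left(\mathrm{conc}/x_{\mathrm{mid}S}\right)^{b_S}}\right]
```

Written this way, the parallel model has five parameters (a, d, b, xmidR, xmidS) and the nonparallel model has eight, so the two differ by the three constraints that define dilutional similarity.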

Compute the F statistic for parallelism with Equation (6):

At the 95% confidence level (alpha = 0.05), the assumption of parallelism is rejected if F is larger than the critical value of an F distribution with 3 numerator degrees of freedom and (n − 8) denominator degrees of freedom, where n is the total number of observations in the analysis.
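The procedure above can be sketched in code once the two SSE values are available. The block below assumes the usual extra-sum-of-squares form, F = [(SSE(Parallel) − SSE(Nonparallel)) / 3] / [SSE(Nonparallel) / (n − 8)], which matches the degrees of freedom stated in the text but is not copied from Equation (6). It uses only the Python standard library and computes the p-value through a continued-fraction evaluation of the regularized incomplete beta function. All function names and the example SSE values are hypothetical, and the curve fitting itself is not shown.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued-fraction part of the regularized incomplete beta function."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each pass applies one even and one odd term of the fraction.
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df_num, df_den):
    """P(F > f_value) for an F distribution with (df_num, df_den) df."""
    if f_value <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f_value)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


def parallelism_f_test(sse_parallel, sse_nonparallel, n_obs, alpha=0.05):
    """Return (F, p-value, reject) for the test of parallelism."""
    df_num = 3            # a, d and b are constrained to be equal
    df_den = n_obs - 8    # eight parameters in the nonparallel model
    if df_den <= 0:
        raise ValueError("need more than 8 observations")
    f_value = ((sse_parallel - sse_nonparallel) / df_num) / (sse_nonparallel / df_den)
    p_value = f_upper_tail(f_value, df_num, df_den)
    return f_value, p_value, p_value < alpha


# Hypothetical SSE values and sample size, for illustration only.
f_stat, p_val, reject = parallelism_f_test(16.0, 10.0, n_obs=20)
print(f"F = {f_stat:.3f}, p = {p_val:.4f}, reject parallelism: {reject}")
```

Comparing the p-value with alpha is equivalent to comparing F with the critical value, and it avoids the need for a table lookup.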


As part of the release of a drug product as a reference standard, a qualification study was undertaken to compare the potency of the new formulation with that of the original reference standard (STD). Figures 1–5 are graphs comparing a test sample to a reference standard using a four-parameter logistic model. These are independent tests of the formulations and not a sequence.

Figure 1. The upper asymptotes do not match. Dilutional similarity rejected by F-test.

The compounds were radio-tagged, so the response is measured in counts/min. Each test starts at 200 mg/mL and is then diluted in 3-fold or 2-fold steps down to the last measurement at 0.77 mg/mL. In all figures the x-axis is logarithmic and the y-axis is linear.

Figure 2. Divergence at higher concentrations. Dilutional similarity rejected by F-test.

Potency can be computed only when the test sample is parallel to the reference standard. In Figure 1, the test sample and the reference standard do not have the same upper asymptote. The F-statistic was 3.67 with p-value = 0.0436 < 0.05, rejecting dilutional similarity. However, the computed dilution effect was 5.95%, which did not reject parallelism because it is ≤ 20%. In Figures 2 and 3, parallelism was rejected using F-statistics, with p-values of < 0.0001 and 0.0004, respectively. The computed dilution effects were 16.89% and 19.31%, respectively, so the DE criterion accepted dilutional similarity in both cases because each value was ≤ 20%.

In Figure 4, the test of parallelism yielded a p-value of 0.992, which is ≥ 0.05, and the computed dilution effect was less than 20%, so the two tests had comparable outcomes. In Figure 5, both tests rejected dilutional similarity.

These results demonstrate that the DE statistic did not accurately assess parallelism between the test sample and the reference standard: in Figures 1 to 3 it accepted curves that the F-test rejected.
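The disagreement between the two criteria can be tabulated directly from the values reported for Figures 1 to 3. The sketch below uses only numbers quoted in the text; the variable and function names are hypothetical, and the p-value for Figure 2, which is reported only as < 0.0001, is entered as its upper bound.

```python
ALPHA = 0.05      # significance level used for the F-test in the text
DE_LIMIT = 20.0   # acceptance limit for the dilution effect, in percent

# Values quoted in the text for Figures 1 to 3.
reported = {
    "Figure 1": {"p_value": 0.0436, "de_percent": 5.95},
    "Figure 2": {"p_value": 0.0001, "de_percent": 16.89},  # reported as < 0.0001
    "Figure 3": {"p_value": 0.0004, "de_percent": 19.31},
}


def f_test_accepts(p_value, alpha=ALPHA):
    """The F-test accepts parallelism when the p-value is at least alpha."""
    return p_value >= alpha


def de_accepts(de_percent, limit=DE_LIMIT):
    """The DE criterion accepts when the absolute dilution effect is <= limit."""
    return abs(de_percent) <= limit


# Figures where the two criteria reach different conclusions.
disagreements = [
    name for name, row in reported.items()
    if f_test_accepts(row["p_value"]) != de_accepts(row["de_percent"])
]
print(disagreements)
```

All three figures appear in the list, because the F-test rejects each one while the DE criterion accepts it.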

Figure 3. Divergence at most concentrations. Dilutional similarity rejected by F-test. Barely accepted by DE test.


The DE measure assesses parallelism in the linear part of the standard curve. It is a practical measurement that might be a solution for differences in matrix, such as different initial concentrations or different sets of concentrations tested for the reference and test samples in the assay. However, fitting a logistic response and using only the linear part might lead to a biased estimate during assay development. The DE measure might be better suited to a parallel-line assay with an appropriate test design.

Figure 4. Dilutional similarity accepted.

F-statistics are very sensitive to differences in matrix and should not be used when the reference and test sample are not tested at the same concentrations. Set the acceptance criteria for parallelism according to the stage of test development. The critical value is determined by the alpha level, which is the rejection rate when the curves are in fact parallel. Typical values for alpha are 5%, 1%, or 0.1%. In practice, with alpha at 5%, assays are rejected too frequently for nonparallelism that is difficult to see. One could set the alpha level to 1% or 0.1% at the beginning of test development and increase alpha to 5% after all the parameters are under control.
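The effect of the alpha level can be illustrated with the p-values reported earlier for Figures 1, 3, and 4. The sketch below simply compares each reported p-value with the three typical alpha levels; the names are hypothetical and no new data are introduced.

```python
ALPHA_LEVELS = (0.05, 0.01, 0.001)   # 5%, 1% and 0.1%

# p-values quoted in the text for the F-test of parallelism.
reported_p = {"Figure 1": 0.0436, "Figure 3": 0.0004, "Figure 4": 0.992}


def rejected_at(p_value, alpha):
    """Parallelism is rejected when the p-value falls below alpha."""
    return p_value < alpha


def decisions(p_values, alphas=ALPHA_LEVELS):
    """Map each figure to its reject/accept decision at every alpha level."""
    return {
        name: [rejected_at(p, alpha) for alpha in alphas]
        for name, p in p_values.items()
    }


table = decisions(reported_p)
for name, row in table.items():
    print(name, dict(zip(ALPHA_LEVELS, row)))
```

With these values, Figure 1 is rejected at 5% but not at 1% or 0.1%, Figure 3 is rejected at every level, and Figure 4 is accepted at every level, which shows how a stricter alpha early in development removes only the borderline rejections.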

Figure 5. Both tests reject dilutional similarity.

Departure from dilutional similarity can be interpreted as evidence that the groups of organisms are not comparable or the preparations do not contain the same active compound. Failure of the parallelism test is often the first evidence that a product is not stable. F-statistics can be used for full-curve analysis to test for parallelism (but we recommend that you also test that the asymptotes have not changed).


Use DE with caution. Because DE uses only the slopes of the test sample and reference standard, conduct a data transformation and a parallel-line analysis when applying it.

Eloi P. Kpamegan, Ph.D., MSF, Manager, Biostatistics Unit, Global Clinical Immunology, North American Analytical Science and Assay Development, Sanofi Pasteur, One Discovery Drive, Swiftwater, PA 18370, 570.839.4659, fax 570.895.2972, eloi.kpamegan@sanofipasteur.com


1. Reeve R. Two statistical methods for estimating relative potency of bioassays. Biopharm Int. 2000 July; 13(7):54-60.

2. Emmens CW. The dose response relationship for certain principles of the pituitary gland, and of the serum and urine of pregnancy. Journal of Endocr. 1940; 2:194-225.

3. Finney DJ. Response curves for radioimmunoassay. Clinical Chemistry 1983; 29:1562-1566.

4. Finney DJ. Statistical methods in biological assay, 3rd ed. London: Griffin; 1978.

5. Klein J, Capen R, Mancinelli R, Robinett R, Pietrobon FPJ, Quinn J, Schofield T. Validation of assays for use with combination vaccines. Biologicals 1999; 27:35-41.

6. Schofield T. Assay Validation. In: Encyclopedia of Biopharmaceutical Statistics. New York: Marcel Dekker 2000; p. 21–30.