Statistical Essentials—Part 4: Regression and Design of Experiments - A well-designed experiment can make it easier to understand the sources of variation. - BioPharm International

Statistical Essentials—Part 4: Regression and Design of Experiments
A well-designed experiment can make it easier to understand the sources of variation.


BioPharm International
Volume 21, Issue 11

Some hazards of regression analysis include extrapolating beyond the range of the x values, letting influential observations or outliers produce a misleading model, and forgetting that the regression of y on x is not the same as the regression of x on y.
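The last hazard can be checked numerically. The sketch below uses a small made-up data set (the values and the helper names `mean` and `slope` are illustrative assumptions, not taken from the article) to show that regressing y on x and regressing x on y give two different lines unless the correlation is perfect.

```python
# Hypothetical data for illustration only (not from the article).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 2.7, 4.8, 4.1, 6.3]

def mean(v):
    return sum(v) / len(v)

def slope(u, v):
    """Least-squares slope for regressing v on u."""
    ubar, vbar = mean(u), mean(v)
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    return suv / suu

b_yx = slope(x, y)  # y regressed on x: minimizes vertical errors
b_xy = slope(y, x)  # x regressed on y: minimizes horizontal errors

# Express the x-on-y line in the same (x, y) axes: its slope is 1 / b_xy.
slope_from_x_on_y = 1.0 / b_xy

# The product of the two slopes equals r^2, so the two lines
# coincide only when |r| = 1.
r_squared = b_yx * b_xy

print(f"slope of y on x:              {b_yx:.4f}")
print(f"slope implied by x on y:      {slope_from_x_on_y:.4f}")
print(f"r^2 (product of both slopes): {r_squared:.4f}")
```

In practice, the variable that carries the measurement error (usually the assay response) should be the one treated as y.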

A very popular statistic used to assess the "goodness" of the model is R². R² can be defined as the proportion (often quoted as a percentage) of the variation in the response that is explained by the model. Unfortunately, R² is very sensitive to the distribution and spread of the data. A high R² does not necessarily mean a good fit.
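To illustrate the sensitivity to spread, the following sketch fits the same underlying line with the same noise over a narrow and a wide range of x. The data, the noise pattern, and the function name `r_squared` are assumptions made for this example. It computes R² as 1 - SSE/SST, where SSE is the residual sum of squares and SST is the total sum of squares about the mean.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for a simple least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Hypothetical example: identical noise, identical true line y = 1 + 2x,
# but two different spreads of the x values.
noise = [0.8 if i % 2 == 0 else -0.8 for i in range(11)]
x_narrow = [4.0 + 0.2 * i for i in range(11)]  # x from 4 to 6
x_wide = [float(i) for i in range(11)]         # x from 0 to 10

y_narrow = [1.0 + 2.0 * xi + e for xi, e in zip(x_narrow, noise)]
y_wide = [1.0 + 2.0 * xi + e for xi, e in zip(x_wide, noise)]

r2_narrow = r_squared(x_narrow, y_narrow)
r2_wide = r_squared(x_wide, y_wide)

print(f"R^2, narrow x range: {r2_narrow:.3f}")
print(f"R^2, wide x range:   {r2_wide:.3f}")
```

The model and the measurement error are the same in both cases; only the range of x differs, yet the wide range yields the higher R².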

Anscombe created a data set that highlighted the importance of graphing data first, before applying any statistical test to the data set (1). In Figure 1, you can see how four different data sets, each with nearly the same means and standard deviations, can have very different data distributions, so that nearly identical regression coefficients are misleading for three of them. The graph of x1 versus y1 gives the appropriate regression line. The other graphs have a curvilinear relationship (x2 versus y2), a single influential point (x3 versus y3), or an outlier (x4 versus y4), leading to inappropriate conclusions.
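As a rough check of this point, the sketch below fits a least-squares line to each of Anscombe's four published data sets. The helper name `fit_line` is an assumption of this example. All four fits should come out with a slope near 0.50, an intercept near 3.0, and an R² near 0.67, even though the plots look nothing alike.

```python
def fit_line(x, y):
    """Return (intercept, slope, R^2) for a simple least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, 1.0 - sse / sst

# Anscombe's quartet (Anscombe, 1973). Sets 1 to 3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {
    "x1 vs y1": (x123, y1),
    "x2 vs y2": (x123, y2),
    "x3 vs y3": (x123, y3),
    "x4 vs y4": (x4, y4),
}

results = {name: fit_line(x, y) for name, (x, y) in quartet.items()}

for name, (b0, b1, r2) in results.items():
    print(f"{name}: intercept={b0:.2f}  slope={b1:.2f}  R^2={r2:.2f}")
```

Because the summary numbers agree so closely, only a plot of each data set reveals which fit is trustworthy.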


Figure 1
Always treat the validity of the model assumptions as doubtful, and conduct an analysis to examine the adequacy of the model. Violations of the assumptions cannot be detected by examining summary statistics alone; examine the residuals instead. A residual is defined as the difference between the observed value and the value predicted by the model. Standardized residuals have approximately zero mean and unit variance (like a standard normal variable). A standardized residual outside ±3 (that is, with an absolute value greater than 3) can be considered an outlier.
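A minimal sketch of this residual check follows. The data are made up for illustration, with one observation deliberately corrupted, and the residuals are standardized by dividing by the residual standard deviation s = sqrt(SSE/(n - 2)). That is one common definition and an assumption here; statistical packages may use a slightly different scaling (for example, internally studentized residuals).

```python
import math

# Hypothetical data: a straight line with small alternating noise and one
# deliberately corrupted observation at index 10.
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[10] += 5.0

# Ordinary least-squares fit of y = b0 + b1 * x.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Residual = observed value minus predicted value.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Standardize by the residual standard deviation (n - 2 degrees of freedom).
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
standardized = [e / s for e in residuals]

# Flag observations whose standardized residual falls outside +/-3.
flagged = [i for i, r in enumerate(standardized) if abs(r) > 3]

for i in flagged:
    print(f"index {i}: x={x[i]}, y={y[i]:.2f}, standardized residual={standardized[i]:.2f}")
```

A flagged point is a prompt to investigate the observation, not an automatic reason to discard it.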


Figure 2
Figures 2, 3, and 4 show an example of how to apply regression analysis to assay data. A potency assay is to be validated for the range 50–150% potency. The protocol tested samples at five theoretical potency levels, three times on each of three days (nine observations per potency level, for a total of 45 data points). We analyzed the data three different ways to compare the results: three data points per day per potency level (Figure 2); one data point per day per potency level (Figure 3); and one data point per potency level (Figure 4).


Figure 3
The same data summarized by day or by concentration can give different R² values, while having no impact on the parameter estimates. Always use the raw data rather than summary data, unless poor precision of the data (i.e., poor assay repeatability) warrants the use of the means.
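This effect can be reproduced with simulated data. The sketch below is not the article's data set: the potency levels (50, 75, 100, 125, 150), the noise standard deviation of 5, the random seed, and the names `fit`, `fit_raw`, `fit_day`, and `fit_level` are all assumptions chosen to mirror the design described above (five levels, three days, three replicates per day). Because the design is balanced, the slope and intercept are the same for all three analyses, while R² rises as the data are averaged.

```python
import random
from collections import defaultdict

def fit(points):
    """Return (intercept, slope, R^2) for a list of (x, y) pairs."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)
    sst = sum((y - ybar) ** 2 for _, y in points)
    return b0, b1, 1.0 - sse / sst

# Simulated (hypothetical) potency results: 5 levels x 3 days x 3 replicates.
rng = random.Random(0)
levels = [50, 75, 100, 125, 150]
raw = []
for level in levels:
    for day in (1, 2, 3):
        for _ in range(3):
            raw.append((level, day, level + rng.gauss(0.0, 5.0)))

# Group the measured values by (level, day) and by level.
by_day = defaultdict(list)
by_level = defaultdict(list)
for level, day, value in raw:
    by_day[(level, day)].append(value)
    by_level[level].append(value)

raw_points = [(level, value) for level, _, value in raw]                      # 45 points
day_points = [(level, sum(v) / len(v)) for (level, _), v in by_day.items()]   # 15 points
level_points = [(level, sum(v) / len(v)) for level, v in by_level.items()]    # 5 points

fit_raw = fit(raw_points)
fit_day = fit(day_points)
fit_level = fit(level_points)

for label, (b0, b1, r2) in [("raw data", fit_raw),
                            ("daily means", fit_day),
                            ("level means", fit_level)]:
    print(f"{label:12s} intercept={b0:.3f}  slope={b1:.4f}  R^2={r2:.4f}")
```

Averaging removes within-group scatter from the total variation, which inflates R² without improving the model. That is one reason to fit the raw data when the assay precision allows it.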

