Statistical Essentials—Part 4: Regression and Design of Experiments
A well-designed experiment can make it easier to understand the sources of variation.
 Nov 1, 2008 BioPharm International Volume 21, Issue 11

Some hazards of regression analysis include extrapolating beyond the range of the x values, letting influential observations or outliers produce a misleading model, and forgetting that the regression of y on x is not the same as the regression of x on y.
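The last hazard can be checked numerically. For least squares, the slope of y on x is S_xy/S_xx, while regressing x on y and solving that line for y gives a slope of S_yy/S_xy. The two coincide only when the correlation is ±1, and the product of the two regression slopes equals r². A minimal Python sketch with a small hypothetical data set (not from the article):

```python
# Hypothetical illustration: regressing y on x is not the same as regressing x on y.
from statistics import mean

def slope(u, v):
    """Least-squares slope for regressing v on u: S_uv / S_uu."""
    mu, mv = mean(u), mean(v)
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    return s_uv / s_uu

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

b_yx = slope(x, y)  # y on x: minimizes vertical distances to the line
b_xy = slope(y, x)  # x on y: minimizes horizontal distances to the line

print(f"y-on-x slope:               {b_yx:.2f}")         # 0.60
print(f"x-on-y line solved for y:   {1 / b_xy:.2f}")     # 1.00
print(f"product of slopes (= r^2):  {b_yx * b_xy:.2f}")  # 0.60
```

The two fitted lines differ noticeably (slopes 0.60 versus 1.00), so the choice of which variable is the response is not a formality.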

A very popular statistic used to assess the "goodness" of a model is R². R² can be defined as the percentage of the total variation in y that is explained by the model. Unfortunately, R² is very sensitive to the distribution and spread of the data: for example, widening the range of x while the scatter about the line stays the same will raise R². A high R² does not necessarily mean a good fit.
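The sensitivity of R² to spread can be illustrated with a small hypothetical example in Python, computing R² as 1 - SSE/SST. The same true line (assumed here to be y = 2 + 0.5x) and the same alternating ±1 error pattern are used twice; only the range of x changes. Because rescaling x does not change the least-squares residuals, the SSE is identical in both fits, yet R² rises from about 0.62 to about 0.995 because the wider x range inflates the total variation in y.

```python
# Hypothetical illustration: same true line, same errors, same residuals;
# only the spread of x differs, yet R^2 changes dramatically.
from statistics import mean

def r_squared(x, y):
    """Fit y = b0 + b1*x by least squares; return (R^2, SSE)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst, sse

noise = [(-1) ** i for i in range(10)]  # alternating +1 / -1 errors

def make_y(x):
    """Assumed true line y = 2 + 0.5x plus the same error pattern."""
    return [2 + 0.5 * xi + e for xi, e in zip(x, noise)]

x_narrow = list(range(10))              # 0, 1, ..., 9
x_wide = [10 * xi for xi in x_narrow]   # 0, 10, ..., 90

r2_narrow, sse_narrow = r_squared(x_narrow, make_y(x_narrow))
r2_wide, sse_wide = r_squared(x_wide, make_y(x_wide))

print(f"narrow x: R^2 = {r2_narrow:.3f}, SSE = {sse_narrow:.3f}")  # R^2 = 0.622, SSE = 9.697
print(f"wide x:   R^2 = {r2_wide:.3f}, SSE = {sse_wide:.3f}")      # R^2 = 0.995, SSE = 9.697
```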

Anscombe created a set of data that highlights the importance of graphing data before applying any statistical test to them (1). In Figure 1, you can see how four different data sets, each with the same means and standard deviations (and the same fitted regression line), can have very different distributions, so that the identical regression coefficients are misleading. Only the graph of x1 versus y1 shows a relationship for which the fitted straight line is appropriate. The other graphs show a curvilinear relationship (x2 versus y2), a single influential point (x3 versus y3), or an outlier (x4 versus y4), any of which leads to inappropriate conclusions if the data are not plotted first.
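Anscombe's point can be reproduced numerically. The sketch below uses the quartet's values as they are commonly reproduced (Anscombe, 1973); labeling the sets I to IV and matching them to x1/y1 through x4/y4 in Figure 1 is an assumption. All four sets give a fitted line of about y = 3.00 + 0.500x and a correlation of about 0.816, which is exactly why the summaries alone cannot reveal the problems that a plot shows immediately.

```python
# Anscombe's quartet: four data sets with nearly identical summary statistics
# and least-squares lines, but very different shapes when plotted.
from statistics import mean, variance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def fit(x, y):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / (sxx * syy) ** 0.5

results = {}
for name, (x, y) in quartet.items():
    b0, b1, r = fit(x, y)
    results[name] = (b0, b1, r)
    print(f"{name:>3}: mean x = {mean(x):.2f}, var x = {variance(x):.2f}, "
          f"mean y = {mean(y):.2f}, var y = {variance(y):.2f}, "
          f"y = {b0:.2f} + {b1:.3f}x, r = {r:.3f}")
```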

 Figure 1
We should always regard the validity of the model assumptions as doubtful and conduct an analysis to examine the adequacy of the model, because violations of the assumptions cannot be detected by examining summary statistics. A residual is defined as the difference between an observed value and the value predicted by the model. Standardized residuals have zero mean and approximately unit variance (like a standard normal variable), so a standardized residual outside ±3 can be considered a potential outlier.
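A minimal residual check in Python, assuming the common definition of the standardized residual as the raw residual divided by the square root of the residual mean square, d_i = e_i / sqrt(MSE) with MSE = SSE/(n - 2). The data are hypothetical: y = 2x with small alternating errors, plus one gross error added at x = 10. With this definition |d_i| can never exceed sqrt(n - 2), so with 11 or fewer points no residual can fall outside ±3; studentized residuals, which also account for leverage, are often used for that reason.

```python
# Hypothetical sketch: flag points whose standardized residual falls outside +/-3.
# Assumes d_i = e_i / sqrt(MSE), with MSE = SSE / (n - 2).
from statistics import mean

def standardized_residuals(x, y):
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]  # observed minus predicted
    mse = sum(e ** 2 for e in resid) / (len(x) - 2)     # residual mean square
    return [e / mse ** 0.5 for e in resid]

# Hypothetical data: y = 2x with small alternating errors, one gross error at x = 10.
x = list(range(1, 21))
y = [2 * xi + 0.1 * (-1) ** xi for xi in x]
y[9] += 5.0  # index 9 corresponds to x = 10

d = standardized_residuals(x, y)
flagged = [(x[i], round(di, 2)) for i, di in enumerate(d) if abs(di) > 3]
print("potential outliers (x, d):", flagged)
```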

 Figure 2
Figures 2, 3, and 4 show an example of applying regression analysis to assay data. A potency assay is to be validated over the range 50–150% potency. The protocol tested samples at five theoretical potency levels, three times on each of three days (nine observations per potency level, 45 data points in total). To compare the results, we analyzed the data three ways: using all three data points per day per potency level (45 points, Figure 2); one data point, the daily mean, per day per potency level (15 points, Figure 3); and one data point, the overall mean, per potency level (5 points, Figure 4).

 Figure 3
Summarizing the same data by day or by concentration gives different R² values but has no impact on the parameter estimates (slope and intercept), because the design is balanced. Always use the raw data rather than summary data, unless the precision of the data (e.g., poor assay repeatability) warrants the use of the means.
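A small simulation can illustrate why summarizing changes R² but not the fitted line. The data below are simulated under an assumed model (theoretical potency plus a random day shift plus repeatability error, with arbitrarily chosen standard deviations of 3 and 5 percentage points); they are not the article's assay data. With a balanced design, the slope and intercept are identical in all three analyses, while R² can only increase as within-group scatter is averaged away.

```python
# Hypothetical simulation (NOT the article's data): 5 potency levels x 3 days
# x 3 replicates, analyzed as raw values, daily means, and overall level means.
import random
from statistics import mean

def fit(points):
    """Least-squares line of y on x; returns (intercept, slope, R^2)."""
    x = [p[0] for p in points]
    y = [p[1] for p in points]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)  # R^2 = r^2 for one predictor

rng = random.Random(2008)          # fixed seed for a reproducible run
levels = [50, 75, 100, 125, 150]   # theoretical potency (%)
n_days, n_reps = 3, 3

# Assumed model: measured = theoretical + random day shift + repeatability error.
runs = {}  # (level, day) -> list of replicate results
for lvl in levels:
    for day in range(n_days):
        day_shift = rng.gauss(0, 3)
        runs[(lvl, day)] = [lvl + day_shift + rng.gauss(0, 5) for _ in range(n_reps)]

raw = [(lvl, v) for (lvl, _day), vals in runs.items() for v in vals]   # 45 points
daily = [(lvl, mean(vals)) for (lvl, _day), vals in runs.items()]      # 15 points
overall = [(lvl, mean([v for (l, _d), vals in runs.items() if l == lvl for v in vals]))
           for lvl in levels]                                          # 5 points

fits = {"raw": fit(raw), "daily means": fit(daily), "level means": fit(overall)}
for label, (b0, b1, r2) in fits.items():
    print(f"{label:>12}: intercept = {b0:7.3f}, slope = {b1:.4f}, R^2 = {r2:.4f}")
```

The equal slopes follow because, with the same number of results at every level, each sum of squares and cross-products in the raw analysis is simply a constant multiple of the corresponding quantity computed from the means.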
