Steven Walfish

This article is the first in a four-part series on essential statistical techniques for any scientist or engineer working
in the biotechnology field. Though many of the techniques seem trivial, they are often misused or misunderstood in practice.
Future topics will include hypothesis testing, confidence intervals, Design of Experiments, and analysis of variance. This
column deals with the ways data can be presented to maximize effectiveness, including methods to summarize data sets.
DATA TYPES
There are two main types of data: quantitative and qualitative. Quantitative, or numerical, data are continuous data sets
with an infinite number of possible values. For example, protein concentration is considered continuous data, because the
resolution of its value is limited only by the sensitivity of the measurement device. Qualitative, or categorical, data have
a finite number of possible values. For example, the number of defective vials in a lot is considered qualitative because its
values range from zero to the number of vials in the lot, in whole-unit increments.
Each data type calls for different statistical methods of summarizing and reporting.
QUALITATIVE DATA
Figure 1. An example of a bar graph

Qualitative data are usually presented in tabular format or as a percentage. Graphically, a bar graph can present a single
variable or compare two or more variables. Figure 1 and Table 1 present the reasons for lot failures.
Table 1. The reasons for a lot failure

Plotting the raw frequencies is not enough to compare the years. If the number of lots differs from year to year, plotting
the percentages gives a fairer comparison. Figure 2 shows the preferred graph.
Figure 2. A preferred bar graph for comparing data from two years. The table below the bar graph makes it clear that the
number of lots was different.
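
The percentage calculation behind a graph like Figure 2 can be sketched in a few lines of Python. The failure reasons, counts, and lot totals below are hypothetical placeholders chosen for illustration; they are not the values in Table 1. The function name `failure_percentages` is likewise an assumption.

```python
# Hypothetical data for illustration only; NOT the article's Table 1 values.
failures = {
    2006: {"Contamination": 12, "Out of specification": 8, "Operator error": 5},
    2007: {"Contamination": 6, "Out of specification": 5, "Operator error": 4},
}
lots_made = {2006: 100, 2007: 150}  # hypothetical number of lots produced each year


def failure_percentages(counts, total_lots):
    """Convert failure counts per reason into percentages of all lots made."""
    return {reason: 100.0 * n / total_lots for reason, n in counts.items()}


percent_by_year = {
    year: failure_percentages(failures[year], lots_made[year]) for year in failures
}

# Print counts and percentages side by side, one row per failure reason.
for reason in failures[2006]:
    row = [f"{reason:<22}"]
    for year in sorted(failures):
        count = failures[year][reason]
        pct = percent_by_year[year][reason]
        row.append(f"{year}: {count:>3} lots ({pct:4.1f}%)")
    print("  ".join(row))
```

Reporting the count next to the percentage keeps the denominator visible, which serves the same purpose as the table placed below the bar graph in Figure 2.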

The comparative graph in Figure 1 does not show the vast improvement from 2006 to 2007 that can be seen in Figure 2. Additional
statistical tests, such as the chi-squared test, can be used to show that the percent defective in 2007 was statistically
significantly lower than in 2006 (p = 0.005).
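
As a minimal sketch of such a test, the Python below computes a Pearson chi-squared statistic for a 2 x 2 table of failed and passed lots by year. The counts are hypothetical and will not reproduce the p-value quoted above. The article does not say whether a continuity correction was applied, so this sketch assumes none. The function name `chi_squared_2x2` is illustrative.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for illustration only; NOT the article's data.
failed_2006, passed_2006 = 25, 75    # 100 lots, 25% defective
failed_2007, passed_2007 = 15, 135   # 150 lots, 10% defective

chi2, p_value = chi_squared_2x2(failed_2006, passed_2006, failed_2007, passed_2007)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.4f}")
```

The test compares the observed counts with those expected if the failure rate were the same in both years, so it accounts for the different number of lots in each year in the same way the percentage plot does.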