Protocol Development Workshop
Warren Capell, MD
Session 5 – Data Analysis Plan
Copyright ©2016 The Regents of the University of Colorado
It is important that your protocol communicate an appropriate statistical approach to analyze your outcomes. In
reviewing your data analysis plan, the IRB needs to know that there is a plan that will provide some kind of valid
generalizable knowledge (the main benefit of most research studies). An expert analysis plan is always preferred
and communicates that this study is well-positioned to meet its goals; however, even a rudimentary analysis plan
can be sufficient to meet this generalizable knowledge standard. Unfortunately, many protocols are submitted
with analysis plans that are either nonsensical or show no understanding of how to begin analyzing the
project. This section is intended to help you talk intelligently about your data structure and communicate the
broad type(s) of statistical tests you will employ.
A. Descriptive Statistics
If the purpose of your project is only to describe a phenomenon or condition you are measuring (e.g.,
number of positive tests, distribution of head circumference, prevalence of trait X, percentage of patients
responding to a single treatment), you will be using “descriptive statistics.” Descriptive statistics include
counts, percentages, means, medians, modes, proportions, standard deviations, variances, frequencies, and
histograms, among others. The key is that there are no comparisons being made between different
measures in the study; the purpose is to report the measures only. In this case, your plan can be stated as
(for example) “We will use descriptive statistics to report on the median survival in X malignancy.” If only
part of your study purpose is to describe, the plan for that portion of the study should use descriptive
statistics.
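As a rough illustration of what a descriptive-only analysis produces, the sketch below summarizes a single measure using Python's standard library. The survival times are made-up values, and the names survival_months and summary are hypothetical; no groups are compared, the measure is only reported.

```python
import statistics

# Hypothetical survival times in months for ten subjects (made-up values).
survival_months = [4, 7, 7, 9, 12, 15, 18, 21, 30, 42]

# Descriptive statistics only: nothing here compares one group to another.
summary = {
    "n": len(survival_months),
    "mean": statistics.mean(survival_months),
    "median": statistics.median(survival_months),
    "mode": statistics.mode(survival_months),
    "std_dev": statistics.stdev(survival_months),      # sample standard deviation
    "variance": statistics.variance(survival_months),  # sample variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A protocol does not need to include code like this; the point is that the plan names the summary measures (for example, median survival) that will be reported.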
If the goal of your study is to make comparisons (i.e., test hypotheses), descriptive statistics are not
sufficient. Proceed to section II,B.
B. Hypothesis Testing: recognize your variables
If the goal of your study is to make comparisons between groups, arms, or conditions, you will need to
choose appropriate statistical tests of your hypotheses. To choose tests appropriately, you need to
understand your data structure. Data structure depends on the types of variables used to measure your
predictor conditions and outcome conditions.
i. Categorical variables. Categorical variables are categories to which each subject or measurement is
assigned. Categorical variables do not have any intrinsic numeric value; they are essentially labels.
Examples include color (red, blue, yellow, etc.), treatment arm (placebo, low-dose drug, high-dose drug,
etc.), presence of condition (e.g., yes/no, or positive/negative), vital status (dead or alive), and diagnosis
(schizophrenia, major depression, delirium, etc.). Note that since categorical variables are not numeric,
they do not have a mean, median, or variance. Each subject either falls into a given category or does
not. Typically, counts (or proportions) of subjects in the sample that fit each category make
up the study results.
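To make the idea of counts and proportions concrete, here is a minimal sketch that tabulates a categorical variable. The arm labels and the names arms, counts, and proportions are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical treatment-arm labels for eight subjects (made-up values).
arms = ["placebo", "low-dose", "high-dose", "placebo",
        "low-dose", "placebo", "high-dose", "placebo"]

# A categorical variable is summarized by how many subjects fall in each category.
counts = Counter(arms)
total = len(arms)
proportions = {arm: n / total for arm, n in counts.items()}

for arm, n in counts.most_common():
    print(f"{arm}: {n} of {total} ({proportions[arm]:.0%})")
```

Note that nothing like a mean is computed for the labels themselves; only the number and share of subjects in each category are reported.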
ii. Interval variables. Interval variables have an inherent numerical value. When measured, each subject
will have a particular numerical value. The numerical scale for the variable is such that equal intervals
anywhere on the scale represent the same difference (e.g., the difference between 2 and 5 is
the same as between 102 and 105). Because of their numeric value, a mean, median, and variance for
the study sample (or study arm) can be calculated once each subject’s result is measured. Interval