How Should Change be Measured?

  • Analysis of Paired Observations
    • Frequently one makes multiple observations on the same experimental unit
    • Can't analyze these as if they were independent
    • When two observations are made on each unit (e.g., pre-post), it is common to summarize each pair using a measure of effect and then to analyze the effects as if they were (unpaired) raw data
    • Most common: simple difference, ratio, percent change
    • Can't take effect measure for granted
    • Subjects having large initial values may have largest differences
    • Subjects having very small initial values may have largest post/pre ratios
  • What's Wrong with Percent Change?
    • First, we define percent change to be: % change = (first value - second value) / second value * 100
      • The first value is often called the new value and the second value is called the old value, but this does not fit all situations
      • Example:
        • Treatment A: 0.05 proportion having stroke
        • Treatment B: 0.09 proportion having stroke
      • The point of reference (which value is used in the denominator?) affects the answer
        • Relative to B, Treatment A reduced the proportion of strokes by 44%: (0.05 - 0.09) / 0.09
        • Relative to A, Treatment B increased the proportion by 80%: (0.09 - 0.05) / 0.05
    • Two increases of 50% result in a total increase of 125%, not 100%
      • Math details: If $x$ is the original amount, two increases of 50% give $x\times 1.5\times 1.5$. Then the relative change is $(1.5\times 1.5\times x - x) / x = 1.5\times 1.5 - 1 = 1.25$, i.e., a 125% increase
    • Percent change (or ratio) is not a symmetric measure
      • A 50% increase followed by a 50% decrease results in an overall decrease (not no change)
        • Example: 2 to 3 to 1.5
      • A 50% decrease followed by a 50% increase results in an overall decrease (not no change)
        • Example: 2 to 1 to 1.5
    • Simple differences and log ratios are symmetric
    • Unless percents represent proportions times 100, it is not appropriate to compute descriptive statistics (especially the mean) on percents. For example, the correct summary of a 100% increase and a 50% decrease, if they both started at the same point, would be 0% (the ratios 2 and 0.5 average to 1 on the log scale), not the arithmetic mean of 25%.
    • Analysis of % change has lower power than other methods
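The arithmetic behind the percent-change bullets above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `pct_change` and `log_ratio` are made up for this example, and the stroke proportions are the ones quoted in the text.

```python
import math

def pct_change(value, reference):
    """Percent change of `value` relative to `reference` (the denominator)."""
    return (value - reference) / reference * 100

def log_ratio(value, reference):
    """Natural log of value/reference; swapping the arguments only flips the sign."""
    return math.log(value / reference)

# 1. The choice of reference changes the answer (stroke proportions from the text).
a, b = 0.05, 0.09
a_relative_to_b = pct_change(a, b)   # about -44: A "reduces" stroke by 44%
b_relative_to_a = pct_change(b, a)   # about +80: B "increases" stroke by 80%

# 2. Percent changes compound: two 50% increases give 125%, not 100%.
start = 100.0
total_increase = pct_change(start * 1.5 * 1.5, start)

# 3. Percent change is not symmetric: +50% then -50% does not return to the start.
up_then_down = 2.0 * 1.5 * 0.5   # 2 -> 3 -> 1.5
down_then_up = 2.0 * 0.5 * 1.5   # 2 -> 1 -> 1.5

# 4. Log ratios are symmetric: 2 -> 3 and 3 -> 2 sum to zero (up to rounding).
round_trip = log_ratio(3.0, 2.0) + log_ratio(2.0, 3.0)

# 5. Averaging percents misleads: a 100% increase and a 50% decrease are the
#    ratios 2 and 0.5. Their mean on the log scale corresponds to a 0% change,
#    while the arithmetic mean of the two percents is 25%.
ratios = [2.0, 0.5]
naive_mean_pct = sum((r - 1) * 100 for r in ratios) / len(ratios)
mean_log_ratio = sum(math.log(r) for r in ratios) / len(ratios)
log_scale_pct = (math.exp(mean_log_ratio) - 1) * 100

print("A vs B:", round(a_relative_to_b, 1), " B vs A:", round(b_relative_to_a, 1))
print("two 50% increases:", total_increase)
print("2 after +50%/-50%:", up_then_down, " after -50%/+50%:", down_then_up)
print("naive mean %:", naive_mean_pct, " log-scale summary %:", round(log_scale_pct, 6))
```

Back-transforming the mean log ratio is one way to summarize ratios; it corresponds to a geometric mean of the post/pre ratios.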
  • Objective Method for Choosing Effect Measure
    • Goal: Measure of effect should be as independent of baseline value as possible
    • Note: Because of regression to the mean, it may be impossible to make the measure of change truly independent of the initial value. A high initial value may be that way because of measurement error. The high value will cause the change to be less than it would have been had the initial value been measured without error. Plotting differences against averages rather than against initial values will help reduce the effect of regression to the mean.
    • Plot difference in pre and post values vs. the average of the pre and post values (Bland-Altman plot). If this shows no trend, the simple differences are adequate summaries of the effects, i.e., they are independent of initial measurements.
    • If a systematic pattern is observed, consider repeating the previous step after taking logs of both the pre and post values. If this removes any systematic relationship between the average and the difference in logs, summarize the data using logs, i.e., take the effect measure as the log ratio.
    • Other transformations may also need to be examined
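The procedure above (differences vs. averages, first on the raw scale and then on the log scale) can be sketched numerically. The code below is a minimal illustration, not a definitive implementation: the data are simulated under the assumption of a multiplicative effect, the function names are hypothetical, and a least-squares slope is used only as a rough stand-in for visually inspecting the Bland-Altman plot for trend.

```python
import math
import random

def bland_altman_points(pre, post):
    """Return (averages, differences) for paired pre/post values."""
    averages = [(a + b) / 2 for a, b in zip(pre, post)]
    differences = [b - a for a, b in zip(pre, post)]
    return averages, differences

def trend_slope(x, y):
    """Least-squares slope of y on x; a crude numeric check for a trend."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slopes_raw_and_log(pre, post):
    """Slope of difference on average, on the raw scale and after taking logs."""
    raw = trend_slope(*bland_altman_points(pre, post))
    log_pre = [math.log(v) for v in pre]
    log_post = [math.log(v) for v in post]
    logged = trend_slope(*bland_altman_points(log_pre, log_post))
    return raw, logged

def simulate_multiplicative(n=2000, seed=1):
    """Hypothetical data: follow-up is baseline times roughly exp(0.3), with noise."""
    rng = random.Random(seed)
    pre = [rng.lognormvariate(3.0, 0.5) for _ in range(n)]
    post = [p * math.exp(0.3 + rng.gauss(0.0, 0.1)) for p in pre]
    return pre, post

pre, post = simulate_multiplicative()
raw_slope, log_slope = slopes_raw_and_log(pre, post)
print(f"slope of difference on average, raw scale: {raw_slope:.3f}")
print(f"slope of difference on average, log scale: {log_slope:.3f}")
```

With data like these, the raw-scale differences grow with the average while the log-scale differences do not, which would point to the log ratio as the effect measure. In practice one would look at the plots themselves rather than rely on a single slope.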

Avoiding Change as a Response Variable in Parallel Designs

In a two-group parallel design, analysis of change is not recommended at all. The response variable should be the final measurement, and the baseline measurement should be adjusted for as a covariate using analysis of covariance, with treatment assignment as another variable in the model. Besides the issues listed above, change scores are affected by regression to the mean. Analyzing change also implicitly assumes that the slope of the final measurement on the baseline value is 1.0, which may not hold.
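To make the point about the slope concrete, here is a minimal sketch contrasting a change-score comparison with a baseline-adjusted (analysis of covariance) comparison. It assumes a common baseline slope in both groups and uses a tiny, noise-free, made-up data set in which the true slope is 0.5 and the true treatment effect is 3; the function names are hypothetical.

```python
from statistics import mean

def ancova_effect(x0, y0, x1, y1):
    """Fit y = a + effect*treat + slope*baseline with a common slope.

    x0, y0: baseline and final values in the control group
    x1, y1: baseline and final values in the treated group
    Returns (slope, effect), using the pooled within-group slope.
    """
    def centered_sums(x, y):
        mx, my = mean(x), mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        return sxy, sxx

    sxy0, sxx0 = centered_sums(x0, y0)
    sxy1, sxx1 = centered_sums(x1, y1)
    slope = (sxy0 + sxy1) / (sxx0 + sxx1)
    effect = (mean(y1) - mean(y0)) - slope * (mean(x1) - mean(x0))
    return slope, effect

def change_score_effect(x0, y0, x1, y1):
    """Difference in mean change; equivalent to forcing the baseline slope to 1."""
    return (mean(y1) - mean(x1)) - (mean(y0) - mean(x0))

# Made-up data: final = 10 + 0.5 * baseline (+ 3 if treated), with the treated
# group happening to have higher baselines.
x_control = [10.0, 20.0, 30.0]
y_control = [10 + 0.5 * x for x in x_control]
x_treated = [20.0, 30.0, 40.0]
y_treated = [10 + 0.5 * x + 3 for x in x_treated]

slope, adjusted = ancova_effect(x_control, y_control, x_treated, y_treated)
change = change_score_effect(x_control, y_control, x_treated, y_treated)
print("estimated baseline slope:", slope)
print("baseline-adjusted treatment effect:", adjusted)
print("change-score treatment effect:", change)
```

In this constructed example the adjusted analysis recovers the effect of 3, while the change-score comparison is distorted by the baseline imbalance because it treats the slope as 1.0 when it is 0.5.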

Summary of Reasons to Avoid Change Scores (Change from Baseline)

  • It is imperative to adjust for the baseline value anyway, using regression modeling (for reasons of bias reduction in observational studies and for maximizing power and precision in randomized trials).
  • Summary statistics computed on change scores strongly assume that the change measure is valid (see above). For example, computing the mean change from baseline of a response variable Y that has been transformed by a function f (which may be f(Y)=Y, i.e., no transformation needed) assumes that f(follow-up) - f(baseline) has constant variance across subjects and that it has no correlation with f(baseline) + f(follow-up). The recommended 3-number summary (quartiles) of the response can change arbitrarily if different transformations are used, because different transformations can reorder change scores across patients. On the other hand, quartiles of Y at follow-up are always valid and are never misleading, as the median of f(Y) is just f(median Y). Instead of plotting changes from baseline, plot responses over time with baseline values located at t=0.
  • Due to natural history of disease and regression to the mean caused by measurement error, patients change over time. Such changes are not of interest with respect to the goals of the study, and may result in misleading interpretations of changes from baseline in all treatment groups.
  • Using each patient as her own control through calculation of a change score is worse than using no control if the baseline is noisy. If the correlation between baseline and follow-up measurement is less than 0.5, subtracting the baseline is worse than just analyzing the follow-up measurement.
  • A noisy baseline cannot hurt analysis of covariance except for spending one degree of freedom from the error variance (with a small chance of variance inflation due to loss of orthogonality). A baseline that should be ignored will receive an appropriately small regression coefficient.
  • If the response variable Y is ordinal but not interval scaled, the difference from pre to post Y may not be ordinal. For example, the change from pain level 1 to pain level 2 may not mean the same thing as the change from level 3 to level 4.
  • Many outcome variables are "tied down" such that there is a hard lower limit or hard upper limit, or both. This leads to floor and ceiling effects in change scores.
  • If the variable in question is used to screen patients for enrollment in the study, regression to the mean will be particularly strong and it is necessary to have a second baseline to get a meaningful change score.
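The 0.5 correlation threshold in the bullet on noisy baselines can be checked with a short derivation. As a simplifying assumption not stated in the text, suppose the baseline $Y_1$ and follow-up $Y_2$ have a common variance $\sigma^2$ and correlation $\rho$:

```latex
\begin{align*}
\operatorname{Var}(Y_2 - Y_1)
  &= \operatorname{Var}(Y_2) + \operatorname{Var}(Y_1) - 2\operatorname{Cov}(Y_1, Y_2) \\
  &= 2\sigma^2 (1 - \rho), \\
\operatorname{Var}(Y_2 - Y_1) > \operatorname{Var}(Y_2) = \sigma^2
  &\iff \rho < \tfrac{1}{2}.
\end{align*}
```

Under that assumption, the change score is noisier than the follow-up measurement alone whenever $\rho < 0.5$.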
Topic revision: r14 - 04 Jan 2017, FrankHarrell
