## `rms` Package

### Purpose

• Make everyday statistical modeling easier to do
• Make modern statistical methods easy to incorporate into everyday work
• Make it easy to use the bootstrap to validate models
• Provide "model presentation graphics"
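
The bootstrap validation mentioned in the list above can be illustrated with a small sketch. The Python below is not the `rms` interface (`rms` is an R package). It is a minimal illustration of optimism-corrected bootstrap validation, assuming a single-predictor least-squares model and R-squared as the accuracy index. The function names (`fit_line`, `r_squared`, `optimism_corrected_r2`) and the simulated data are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(model, xs, ys):
    """1 - SSE/SST for a fitted (intercept, slope) evaluated on (xs, ys)."""
    intercept, slope = model
    my = statistics.fmean(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


def optimism_corrected_r2(xs, ys, n_boot=200, seed=1):
    """Return (apparent R^2, optimism-corrected R^2)."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    optimism = []
    for _ in range(n_boot):
        # Resample subjects with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)
        # Optimism: performance on the resample minus performance
        # of the same fitted model on the original data.
        optimism.append(r_squared(model, bx, by) - r_squared(model, xs, ys))
    return apparent, apparent - statistics.fmean(optimism)


# Simulated data (an assumption for illustration only).
_rng = random.Random(0)
demo_x = [_rng.uniform(0, 10) for _ in range(40)]
demo_y = [2.0 + 0.5 * x + _rng.gauss(0, 2) for x in demo_x]
apparent, corrected = optimism_corrected_r2(demo_x, demo_y)
print(f"apparent R^2 = {apparent:.3f}, corrected R^2 = {corrected:.3f}")
```

In `rms` itself, the `validate` function carries out resampling validation of a fitted model; the sketch above only conveys the idea and makes no claim to match its implementation.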

## Chapter 2: Why Regression?

### Regression Can Do ...

• Prediction, capitalizing on efficient estimation methods such as maximum likelihood and the predominant additivity in a variety of problems
  • E.g.: effects of age, smoking, and air quality add to predict lung capacity
  • When effects are predominantly additive, or when there aren't too many interactions and one knows the likely interacting variables in advance, regression can beat machine learning techniques that assume interaction effects are likely to be as strong as main effects
• Separate effects of variables (especially exposure and treatment)
• Hypothesis testing
• Deep understanding of uncertainties associated with all model components
  • Simplest example: confidence interval for the slope of a predictor
  • Confidence intervals for predicted values; simultaneous confidence intervals for a series of predicted values
    • E.g.: confidence band for y over a series of x's
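
For concreteness, the uncertainty statements above can be written out for the simplest case. The formulas below assume a single-predictor linear model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with independent normal errors of constant variance; the notation ($S_{xx}$, $\hat\sigma$, $x_0$) is introduced here for illustration and does not come from the notes.

```latex
% Notation: S_xx = \sum_i (x_i - \bar{x})^2,
%           \hat\sigma^2 = \frac{1}{n-2} \sum_i (y_i - \hat{y}_i)^2

% Confidence interval for the slope
\hat\beta_1 \;\pm\; t_{1-\alpha/2,\,n-2}\,\frac{\hat\sigma}{\sqrt{S_{xx}}}

% Pointwise confidence interval for the mean of y at x = x_0
\hat\beta_0 + \hat\beta_1 x_0 \;\pm\; t_{1-\alpha/2,\,n-2}\,
  \hat\sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Simultaneous (Working-Hotelling) band over all x_0:
% replace the t quantile by \sqrt{2\,F_{1-\alpha;\,2,\,n-2}}
\hat\beta_0 + \hat\beta_1 x_0 \;\pm\; \sqrt{2\,F_{1-\alpha;\,2,\,n-2}}\;
  \hat\sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The simultaneous band is wider than the pointwise interval at every $x_0$, which is the price of covering the whole regression line at once.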

### Alternative: Stratification

• Cross-classify subjects on the basis of the Xs, estimate a property of Y for each stratum
• Only handles a small number of Xs
• Does not handle continuous X
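
A rough way to see why stratification only handles a small number of Xs is to count the cells of the cross-classification. The sketch below assumes each X has already been reduced to a few categories (a continuous X would first have to be grouped); the sample size of 1000 and the helper names are hypothetical.

```python
from math import prod


def stratum_count(levels_per_x):
    """Number of cells when subjects are cross-classified on every X."""
    return prod(levels_per_x)


def mean_subjects_per_stratum(n_subjects, levels_per_x):
    """Average cell size if subjects were spread evenly over the strata."""
    return n_subjects / stratum_count(levels_per_x)


# Hypothetical study: 1000 subjects, each X categorized into 3 levels.
for k in (2, 5, 10):
    levels = [3] * k
    print(k, stratum_count(levels), mean_subjects_per_stratum(1000, levels))
```

With even a handful of categorized predictors, the number of strata can exceed the number of subjects, so most cells are too sparse to estimate a property of Y.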

### Alternative: Single Trees (recursive partitioning/CART)

• Interpretable because they are over-simplified and usually wrong
• Cannot separate effects
• Find spurious interactions
• Require huge sample sizes
• Do not handle continuous X effectively; splitting a continuous X into a few intervals yields very heterogeneous nodes because of incomplete conditioning
• Tree structure is unstable so insights are fragile
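
The point about continuous X can be illustrated with a toy case. The sketch below is not CART or any library's tree implementation; it is a minimal single-split search, assuming a noise-free linear relationship `y = x`, with hypothetical helper names `sse` and `best_split`. A straight line fits these data perfectly, yet the best single split still leaves substantial variation of Y within each node.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, within-node SSE) for the best single binary split."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total)
    return best


# Noise-free linear data: a regression line has zero residual error here.
xs = list(range(100))
ys = list(xs)
threshold, within_sse = best_split(xs, ys)
print(f"split at x = {threshold}, remaining within-node SSE = {within_sse}")
print(f"total SSE before splitting = {sse(ys)}")
```

Each node of the split groups subjects whose X values, and therefore Y values, still differ widely, which is one way to read "incomplete conditioning".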

### Alternative: Machine Learning

• E.g. random forests, bagging, boosting, support vector machines, neural networks
• Allows for high-order interactions and does not require pre-specification of interaction terms
• Almost automatic; can save analyst time and do the analysis in one step, at the cost of long computing time
• Uninterpretable black box
• Effects of individual predictors are not separable
• Interaction effects (e.g., differential treatment effect = precision medicine = personalized medicine) are not available
• Because it does not use prior information about the dominance of additivity, machine learning can require 200 events per candidate predictor when Y is binary
• By comparison, logistic regression may require 20 events per candidate predictor
• Can create a demand for "big data" where additive statistical models can work on moderate-size data
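
The two rules of thumb quoted above translate into sample size demands as follows. This is plain arithmetic on the figures from the list (200 and 20 events per candidate predictor); the choice of 15 candidate predictors and the helper name `events_needed` are hypothetical.

```python
# Events-per-candidate-predictor figures quoted in the notes above.
EVENTS_PER_PREDICTOR = {
    "machine learning": 200,
    "logistic regression": 20,
}


def events_needed(n_candidate_predictors, events_per_predictor):
    """Events required under an events-per-candidate-predictor rule of thumb."""
    return n_candidate_predictors * events_per_predictor


# Hypothetical problem with 15 candidate predictors and binary Y.
for method, epp in EVENTS_PER_PREDICTOR.items():
    print(method, events_needed(15, epp))
```

Under these figures the machine learning approach needs ten times as many events as logistic regression for the same set of candidate predictors, which is the sense in which it can create a demand for "big data".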

Topic revision: r6 - 24 May 2016, FrankHarrell
