With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis
Information and Updates to the Book | Regression Modeling Strategies Package: rms for R
Second Edition
- REGRESSION MODELING STRATEGIES with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis by FE Harrell
- E-book available as of 2015-08-17 here
- Print version available 2015-09-04 | Flyer
- You may order the hardcover or e-book here or from Amazon
- Changes from the first edition
- ISBN 978-3-319-19424-0
- Errata
- R code for all examples in the book's 2nd edition. Numbers in file names are chapter numbers.
- Alternate R Code
- Reviews
First Edition
- REGRESSION MODELING STRATEGIES with Applications to Linear Models, Logistic Regression, and Survival Analysis by FE Harrell. The book was published June 5 2001 by Springer New York, ISBN 0-387-95232-2 (also available at amazon.com and DirectTextBook). Click here to see the text from the book's back cover. Click here to see the preface and table of contents for the book manuscript in .pdf format. Click here to obtain a partial index to the book in .pdf format, and here to see a sample chapter from the book (Note: This material is Copyright 2001-2004 Springer-Verlag and may not be reproduced).
- Changes and additions for the second edition (projected publication July 2015)
- Reviews of the book:
- Statistical Methods in Biomedical Research
- Biometrics 58:477, June 2002
- Bulletin of the Swiss Statistical Society 43:11 (to appear also in Statistical Methods in Medical Research)
- International Journal of Epidemiology 31(3):699-700, June 2002. Note: This otherwise excellent review states that the book recommends selecting variables to include in the model on the basis of their frequency of selection by a bootstrap procedure. This is definitely not the case.
- Journal of the American Statistical Association 98:257-258, March 2003
- Medical Decision Making, 23(2):182-183, April 2003
- Technometrics 45:170, May 2003
- Statistics in Medicine 22:2531-2532, 15 Aug 2003
- Reviews at Springer
- Clinical Chemistry
- Errata for the first and second and later printings. The book had its third printing in December 2002 and its fourth printing in December 2003. The sixth printing was in December 2005.
- New versions of R code that make some examples in the book that rely on the Design package work with the rms package
- One-semester course using part of the text, for students who have not had a course in linear regression.
- Interactive Overview of many of the methods in the book
- Syllabus for a more advanced course using the text up until survival analysis, for students already versed in ordinary linear regression.
Short Courses
- Vanderbilt University Campus: 14-18 May 2018; details here
- Click here for a detailed course description
- Click here for supplements to handouts
- Offered for the first time in the Vanderbilt University Department of Biostatistics graduate program Spring 2013 (Jan-Apr). It is taught yearly by Prof. Harrell
Materials
- See CourseBios330 for up-to-date material
- Handouts
- R code for all examples in the book's 2nd edition. Numbers in file names are chapter numbers.
- Chapter 7 from the first edition: Case Study in Least Squares Fitting and Interpretation of a Linear Model (analysis of the 1992 US presidential election)
- Survey of new approaches to regression and tree-based modeling (referred to in Chapter 4 of the second edition)
- Syllabus for a 1-day short course based on the text
- Syllabus for a 3-day short course and S workshop based on the text
- Syllabus for a 1-day short course "Modern Approaches to Predictive Modeling and Covariable Adjustment in Randomized Clinical Trials"
- Scripts developed in class during the May 2000 or August 2000 3-day courses or the June 2001 or June 2002 3-day course for Insightful Corporation
Discussion Board
Datasets
Additional Problems for Students
- Homework assignments not in the book. Some of the early assignments are for basic regression, a prerequisite for the book. Some of the problems use data in Rosner B: Fundamentals of Biostatistics, 5th Edition. Belmont CA: Duxbury Press; 1999. Solutions to these problems as well as solutions to many of the problems given in the book are available to instructors by E-mailing the author
Quizzes
- Quizzes (with answer sheets) on concepts in the text and on prerequisites are available to instructors by E-mailing the author
Software
- Interactive S scripts demonstrating various curve fitting criteria and showing the flexibility of restricted cubic splines (see also here)
- An Introduction to S and the Hmisc and Design Libraries by CF Alzola and FE Harrell
- Statistical computing course material
- Miscellaneous S software available on the Internet that is related to some of the methods covered in this book such as data reduction, censored data analysis, imputation, recursive partitioning
- Unsupported SAS macros for restricted cubic splines, displaying survival estimates, and checking proportional hazards and other model assumptions, etc. (FE Harrell, 1991). Click here for information, examples, and brief information on SAS procedures useful for multivariable modeling and on obtaining predicted values with SAS.
- SAS macros for various censored data calculations such as AUC, from Chambless LE, Diao G. Estimation of time-dependent area under the ROC curve for long-term risk prediction. Stat Med 2006 Oct 30; 25 (20):3474-86.
- Warren Sarle's SAS macros and examples for bootstrapping and jackknifing. See Warren's cautionary note on bootstrap confidence intervals, with a good example related to R^2 in multiple regression. The example shows that when the estimate of R^2 is badly biased, bootstrap confidence limits are badly displaced to the right. Included in the notes is the standard error of R^2 and information about adjusted R^2.
- StatLib statistical computing repository
- The penalized package in R
Studies of Methods Used in the Text
- Recent simulation experiments conducted by Carl Moons and Frank Harrell indicate that the performance of transcan for multiple imputation is about halfway between single conditional mean imputation and MICE (see below), consistent with the findings from Faris PD, Ghali WA, et al (2002): Multiple imputation versus data enhancement for dealing with missing data in observational health care outcome analyses. J Clin Epidemiology 55:184-191. Suboptimal performance of transcan for multiple imputation is probably due to the fact that transcan fits the flexible additive imputation models and then draws all multiple imputations from the fitted models. A new function in the Hmisc package, aregImpute, uses the bootstrap to re-fit additive nonparametric imputation models for each of the multiple imputations. Results for aregImpute are very promising (see below).
- Validation of binary logistic models
- Simulation studies
- Steyerberg EW, Harrell FE, Borsboom GJJM, Eijkemans MJC, Vergouwe Y, Habbema JDF (2001): Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54:774-781.
- Steyerberg EW et al (2003): Internal and external validation of predictive models: A simulation study of bias and precision in small samples. Journal of Clinical Epidemiology 56:441-447.
- Vergouwe Y, Steyerberg EW, Eijkemans MJC, Habbema JDF (2005): Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. Journal of Clinical Epidemiology 58:475-483.
- Studying the degrees of freedom spending strategy that uses generalized Spearman rho^2, in terms of preserving type I error and sigma^2 in ordinary least squares
- Prediction Error in Cox Models Varying Number of Predictors
- Shrinkage and problems with stepwise variable selection: See Steyerberg EW, Eijkemans MJC, Harrell FE, Habbema JDF (2001): Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Medical Decision Making 21:45-56.
- Model simplification and stepwise variable selection: See Ambler G, Brady AR, Royston P (2002): Simplifying a prognostic model: a simulation study based on clinical data. Statistics in Medicine 21:3803-3822. The authors studied the performance of the model simplification strategy discussed in the book, and compared it with more traditional variable selection methods, finding that standard variable selection can work well when there is a large proportion of irrelevant variables.
- New case study on penalized maximum likelihood estimation for binary logistic modeling: Moons KGM, Donders ART, Steyerberg EW, Harrell FE (2004): Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clinical Epidemiology 57:1262-1270.
- Choosing the penalty
- Example of a spline interaction surface
- Interactive demonstrations of curve fitting, effects of categorization, etc.
- Peter Ellis' blog article about overanalysis of time series data
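The first item in this list contrasts two approaches: drawing every imputation from one fitted model, and re-fitting the imputation model on a bootstrap resample for each imputation. A toy sketch can illustrate the difference. It is written in plain Python with a straight-line imputation model and simulated data, all of which are assumptions made for illustration. It does not reproduce transcan or aregImpute, whose flexible additive models are not shown here, and the function names fit_line and multiply_impute are hypothetical.

```python
import random
import statistics


def fit_line(zs, xs):
    """Least-squares fit of x on z: returns (intercept, slope, residual SD)."""
    mz, mx = statistics.fmean(zs), statistics.fmean(xs)
    szz = sum((z - mz) ** 2 for z in zs)
    szx = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    slope = szx / szz
    intercept = mx - slope * mz
    rss = sum((x - (intercept + slope * z)) ** 2 for z, x in zip(zs, xs))
    return intercept, slope, (rss / (len(zs) - 2)) ** 0.5


def multiply_impute(obs_z, obs_x, miss_z, n_impute, rng, bootstrap):
    """Impute missing x values n_impute times from a line fitted to observed data.

    bootstrap=False: the same observed data are fitted for every imputation.
    bootstrap=True:  the observed data are resampled with replacement and the
                     line is re-fitted before each imputation.
    Returns the slope used for each imputation and the imputed values.
    """
    n = len(obs_z)
    slopes, imputations = [], []
    for _ in range(n_impute):
        if bootstrap:
            idx = [rng.randrange(n) for _ in range(n)]
            zs = [obs_z[i] for i in idx]
            xs = [obs_x[i] for i in idx]
        else:
            zs, xs = obs_z, obs_x
        a, b, sd = fit_line(zs, xs)
        slopes.append(b)
        # Prediction plus a random residual for each record with missing x
        imputations.append([a + b * z + rng.gauss(0.0, sd) for z in miss_z])
    return slopes, imputations


# Simulated toy data (not from any study): x depends linearly on z,
# and x is treated as missing for the last 20 of 60 records.
rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(60)]
x = [1.0 + 2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]
obs_z, obs_x, miss_z = z[:40], x[:40], z[40:]

slopes_single, imp_single = multiply_impute(obs_z, obs_x, miss_z, 5, rng, bootstrap=False)
slopes_boot, imp_boot = multiply_impute(obs_z, obs_x, miss_z, 5, rng, bootstrap=True)

print("single-fit slopes:", [round(b, 3) for b in slopes_single])
print("bootstrap slopes: ", [round(b, 3) for b in slopes_boot])
```

With a single fit the imputation-model slope is identical across the five imputations, so only the residual noise varies. With bootstrap re-fitting the slope itself varies, which carries uncertainty about the imputation model into the imputed values.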
R-help
Multiple Imputation
- Joseph Schafer's Multiple Imputation FAQ
- Napier University's imputation page
- Multiple Imputation Online and R MICE Software by Stef van Buuren and Karin Oudshoorn
- To subscribe to the Impute E-mail discussion group led by Juned Siddique of Northwestern University, click here.
- A paper containing a good overview of multiple imputation and a comparison of some software packages is Horton NJ, Lipsitz SR, The American Statistician 55:244-254; 2001.
- An excellent recent survey of missing data methods is Schafer, JL and Graham JW, Psychological Methods 7:147-177; 2002.
- See also Biases in SPSS 12.0 Missing Value Analysis by Paul von Hippel, The American Statistician 58:160-164; 2004.
- Notes from Tim Hesterberg on why the response variable must be used when doing multiple imputation. Tim's notes include code to do several simulations illustrating his points.
- A nice study and review of multiple imputation
- Comparisons of aregImpute with other imputation algorithms
- Moons KGM, Donders RART, Stijnen T, Harrell FE, J Clinical Epidemiology 59:1092-1101; 2006.
- Horton NJ, Kleinman KP, The American Statistician 61:79-90; 2007.
General Statistical Information
- General information on regression modeling, including prerequisite material for Regression Modeling Strategies
- Problems with categorizing continuous variables
- Information for biomedical researchers and medical citations for statistical issues
- Notes on regression modeling in randomized clinical trials
- Julian Faraway's free book Practical Regression and Anova using R
- Bender and Benner's paper Calculating ordinal regression models in SAS and S-Plus
- Stephan Rudolfer's presentation Diagnosis of Carpal Tunnel Syndrome using Logistic Regression, an excellent presentation on various types of ordinal logistic models. Includes a nice discussion of accuracy indexes.
- John Fox's Applications of Quantitative Methods in Sociology course material, including information on polytomous logistic regression
- John Fox's excellent article on Bootstrapping Regression Models
- Brian Ripley's terrific presentation Selecting Amongst Large Classes of Models, containing highly useful thoughts about AIC, cross-validation, and other concepts
- Paul Allison's excellent discussion about R^2 measures
- Lindsay Smith's nice tutorial on principal components analysis
- Glossary of Statistical Terms
- Annotated bibliography with emphasis on predictive methods, survival analysis, logistic regression, prognosis, diagnosis, modeling strategies, model validation, practical Bayesian methods, clinical trials, graphical methods, papers for teaching statistical methods, bootstrap, etc; FE Harrell
- Miscellaneous information on methodology, much of it culled from electronic discussion groups (.zip file)
- Bob Obenchain's Regression Shrinkage Web page
- Patrick Burns' bootstrap and resampling page
- How to read output from the rms package nomogram function
- Jan de Leeuw's excellent working paper on splines, including monotone splines
- Statistical Computing home page
- Presentations
- Teaching Material
--
FrankHarrell - 30 Jan 2004; updated 29 Feb, 4, 27 May, 4, 11 Jul, 29 Aug, 2 Sep 2004, 20, 22 Jan, 24 Mar, 14 Aug 2005, 13 Jan, 11 Nov 2006, 28 Feb, 29 Jun, 12 Jul, 16 Jul, 7 Sep, 22 Oct 2007, 11 Jan, 2 Sep, 18, 23 Dec 2008, 6 Jan, 3 Feb, 13, 17 Apr, 9 Sep 2009, 9 Feb 2010, 21 Feb 2011, 21 Apr 2011, 1 May 2011, 29 May 2011, 6 Aug, 5 Sep 2011, 22 Dec 2012, 1 Mar 2014, 12 Apr 2014, 2014-07-15, 2014-09-23, 2014-10-04, 2014-11-26, 2015-01-03, 2015-01-23, 2015-03-22, 2015-06-08, 2015-08-27, 2015-09-06, 2015-09-23, 2016-11-10, 2017-04-16