REGRESSION MODELING STRATEGIES
With Applications to Linear Models, Logistic Regression, and Survival Analysis
Information and Updates to the Book
- REGRESSION MODELING STRATEGIES with Applications to Linear Models, Logistic Regression, and Survival Analysis by FE Harrell. The book was published June 5, 2001 by Springer New York, ISBN 0-387-95232-2 (also available at amazon.com, Booksamillion.com, DirectTextBook, StudentMarket.com, or FetchBook.Info). Click here to see the text from the book's back cover. Click here to see the preface and table of contents for the book manuscript in .pdf format. Click here to obtain a partial index to the book in .pdf format, and here to see a sample chapter from the book (Note: This material is Copyright 2001-2004 Springer-Verlag and may not be reproduced).
- Changes and additions planned for the second edition (projected publication date second quarter 2010)
- Reviews of the book:
- Statistical Methods in Biomedical Research
- Biometrics 58:477, June 2002
- Bulletin of the Swiss Statistical Society 43:11 (to appear also in Statistical Methods in Medical Research)
- International Journal of Epidemiology 31(3):699-700, June 2002. Note: This otherwise excellent review states that the book recommends selecting variables to include in the model on the basis of their frequency of selection by a bootstrap procedure. This is definitely not the case.
- Journal of the American Statistical Association 98:257-258, March 2003
- Medical Decision Making, 23(2):182-183, April 2003
- Technometrics 45:170, May 2003
- Statistics in Medicine 22:2531-2532, 15 Aug 2003
- Reviews at Springer
- Errata for the first and second and later printings. The book had its third printing in December 2002 and its fourth printing in December 2003. The sixth printing was in December 2005.
- Syllabus for a one-semester course using part of the text, for students who have not had a course in linear regression.
- Interactive Overview of many of the methods in the book
- Syllabus for a more advanced course using the text up until survival analysis, for students already versed in ordinary linear regression.
Short Courses
2009 Upcoming short courses in Regression Modeling Strategies
- 1/2 Day Short Course, useR! 2009 International R Users Conference, Rennes, France, July 2009
- Webinar on Parametric Survival Modeling for the ASA Biopharmaceutical Section, 3 April 2009
- One-day Short Course, ENAR (Eastern North American Region of the Biometrics Society), San Antonio, Texas, 15 March 2009
- Five-Session Short Course by Frank E. Harrell, Jr., Ph.D., Professor and Chair, Department of Biostatistics, Vanderbilt University School of Medicine.
- Date: February 2 - 6, 2009
- Requirements: strong competence in multiple regression models.
- Target audience: statisticians and related quantitative researchers who want to learn some general model development strategies, including approaches to missing data imputation, data reduction, model validation, and relaxing linearity assumptions.
- Registration: please email Eve Anderson.
- Course fee:
VU and MMC Students and Post-docs $50
VU and MMC Faculty and Staff $200
Other Students $200
Other Members of Non-Profit Institutions $300
Members of For-Profit Institutions $600
No charge to Department of Biostatistics faculty/staff
- Payment method: cash, check, VUMC form 1180, Visa and MasterCard.
- Book: attendees may purchase a copy of Dr. Harrell's book by following the links above to purchase sites.
- Registration deadline: EXTENDED January 23, 2009.
- Cancellation policy: TBA
- Handouts: see below; printed copies will be provided for paying attendees.
| DAY | DATE | TIME | LOCATION |
| Monday | February 2, 2009 | 8:30AM/11:30AM | Student Life Center, LL Mtg Rooms 1&2 |
| Tuesday | February 3, 2009 | 8:30AM/11:30AM | Student Life Center, LL Mtg Rooms 1&2 |
| Wednesday | February 4, 2009 | 8:30AM/11:00AM | Student Life Center, LL Mtg Rooms 1&2 |
| Thursday | February 5, 2009 | 8:30AM/11:30AM | MRBIII, Room 3131 |
| Friday | February 6, 2009 | 8:30AM/11:30AM | Student Life Center, LL Mtg Rooms 1&2 |
Extended Discussions: Tuesday Feb. 3 and Friday Feb. 6 1:15pm-3:00pm, location: D-2221 Medical Center North.
This is a good time to bring up questions that are very specific to your area of research.
Materials
- Handouts - be sure to print two pages per side of paper
- Syllabus for a 1-day short course based on the text
- Syllabus for a 3-day short course and S workshop based on the text
- Syllabus for a 1-day short course "Modern Approaches to Predictive Modeling and Covariable Adjustment in Randomized Clinical Trials"
- Scripts developed in class during the May 2000 or August 2000 3-day courses or the June 2001 or June 2002 3-day course for Insightful Corporation
- Course Evaluations
Discussion Board
- A discussion board for readers and the author to discuss questions, issues, controversies, and new research related to the text
Datasets
Additional Problems for Students
- Homework assignments not in the book. Some of the early assignments are for basic regression, a prerequisite for the book. Some of the problems use data in Rosner B: Fundamentals of Biostatistics, 5th Edition. Belmont CA: Duxbury Press; 1999. Solutions to these problems, as well as solutions to many of the problems given in the book, are available to instructors by E-mailing the author.
Quizzes
- Quizzes (with answer sheets) on concepts in the text and on prerequisites are available to instructors by E-mailing the author.
Software
- Statistical Computing Tools page for downloading Hmisc and Design
- R and the Hmisc and Design libraries: There is a freely available software system called R that has most of the features of S-PLUS. The Hmisc library was made available for R on 7 April 2002 and the Design library on 20 May 2002. Both libraries work on Linux, Unix, Mac OS X, and Windows versions of R. In July 2003 both libraries were placed on CRAN, the Comprehensive R Archive Network, so they may be downloaded and installed or updated using simple R commands (or menus in Windows).
- Overview of the Hmisc Library
- Overview of the Design Library
- Reference card for the Design Library. If using US letter size paper, print the first page, then insert it back into the printer to print the second page on the back. Fold and cut into a booklet.
- Online documentation for the Hmisc and Design Libraries.
- Interactive S scripts demonstrating various curve fitting criteria and showing the flexibility of restricted cubic splines
- An Introduction to S and the Hmisc and Design Libraries by CF Alzola and FE Harrell
- Statistical computing course material
- Miscellaneous S software available on the Internet that is related to some of the methods covered in this book such as data reduction, censored data analysis, imputation, recursive partitioning
- Unsupported SAS macros for restricted cubic splines, displaying survival estimates, and checking proportional hazards and other model assumptions, etc. (FE Harrell, 1991). Click here for information, examples, and brief information on SAS procedures useful for multivariable modeling and on obtaining predicted values with SAS.
- SAS macros for various censored data calculations such as AUC, from Chambless LE, Diao G. Estimation of time-dependent area under the ROC curve for long-term risk prediction. Stat Med 2006 Oct 30; 25 (20):3474-86.
- Warren Sarle's SAS macros and examples for bootstrapping and jackknifing. See Warren's cautionary note on bootstrap confidence intervals, with a good example related to R^2 in multiple regression. The example shows that when the estimate of R^2 is badly biased, bootstrap confidence limits are badly displaced to the right. Included in the notes is the standard error of R^2 and information about adjusted R^2.
- StatLib statistical computing repository
- The penalized package in R
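The cautionary note above about bootstrap confidence limits for R^2 can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not Warren Sarle's example or his SAS macros: it is written in standard-library Python, all function names are hypothetical, the predictors and response are pure Gaussian noise (so the true R^2 is 0), and the interval is a plain percentile bootstrap. Under those assumptions the expected sample R^2 is p/(n-1), the adjusted R^2 averages about 0, and the percentile limits sit entirely to the right of the true value.

```python
import random


def center(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def r_squared(columns, y):
    """R^2 of least squares of y on the columns (intercept included),
    computed by Gram-Schmidt orthogonalization of the centered predictors."""
    resid = center(y)
    total_ss = dot(resid, resid)
    basis = []
    for col in columns:
        q = center(col)
        for b in basis:  # remove the part of q explained by earlier predictors
            coef = dot(q, b) / dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        basis.append(q)
        coef = dot(resid, q) / dot(q, q)  # project the residual on the new direction
        resid = [ri - coef * qi for ri, qi in zip(resid, q)]
    return 1.0 - dot(resid, resid) / total_ss


def adjusted_r_squared(r2, n, p):
    """Usual adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def null_data(n, p, rng):
    """p predictors and a response that are independent noise (true R^2 = 0)."""
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return columns, y


def bootstrap_percentile_interval(columns, y, n_boot, rng, level=0.95):
    """Percentile bootstrap limits for R^2, resampling whole observations."""
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        cols_b = [[col[i] for i in idx] for col in columns]
        y_b = [y[i] for i in idx]
        stats.append(r_squared(cols_b, y_b))
    stats.sort()
    lo = stats[int((1 - level) / 2 * (n_boot - 1))]
    hi = stats[int((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(2009)  # arbitrary seed, chosen only for reproducibility
    n, p = 30, 5
    r2s = [r_squared(*null_data(n, p, rng)) for _ in range(1000)]
    print("mean R^2 under the null:", sum(r2s) / len(r2s), "theory:", p / (n - 1))
    print("mean adjusted R^2:", sum(adjusted_r_squared(r, n, p) for r in r2s) / len(r2s))
    columns, y = null_data(n, p, rng)
    print("percentile limits for one sample:", bootstrap_percentile_interval(columns, y, 500, rng))
```

Because R^2 cannot be negative and each bootstrap resample adds its own overfitting, the percentile interval from this sketch cannot cover a true value of 0, which is one way to see why the limits are displaced to the right.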
Studies of Methods Used in the Text
- Recent simulation experiments conducted by Carl Moons and Frank Harrell indicate that the performance of transcan for multiple imputation is about halfway between single conditional mean imputation and MICE (see below), consistent with the findings from Faris PD, Ghali WA, et al (2002): Multiple imputation versus data enhancement for dealing with missing data in observational health care outcome analyses. J Clin Epidemiology 55:184-191. Suboptimal performance of transcan for multiple imputation is probably due to the fact that transcan fits the flexible additive imputation models and then draws all multiple imputations from the fitted models. A new function in the Hmisc library, aregImpute, uses the bootstrap to re-fit additive nonparametric imputation models for each of the multiple imputations. Results for aregImpute are very promising (see below).
- Validation of binary logistic models
- Simulation studies
- Steyerberg EW, Harrell FE, Borsboom GJJM, Eijkemans MJC, Vergouwe Y, Habbema JDF (2001): Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54:774-781.
- Steyerberg EW et al (2003): Internal and external validation of predictive models: A simulation study of bias and precision in small samples. Journal of Clinical Epidemiology 56:441-447.
- Vergouwe Y, Steyerberg EW, Eijkemans MJC, Habbema JDF (2005): Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. Journal of Clinical Epidemiology 58:475-483.
- Studying the degrees of freedom spending strategy that uses generalized Spearman rho^2, in terms of preserving type I error and sigma^2 in ordinary least squares
- Prediction Error in Cox Models Varying Number of Predictors
- Shrinkage and problems with stepwise variable selection: See Steyerberg EW, Eijkemans MJC, Harrell FE, Habbema JDF (2001): Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Medical Decision Making 21:45-56.
- Model simplification and stepwise variable selection: See Ambler G, Brady AR, Royston P (2002): Simplifying a prognostic model: a simulation study based on clinical data. Statistics in Medicine 21:3803-3822. The authors studied the performance of the model simplification strategy discussed in the book, and compared it with more traditional variable selection methods, finding that standard variable selection can work well when there is a large proportion of irrelevant variables.
- New case study on penalized maximum likelihood estimation for binary logistic modeling: Moons KGM, Donders ART, Steyerberg EW, Harrell FE (2004): Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clinical Epidemiology 57:1262-1270.
- Example of a spline interaction surface
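The contrast drawn above between drawing every imputation from one fitted model and re-fitting the imputation model on a bootstrap resample for each imputation can be sketched in a few lines. This is not the transcan or aregImpute algorithm: it assumes a simple straight-line imputation model in place of flexible additive models, draws residuals at random, and uses hypothetical function names throughout. It is meant only to show where the extra between-imputation variability comes from.

```python
import random


def fit_line(z, x):
    """Least-squares intercept and slope for regressing x on z, plus residuals."""
    n = len(z)
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / s_zz
    intercept = x_bar - slope * z_bar
    resid = [xi - (intercept + slope * zi) for zi, xi in zip(z, x)]
    return intercept, slope, resid


def impute_single_fit(z_obs, x_obs, z_mis, m, rng):
    """Fit once on the complete cases; all m imputations share that one fit."""
    a, b, resid = fit_line(z_obs, x_obs)
    fits = [(a, b)] * m
    draws = [[a + b * z + rng.choice(resid) for z in z_mis] for _ in range(m)]
    return fits, draws


def impute_bootstrap_refit(z_obs, x_obs, z_mis, m, rng):
    """Re-fit the imputation model on a fresh bootstrap resample of the
    complete cases for each of the m imputations."""
    n = len(z_obs)
    fits, draws = [], []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]
        a, b, resid = fit_line([z_obs[i] for i in idx], [x_obs[i] for i in idx])
        fits.append((a, b))
        draws.append([a + b * z + rng.choice(resid) for z in z_mis])
    return fits, draws


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed for reproducibility
    z_obs = [rng.gauss(0, 1) for _ in range(40)]
    x_obs = [1 + 2 * z + rng.gauss(0, 1) for z in z_obs]
    z_mis = [-1.0, 0.0, 1.0]  # covariate values for cases whose x is missing
    fits_one, _ = impute_single_fit(z_obs, x_obs, z_mis, 5, rng)
    fits_boot, _ = impute_bootstrap_refit(z_obs, x_obs, z_mis, 5, rng)
    print("slopes, single fit:     ", [round(b, 3) for _, b in fits_one])
    print("slopes, bootstrap refit:", [round(b, 3) for _, b in fits_boot])
```

In the single-fit version the coefficients are identical across imputations, so uncertainty about the imputation model itself never reaches the imputed values; the bootstrap version varies the coefficients from one imputation to the next.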
S-news and R-help
Multiple Imputation
- Joseph Schafer's Multiple Imputation FAQ
- Napier University's imputation page
- Multiple Imputation Online and S-Plus MICE Software by Stef van Buuren and Karin Oudshoorn
- To subscribe to the Impute E-mail discussion group created by Robert Harris of the University of Texas at Dallas, click here.
- A recent paper containing a good overview of multiple imputation and a comparison of some software packages is Horton NJ, Lipsitz SR, The American Statistician 55:244-254; 2001.
- An excellent recent survey of missing data methods is Schafer, JL and Graham JW, Psychological Methods 7:147-177; 2002.
- See also Biases in SPSS 12.0 Missing Value Analysis by Paul von Hippel, The American Statistician 58:160-164; 2004.
- Notes from Tim Hesterberg of Insightful Corporation on why the response variable must be used when doing multiple imputation. Tim's notes include S-Plus code to do several simulations illustrating his points.
- A nice study and review of multiple imputation
- Comparisons of aregImpute with other imputation algorithms
- Moons KGM, Donders RART, Stijnen T, Harrell FE, J Clinical Epidemiology 59:1092-1101; 2006.
- Horton NJ, Kleinman KP, The American Statistician 61:79-90; 2007.
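The point made in the list above, that the response variable must be used when imputing a predictor, can be checked with a small simulation. The sketch below is not Tim Hesterberg's S-Plus code. It assumes a single predictor x with true slope 1, values of x missing completely at random, and one stochastic imputation per missing value rather than proper multiple imputation; the function names are hypothetical.

```python
import random


def slope_intercept(u, v):
    """Least-squares slope and intercept for regressing v on u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    slope = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / s_uu
    return slope, v_bar - slope * u_bar


def completed_data_slopes(n=4000, frac_missing=0.5, seed=1):
    """Return the slope of y on x after imputing x (a) ignoring y and (b) using y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]  # true slope of y on x is 1
    missing = [rng.random() < frac_missing for _ in range(n)]
    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]

    # (a) Ignore y: fill each missing x with a random observed x.
    x_ignore = [rng.choice(x_obs) if m else xi for xi, m in zip(x, missing)]

    # (b) Use y: fit x on y in the complete cases, then impute the
    # prediction plus a randomly drawn residual.
    b, a = slope_intercept(y_obs, x_obs)
    resid = [xi - (a + b * yi) for xi, yi in zip(x_obs, y_obs)]
    x_use = [(a + b * yi + rng.choice(resid)) if m else xi
             for xi, yi, m in zip(x, y, missing)]

    return slope_intercept(x_ignore, y)[0], slope_intercept(x_use, y)[0]


if __name__ == "__main__":
    ignoring_y, using_y = completed_data_slopes()
    print("slope after imputing x without y:", round(ignoring_y, 3))
    print("slope after imputing x with y:   ", round(using_y, 3))
```

When y is left out of the imputation model, the imputed x values are unrelated to y, so the estimated slope is pulled toward zero roughly in proportion to the fraction of imputed values; including y keeps the slope close to its true value in this setup.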
General Statistical Information
- General information on regression modeling, including prerequisite material for Regression Modeling Strategies
- Problems with categorizing continuous variables
- Information for biomedical researchers and medical citations for statistical issues
- Notes on regression modeling in randomized clinical trials
- Julian Faraway's free book Practical Regression and Anova using R
- Bender and Benner's paper Calculating ordinal regression models in SAS and S-Plus
- Stephan Rudolfer's presentation Diagnosis of Carpal Tunnel Syndrome using Logistic Regression, an excellent presentation on various types of ordinal logistic models. Includes a nice discussion of accuracy indexes.
- John Fox's Applications of Quantitative Methods in Sociology course material, including information on polytomous logistic regression
- John Fox's excellent article on Bootstrapping Regression Models
- Lindsay Smith's nice tutorial on principal components analysis
- Glossary of Statistical Terms
- Annotated bibliography with emphasis on predictive methods, survival analysis, logistic regression, prognosis, diagnosis, modeling strategies, model validation, practical Bayesian methods, clinical trials, graphical methods, papers for teaching statistical methods, bootstrap, etc; FE Harrell (BibTeX format available here)
- Miscellaneous information on methodology, much of it culled from electronic discussion groups (.zip file)
- Bob Obenchain's Regression Shrinkage Web page
- Patrick Burns' bootstrap and resampling page
-- FrankHarrell - 30 Jan 2004; updated 29 Feb, 4, 27 May, 4, 11 Jul, 29 Aug, 2 Sep 2004, 20, 22 Jan, 24 Mar, 14 Aug 2005, 13 Jan, 11 Nov 2006, 28 Feb, 29 Jun, 12 Jul, 16 Jul, 7 Sep, 22 Oct 2007, 11 Jan, 2 Sep, 18, 23 Dec 2008, 6 Jan, 3 Feb, 13, 17 Apr 2009
Topic revision: r111 - 29 Jun 2009 - 15:40:34 - FrankHarrell