
REGRESSION MODELING STRATEGIES

With Applications to Linear Models, Logistic Regression, and Survival Analysis

Information and Updates to the Book | Regression Modeling Strategies Package: rms for R

  • REGRESSION MODELING STRATEGIES with Applications to Linear Models, Logistic Regression, and Survival Analysis by FE Harrell. The book was published June 5 2001 by Springer New York, ISBN 0-387-95232-2 (also available at amazon.com, Booksamillion.com, DirectTextBook, StudentMarket.com, or FetchBook.Info). Available online: the text from the book's back cover, the preface and table of contents for the book manuscript (.pdf), a partial index to the book (.pdf), and a sample chapter from the book. (Note: This material is Copyright 2001-2004 Springer-Verlag and may not be reproduced.)
  • Changes and additions planned for the second edition (projected publication date third quarter 2010)
  • Reviews of the book:
    • Statistical Methods in Biomedical Research
    • Biometrics 58:477, June 2002
    • Bulletin of the Swiss Statistical Society 43:11 (to appear also in Statistical Methods in Medical Research)
    • International Journal of Epidemiology 31(3):699-700, June 2002. Note: This otherwise excellent review states that the book recommends selecting variables to include in the model on the basis of their frequency of selection by a bootstrap procedure. This is definitely not the case.
    • Journal of the American Statistical Association 98:257-258, March 2003
    • Medical Decision Making, 23(2):182-183, April 2003
    • Technometrics 45:170, May 2003
    • Statistics in Medicine 22:2531-2532, 15 Aug 2003
    • Reviews at Springer
  • Errata for the first and second and later printings. The book had its third printing in December 2002 and its fourth printing in December 2003. The sixth printing was in December 2005.
  • Syllabus for a one-semester course using part of the text, for students who have not had a course in linear regression.
  • Interactive Overview of many of the methods in the book
  • Syllabus for a more advanced course using the text up until survival analysis, for students already versed in ordinary linear regression.

Short Courses

2009 Upcoming short courses in Regression Modeling Strategies

  • NEW 1 Day Course, Centers for Disease Control, Atlanta, 6 Oct 2009
  • 1/2 Day Short Course, useR! 2009 International R Users Conference, Rennes, France, July 2009
  • Webinar on Parametric Survival Modeling for the ASA Biopharmaceutical Section, 3 April 2009
  • One-day Short Course, ENAR (Eastern North American Region of the International Biometric Society), San Antonio, Texas, 15 Mar 2009
  • Five-Session Short Course by Frank E. Harrell, Jr., Ph.D., Professor and Chair, Department of Biostatistics, Vanderbilt University School of Medicine.
    • Date: February 2 - 6, 2009
    • Requirements: strong competence in multiple regression models.
    • Target audience: statisticians and related quantitative researchers who want to learn some general model development strategies, including approaches to missing data imputation, data reduction, model validation, and relaxing linearity assumptions.
    • Registration: please email Eve Anderson.
      • Course fee:
        VU and MMC Students and Post-docs $50
        VU and MMC Faculty and Staff $200
        Other Students $200
        Other Members of Non-Profit Institutions $300
        Members of For-Profit Institutions $600
        No charge to Department of Biostatistics faculty/staff
      • Payment method: cash, check, VUMC form 1180, Visa and MasterCard.
      • Book: attendees may purchase a copy of Dr. Harrell's book by following the links above to purchase sites.
      • Registration deadline: EXTENDED January 23, 2009.
      • Cancellation policy: TBA
      • Handouts: see below; printed copies will be provided for paying attendees.

DAY        DATE              TIME            LOCATION
Monday     February 2, 2009  8:30AM/11:30AM  Student Life Center, LL Mtg Rooms 1&2
Tuesday    February 3, 2009  8:30AM/11:30AM  Student Life Center, LL Mtg Rooms 1&2
Wednesday  February 4, 2009  8:30AM/11:00AM  Student Life Center, LL Mtg Rooms 1&2
Thursday   February 5, 2009  8:30AM/11:30AM  MRBIII, Room 3131
Friday     February 6, 2009  8:30AM/11:30AM  Student Life Center, LL Mtg Rooms 1&2

Extended Discussions: Tuesday Feb. 3 and Friday Feb. 6 1:15pm-3:00pm, location: D-2221 Medical Center North. This is a good time to bring up questions that are very specific to your area of research.


Housing for guests attending Short Courses


Materials

  • Handouts - be sure to print two pages per side of paper
  • Syllabus for a 1-day short course based on the text
  • Syllabus for a 3-day short course and S workshop based on the text
  • Syllabus for a 1-day short course "Modern Approaches to Predictive Modeling and Covariable Adjustment in Randomized Clinical Trials"
  • Scripts developed in class during the May 2000 or August 2000 3-day courses or the June 2001 or June 2002 3-day course for Insightful Corporation
  • Course Evaluations

Discussion Board

  • A discussion board for readers and the author to discuss questions, issues, controversies, and new research related to the text

Datasets

Additional Problems for Students

  • Homework assignments not in the book. Some of the early assignments are for basic regression, a prerequisite for the book. Some of the problems use data in Rosner B: Fundamentals of Biostatistics, 5th Edition. Belmont CA: Duxbury Press; 1999. Solutions to these problems, as well as solutions to many of the problems given in the book, are available to instructors by E-mailing the author.

Quizzes

  • Quizzes (with answer sheets) on concepts in the text and on prerequisites are available to instructors by E-mailing the author.

Software

  • Statistical Computing Tools page for downloading Hmisc and Design
  • R and the Hmisc and Design libraries: R is a freely available software system that has most of the features of S-PLUS. The Hmisc library was made available for R on 7 Apr 2002 and the Design library on 20 May 2002. Both libraries work on Linux, Unix, MacOSX, and Windows versions of R. In July 2003 both libraries were placed on CRAN, the Comprehensive R Archive Network, so they may be downloaded and installed or updated using simple R commands (or menus in Windows).
  • Overview of the Hmisc Library
  • Overview of the Design Library
  • Reference card for the Design Library. If using US letter size paper, print the first page then insert it back into the printer to print the second page on back. Fold and cut into a booklet.
  • Online documentation for the Hmisc and Design Libraries.
  • Interactive S scripts demonstrating various curve fitting criteria and showing the flexibility of restricted cubic splines
  • An Introduction to S and the Hmisc and Design Libraries by CF Alzola and FE Harrell
  • Statistical computing course material
  • Miscellaneous S software available on the Internet that is related to some of the methods covered in this book such as data reduction, censored data analysis, imputation, recursive partitioning
  • Unsupported SAS macros for restricted cubic splines, displaying survival estimates, and checking proportional hazards and other model assumptions, etc. (FE Harrell, 1991). Click here for information, examples, and brief information on SAS procedures useful for multivariable modeling and on obtaining predicted values with SAS.
  • SAS macros for various censored data calculations such as AUC, from Chambless LE, Diao G. Estimation of time-dependent area under the ROC curve for long-term risk prediction. Stat Med 2006 Oct 30; 25 (20):3474-86.
  • Warren Sarle's SAS macros and examples for bootstrapping and jackknifing. See Warren's cautionary note on bootstrap confidence intervals, with a good example related to R^2 in multiple regression. The example shows that when the estimate of R^2 is badly biased, bootstrap confidence limits are badly displaced to the right. Included in the notes is the standard error of R^2 and information about adjusted R^2.
  • StatLib statistical computing repository
  • The penalized package in R

Studies of Methods Used in the Text

  • Recent simulation experiments conducted by Carl Moons and Frank Harrell indicate that the performance of transcan for multiple imputation is about halfway between single conditional mean imputation and MICE (see below), consistent with the findings from Faris PD, Ghali WA, et al (2002): Multiple imputation versus data enhancement for dealing with missing data in observational health care outcome analyses. J Clin Epidemiology 55:184-191. The suboptimal performance of transcan for multiple imputation is probably because transcan fits the flexible additive imputation models once and then draws all multiple imputations from those fitted models. A new function in the Hmisc library, aregImpute, instead uses the bootstrap to re-fit additive nonparametric imputation models for each of the multiple imputations. Results for aregImpute are very promising (see below).
  • Validation of binary logistic models
    • Simulation studies
    • Steyerberg EW, Harrell FE, Borsboom GJJM, Eijkemans MJC, Vergouwe Y, Habbema JDF (2001): Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54:774-781.
    • Steyerberg EW et al (2003): Internal and external validation of predictive models: A simulation study of bias and precision in small samples. Journal of Clinical Epidemiology 56:441-447.
    • Vergouwe Y, Steyerberg EW, Eijkemans MJC, Habbema JDF (2005): Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. Journal of Clinical Epidemiology 58:475-483.
  • Studying the degrees of freedom spending strategy that uses generalized Spearman rho^2, in terms of preserving type I error and sigma^2 in ordinary least squares
  • Prediction Error in Cox Models Varying Number of Predictors
  • Shrinkage and problems with stepwise variable selection: See Steyerberg EW, Eijkemans MJC, Harrell FE, Habbema JDF (2001): Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Medical Decision Making 21:45-56.
  • Model simplification and stepwise variable selection: See Ambler G, Brady AR, Royston P (2002): Simplifying a prognostic model: a simulation study based on clinical data. Statistics in Medicine 21:3803-3822. The authors studied the performance of the model simplification strategy discussed in the book, and compared it with more traditional variable selection methods, finding that standard variable selection can work well when there is a large proportion of irrelevant variables.
  • New case study on penalized maximum likelihood estimation for binary logistic modeling: Moons KGM, Donders ART, Steyerberg EW, Harrell FE (2004): Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clinical Epidemiology 57:1262-1270.
  • Example of a spline interaction surface
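
The contrast drawn in the first item of this list, between fitting the imputation model once (as described for transcan) and re-fitting it on a bootstrap resample for each imputation (as described for aregImpute), can be sketched in a few lines. This is only a minimal illustration in Python using the standard library, not the Hmisc code: a straight-line regression stands in for the flexible additive models, the data are simulated, and the names fit_line, imputation_models, and demo are hypothetical.

```python
import random


def fit_line(pred, resp):
    """Least-squares fit of resp on pred; returns (intercept, slope, residual SD)."""
    n = len(pred)
    mp = sum(pred) / n
    mr = sum(resp) / n
    sxx = sum((p - mp) ** 2 for p in pred)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    slope = sxy / sxx
    intercept = mr - slope * mp
    rss = sum((r - intercept - slope * p) ** 2 for p, r in zip(pred, resp))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def imputation_models(x, y, n_impute=5, seed=2):
    """Return the models for predicting x from y under the two strategies."""
    rng = random.Random(seed)
    n = len(x)
    # Strategy 1 (fit once): every imputation is drawn from the same fitted model.
    once = [fit_line(y, x)] * n_impute
    # Strategy 2 (bootstrap re-fit): each imputation gets a model fitted to
    # its own resample (with replacement) of the complete cases.
    refit = []
    for _ in range(n_impute):
        idx = [rng.randrange(n) for _ in range(n)]
        refit.append(fit_line([y[i] for i in idx], [x[i] for i in idx]))
    return once, refit


def demo(n=40, seed=1):
    """Simulate a small set of complete cases (an assumption for illustration)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    return imputation_models(x, y)


if __name__ == "__main__":
    once, refit = demo()
    print("fit once, slopes:        ", [round(m[1], 3) for m in once])
    print("bootstrap re-fit, slopes:", [round(m[1], 3) for m in refit])
```

With a single fit, the coefficients of the imputation model are identical across imputations, so the imputed values differ only through residual draws. Re-fitting on bootstrap resamples lets the coefficients vary from one imputation to the next, which carries uncertainty about the imputation model itself into the imputed values.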

S-news and R-help

Multiple Imputation

  • Joseph Schafer's Multiple Imputation FAQ
  • Napier University's imputation page
  • Multiple Imputation Online and S-Plus MICE Software by Stef van Buuren and Karin Oudshoorn
  • To subscribe to the Impute E-mail discussion group led by Juned Siddique of Northwestern University, click here.
  • A recent paper containing a good overview of multiple imputation and a comparison of some software packages is Horton NJ, Lipsitz SR, The American Statistician 55:244-254; 2001.
  • An excellent recent survey of missing data methods is Schafer, JL and Graham JW, Psychological Methods 7:147-177; 2002.
  • See also Biases in SPSS 12.0 Missing Value Analysis by Paul von Hippel, The American Statistician 58:160-164; 2004.
  • Notes from Tim Hesterberg of Insightful Corporation on why the response variable must be used when doing multiple imputation. Tim's notes include S-Plus code to do several simulations illustrating his points.
  • A nice study and review of multiple imputation
  • Comparisons of aregImpute with other imputation algorithms
    • Moons KGM, Donders RART, Stijnen T, Harrell FE, J Clinical Epidemiology 59:1092-1101; 2006.
    • Horton NJ, Kleinman KP, The American Statistician 61:79-90; 2007.
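
The point made in Tim Hesterberg's notes listed above, that the response variable must be used when imputing predictors, can be illustrated with a small simulation. The following is a minimal sketch in Python using only the standard library; it is not Tim's S-Plus code, it uses single stochastic imputation rather than proper multiple imputation, and the data-generating model (y = x + noise, half of x missing completely at random) and the names fit_line and simulate are assumptions made for illustration.

```python
import random


def fit_line(pred, resp):
    """Least-squares fit of resp on pred; returns (intercept, slope, residual SD)."""
    n = len(pred)
    mp = sum(pred) / n
    mr = sum(resp) / n
    sxx = sum((p - mp) ** 2 for p in pred)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    slope = sxy / sxx
    intercept = mr - slope * mp
    rss = sum((r - intercept - slope * p) ** 2 for p, r in zip(pred, resp))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def simulate(n=5000, frac_missing=0.5, seed=1):
    """Compare the slope of y on x after imputing x with and without y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]            # true slope is 1
    missing = [rng.random() < frac_missing for _ in range(n)]
    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]

    # Imputation ignoring y: draw from the marginal distribution of observed x.
    mean_x = sum(x_obs) / len(x_obs)
    sd_x = (sum((xi - mean_x) ** 2 for xi in x_obs) / (len(x_obs) - 1)) ** 0.5
    x_no_y = [rng.gauss(mean_x, sd_x) if m else xi
              for xi, m in zip(x, missing)]

    # Imputation using y: regress x on y in the complete cases, then draw
    # from the fitted conditional distribution of x given y.
    a, b, sd = fit_line(y_obs, x_obs)
    x_with_y = [a + b * yi + rng.gauss(0, sd) if m else xi
                for xi, yi, m in zip(x, y, missing)]

    return {
        "complete": fit_line(x, y)[1],        # slope with no missing data
        "without_y": fit_line(x_no_y, y)[1],  # slope after imputing without y
        "with_y": fit_line(x_with_y, y)[1],   # slope after imputing with y
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:10s} slope = {value:.3f}")
```

Under these assumptions the imputed values drawn without y are unrelated to y, so the fitted slope is pulled toward zero in proportion to the fraction imputed; drawing from the distribution of x given y preserves the association.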

General Statistical Information


-- FrankHarrell - 30 Jan 2004; updated 29 Feb, 4, 27 May, 4, 11 Jul, 29 Aug, 2 Sep 2004, 20, 22 Jan, 24 Mar, 14 Aug 2005, 13 Jan, 11 Nov 2006, 28 Feb, 29 Jun, 12 Jul, 16 Jul, 7 Sep, 22 Oct 2007, 11 Jan, 2 Sep, 18, 23 Dec 2008, 6 Jan, 3 Feb, 13, 17 Apr, 9 Sep 2009

Topic attachments

Attachment                      Size      Date                 Who           Comment
5SessionShortCourseFlyer08.pdf  72.6 K    21 Dec 2007 - 12:34  DianeKolb     5 Session Short Course 2008
ada.html                        11.7 K    04 Jul 2004 - 07:12  FrankHarrell  Syllabus for Advanced Data Analysis Course
biomodHw.pdf                    80.4 K    11 Jul 2004 - 09:16  FrankHarrell  Homework assignments
logistic.val.pdf                17.2 K    24 Mar 2004 - 12:52  FrankHarrell  Simulation study of logistic model validation
rms.pdf                         1845.6 K  04 Oct 2009 - 15:46  FrankHarrell  Regression Modeling Strategies Handouts
Topic revision: r115 - 02 Nov 2009 - 16:04:13 - FrankHarrell
Copyright © 2009 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.