CourseBios330 (17 Feb 2020, FrankHarrell)

f.harrell@vanderbilt.edu

Professor of Biostatistics

Department of Biostatistics

Vanderbilt University School of Medicine

Teaching Assistants: Lisa Lin and Hannah Weeks (contact them through Slack)

Assistant: Tawanna Peters (615)322-2001

Office Hours: After class, or by appointment, room 11122, 2525 West End

7 January - 16 April 2020, Final Work Due 2020-05-01

Grades are due by 11:59pm on Saturday, 2020-05-04. Last official exam day 2020-05-02

Tuesday, Thursday 3:30-5:00

8102, 8th Floor, 2525 West End

This course covers many aspects of multivariable regression modeling as it is commonly used in prognostic, diagnostic, and epidemiologic modeling, clinical trials, and prediction in general.

Supplemental Material on Biostatistical Modeling including interactive R demonstrations | Document updates | Biostatistics for Biomedical Research Notes | Blog

- Prognostic estimates can be used to inform the patient about likely outcomes of her disease.
- A physician can use estimates of diagnosis or prognosis as a guide for ordering additional tests and selecting appropriate therapies.
- Outcome assessments are useful in the evaluation of technologies; for example, diagnostic estimates derived both with and without using the results of a given test can be compared to measure the incremental diagnostic information provided by that test over what is provided by prior information.
- A researcher may want to estimate the effect of a single factor (e.g., treatment given) on outcomes in an observational study in which many uncontrolled confounding factors are also measured. Here the simultaneous effects of the uncontrolled variables must be controlled (held constant mathematically if using a regression model) so that the effect of the factor of interest can be more purely estimated. An analysis of how variables (especially continuous ones) affect the patient outcomes of interest is necessary to ascertain how to control their effects.
- Predictive modeling is useful in designing randomized clinical trials. Both the decision concerning which patients to randomize and the design of the randomization process (e.g., stratified randomization using prognostic factors) are aided by the availability of accurate prognostic estimates before randomization. Lastly, accurate prognostic models can be used to test for differential therapeutic benefit or to estimate the clinical benefit for an individual patient in a clinical trial, taking into account the fact that low-risk patients must have less absolute benefit (e.g., lower change in survival probability).

To accomplish these objectives, researchers must create multivariable models that accurately reflect the patterns existing in the underlying data and that are valid when applied to comparable data in other settings or institutions. Models may be inaccurate due to violation of assumptions, omission of important predictors, high frequency of missing data and/or improper imputation methods, and, especially with small datasets, overfitting.
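
The remark that low-risk patients must have less absolute benefit can be illustrated with a small calculation. This is only a sketch, written in plain Python for illustration (course work itself uses R). It assumes the treatment effect is a constant odds ratio across patients, and both the odds ratio of 0.7 and the baseline risks are hypothetical values, not taken from any study.

```python
def treated_risk(baseline_risk, odds_ratio):
    """Risk under treatment when treatment multiplies the odds by odds_ratio."""
    odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_reduction(baseline_risk, odds_ratio):
    """Difference between untreated and treated risk for one patient."""
    return baseline_risk - treated_risk(baseline_risk, odds_ratio)


# Hypothetical constant relative effect (an assumption for illustration only).
HYPOTHETICAL_ODDS_RATIO = 0.7

for risk in (0.02, 0.10, 0.30):
    arr = absolute_risk_reduction(risk, HYPOTHETICAL_ODDS_RATIO)
    print(f"baseline risk {risk:.2f}: absolute risk reduction {arr:.4f}")
```

Under these assumptions the same relative effect translates into a much smaller absolute benefit for the patient at 0.02 baseline risk than for the patient at 0.30, which is why individual benefit estimates need an accurate prognostic model.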

- accurately
- in a way the sample size will allow, without overfitting
- uncovering complex non-linear or non-additive relationships
- testing for and quantifying the association between one or more predictors and the response, with possible adjustment for other factors

- Papers may be obtained below, along with a schedule of reading assignments
- Simulation study of logistic model validation methods
- Model uncertainty, penalization, and parsimony with examples using AIC to select penalties

- Steyerberg EW. *Clinical Prediction Models*. New York: Springer; 2009.

- From http://biostat.mc.vanderbilt.edu/DataSets
- Students are encouraged to find their own datasets for the final project

- Will be on slack rms330 channel `regression-strategies`.
- Long discussions: http://datamethods.org/c/stat/rms

- vbiostatcourse.slack.com
- Channel `bios330` for logistics, private and group messaging, questions about group assignments, stat computing issues
- Channel `rms` for questions and answers and short to medium-length discussions

- Channel

`rms` and `Hmisc` packages plus several other R packages to be listed here as the class progresses. Students are expected to turn in their assignments in `html` format created using Markdown with `knitr`. See KnitrHowto for some useful setup as well as here and here. R and `knitr` are most easily run by RStudio. This template is highly recommended.

`knitr` must be used (see above). Assignments must list those who actively participated. `html` files which include code should be emailed to the teaching assistant or sent via `slack` personal message.

For the final project you will do an in-depth analysis of a dataset that interests you, containing many predictors of various types (at least one being continuous unless you receive special instructor permission) and having a binary, continuous, ordinal, or possibly right-censored response variable. The dataset may not be one used in the course or in any of the texts. The dataset should have a sufficient number of observations, and the meaning of the data should be such that development of a predictive model makes sense. The analyses you perform on the dataset should use several of the methods we learned in the course. Extra weight is given to selection of appropriate methods when grading the project. The analysis must include at least one simulation studying the properties of one of the procedures used in developing the model.
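
As a rough idea of what a simulation studying the properties of a procedure can look like, the sketch below examines one procedure: picking the single predictor most correlated with the response. It is written in plain Python for illustration only (project work itself must use R with `knitr`), and the function names, sample size, and number of candidate predictors are all hypothetical choices. Every predictor is pure noise, so any apparent association is an artifact of selection.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_obs=50, n_pred=20, n_sim=200, seed=1):
    """Mean |r| of the selected predictor vs. mean |r| in fresh data."""
    rng = random.Random(seed)
    apparent, fresh = [], []
    for _ in range(n_sim):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        candidates = [[rng.gauss(0, 1) for _ in range(n_obs)]
                      for _ in range(n_pred)]
        # Procedure under study: keep the candidate with the largest |r|.
        apparent.append(max(abs(pearson(x, y)) for x in candidates))
        # All candidates are noise, so in new data the selected predictor
        # behaves like a fresh noise variable paired with a fresh response.
        x_new = [rng.gauss(0, 1) for _ in range(n_obs)]
        y_new = [rng.gauss(0, 1) for _ in range(n_obs)]
        fresh.append(abs(pearson(x_new, y_new)))
    return sum(apparent) / n_sim, sum(fresh) / n_sim


apparent_mean, fresh_mean = simulate()
print(f"mean |r| after selection: {apparent_mean:.3f}")
print(f"mean |r| in fresh data:   {fresh_mean:.3f}")
```

The gap between the two averages is the optimism introduced by the selection step. A project simulation would study a procedure actually used in building the model, under a data-generating process resembling the project dataset.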
- Assignments 2-3 and 8 are group assignments. Constitution of groups is shown at the top of the assignment. Group members are randomized separately for each group assignment.

- Individual projects (n=5): 3
- Group projects (n=4): 1
- Final project (n=1): 8
- Quizzes (n=6): 1/3
- Class participation: 2.5
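
One possible reading of the numbers above is that each is a relative weight per item, so a component's total weight is its item count times its per-item weight. The short calculation below, in plain Python for illustration, works out each component's share of the course grade under that reading. The per-item interpretation and the count of 1 for class participation are assumptions, not something the list states.

```python
from fractions import Fraction

# (number of items, weight per item); the per-item reading is an assumption.
components = {
    "Individual projects": (5, Fraction(3)),
    "Group projects": (4, Fraction(1)),
    "Final project": (1, Fraction(8)),
    "Quizzes": (6, Fraction(1, 3)),
    "Class participation": (1, Fraction(5, 2)),  # count of 1 is assumed
}


def component_shares(components):
    """Return each component's share of the total weight, and the total."""
    totals = {name: n * w for name, (n, w) in components.items()}
    grand_total = sum(totals.values())
    shares = {name: t / grand_total for name, t in totals.items()}
    return shares, grand_total


shares, grand_total = component_shares(components)
print(f"total weight: {float(grand_total)}")
for name, share in shares.items():
    print(f"{name}: {float(share):.1%}")
```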

- By 2020-01-15: relaxLinear: smi79spl, gia14opt, col16qua
- By 2020-01-18: multivar: gra91eff
- By 2020-01-23: missingData: pen15mul, don06rev, hei06imp (skim), hip07reg (skim), jan10mis (skim), muchado
- By 2020-01-25: multivar: giu11spe, gre00whe, smi92pro, ril18min, ril18mina
- By 2020-01-30: datasetsCaseStudies: nic99reg, spa89dif
- By 2020-02-02: multivar: accuracy (all 4 papers), validation (all papers)
- modelUncertainty: bor15vie
- Added for MLE mle/jen86jud

Document | Last Revision | What |
---|---|---|
Syllabus | 2020-02-04 | |
Handouts | 2019-11-29 | |
R Scripts in Book | | |
Assignments | 2020-02-17 | Assignment 5 |
Solutions | 2020-02-17 | Assignment 4 |
Solution knitr source | | |
Study Questions | | |
Readings | 2020-02-02 | |
Final due date | | |

- Cosma Shalizi's Undergraduate Advanced Data Analysis course

- http://fharrell.com/links
- TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration


Topic revision: r213 - 17 Feb 2020, FrankHarrell

Copyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
