CourseBios330 (18 Feb 2018, FrankHarrell)

Instructor: Frank Harrell, f.harrell@vanderbilt.edu

Professor of Biostatistics

Department of Biostatistics

Vanderbilt University School of Medicine

Teaching Assistant: Ryan Jarrett

Assistant: Tawanna Peters, (615) 322-2001

Office Hours: 10:45-11:45 Thursdays and by appointment, room 11122, 2525 West End

Dates: 9 January to 19 April 2018; final work due 4 May 2018

Class meetings: Tuesdays and Thursdays, 9:15-10:45, Room 8102, 8th Floor, 2525 West End

This course covers many aspects of multivariable regression modeling as it is commonly used in prognostic, diagnostic, and epidemiologic modeling, clinical trials, and prediction in general.

Supplemental Material on Biostatistical Modeling including interactive R demonstrations | Document updates | Biostatistics for Biomedical Research Notes | Blog

- Prognostic estimates can be used to inform the patient about likely outcomes of her disease.
- A physician can use estimates of diagnosis or prognosis as a guide for ordering additional tests and selecting appropriate therapies.
- Outcome assessments are useful in the evaluation of technologies; for example, diagnostic estimates derived both with and without using the results of a given test can be compared to measure the incremental diagnostic information provided by that test over what is provided by prior information.
- A researcher may want to estimate the effect of a single factor (e.g., treatment given) on outcomes in an observational study in which many uncontrolled confounding factors are also measured. Here the simultaneous effects of the uncontrolled variables must be controlled (held constant mathematically if using a regression model) so that the effect of the factor of interest can be more purely estimated. An analysis of how variables (especially continuous ones) affect the patient outcomes of interest is necessary to ascertain how to control their effects.
- Predictive modeling is useful in designing randomized clinical trials. Both the decision about which patients to randomize and the design of the randomization process (e.g., stratified randomization using prognostic factors) are aided by accurate prognostic estimates made before randomization. Lastly, accurate prognostic models can be used to test for differential therapeutic benefit or to estimate the clinical benefit for an individual patient in a clinical trial, taking into account the fact that low-risk patients must have less absolute benefit (e.g., a smaller change in survival probability).

To accomplish these objectives, researchers must create multivariable models that accurately reflect the patterns in the underlying data and that remain valid when applied to comparable data in other settings or institutions. Models may be inaccurate because of violated assumptions, omission of important predictors, a high frequency of missing data and/or improper imputation methods, and, especially with small datasets, overfitting.
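The point that low-risk patients must have less absolute benefit can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that treatment multiplies every patient's odds of a bad outcome by the same odds ratio of 0.7 (a hypothetical value, not taken from any study); the absolute risk reduction then depends on the patient's baseline risk.

```python
def treated_risk(baseline_risk: float, odds_ratio: float) -> float:
    """Risk under treatment when treatment multiplies the baseline odds by odds_ratio."""
    odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_reduction(baseline_risk: float, odds_ratio: float) -> float:
    """Baseline risk minus risk under treatment (the absolute benefit)."""
    return baseline_risk - treated_risk(baseline_risk, odds_ratio)


# Hypothetical constant treatment effect on the odds scale
OR = 0.7
for p0 in (0.01, 0.05, 0.20, 0.50):
    arr = absolute_risk_reduction(p0, OR)
    print(f"baseline risk {p0:.2f}: absolute risk reduction {arr:.4f}")
```

With the same relative effect for everyone, the absolute risk reduction grows from about 0.003 at a baseline risk of 1% to about 0.088 at 50%, which is why individual benefit estimates need an accurate prognostic model for baseline risk.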

- accurately
- in a way the sample size will allow, without overfitting
- uncovering complex non-linear or non-additive relationships
- testing for and quantifying the association between one or more predictors and the response, with possible adjustment for other factors
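One common way to uncover non-linear relationships without spending too many degrees of freedom is a restricted (natural) cubic spline: piecewise cubic between knots and constrained to be linear beyond the outer knots. In R, the `rms` package's `rcs()` function provides this; the Python below is only an independent sketch of the truncated-power form of the basis for k sorted knots, giving k-2 non-linear terms to use alongside x itself. The knot locations are hypothetical, and the sketch omits any scaling normalization that `rms` may apply, so its columns need not match `rms` output exactly.

```python
from typing import List, Sequence


def pos_cube(u: float) -> float:
    """Truncated cubic (u)_+^3."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Non-linear restricted cubic spline terms at x for sorted knots t_1..t_k.

    Term j (j = 1..k-2) is
      (x - t_j)_+^3
      - (x - t_{k-1})_+^3 * (t_k - t_j) / (t_k - t_{k-1})
      + (x - t_k)_+^3 * (t_{k-1} - t_j) / (t_k - t_{k-1}).
    The full design row for a linear predictor is [x] + rcs_basis(x, knots).
    """
    k = len(knots)
    if k < 3:
        raise ValueError("need at least 3 knots")
    t_km1, t_k = knots[-2], knots[-1]
    d = t_k - t_km1
    terms = []
    for t_j in knots[: k - 2]:
        terms.append(
            pos_cube(x - t_j)
            - pos_cube(x - t_km1) * (t_k - t_j) / d
            + pos_cube(x - t_k) * (t_km1 - t_j) / d
        )
    return terms


knots = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical knot locations
for x in (0.0, 2.5, 6.0, 7.0, 8.0):
    print(x, [round(v, 3) for v in rcs_basis(x, knots)])
```

The weights on the last two truncated cubes are chosen so that the cubic and quadratic terms cancel beyond the last knot; every term is zero below the first knot, so the fitted curve is linear in both tails.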

- Papers may be obtained below, along with a schedule of reading assignments
- Simulation study of logistic model validation methods
- Model uncertainty, penalization, and parsimony with examples using AIC to select penalties

- Steyerberg EW. *Clinical Prediction Models*. New York: Springer; 2009.

- From http://biostat.mc.vanderbilt.edu/DataSets
- Students are encouraged to find their own datasets for the final project

- Will appear on this wiki

`regression-strategies`.
- The Google group `regmod` is a good way to post class-specific questions and answers because you can return to the discussion group weeks later and still benefit from seeing answers on a specific topic. The discussion group is an excellent way to keep in touch with the class and, even more, to ask and answer questions. I hope that all students will use it to:
* ask or answer any question whatsoever related to group assignments
* ask or answer any logistical or purely technical questions related to individual work assignments
* ask or answer any questions about modeling or statistical computing concepts that are not directly related to a pending individual work assignment

Use `rms` and `bios330` instead of the Google group: `rms` is for statistical questions, answers, and discussion, and `bios330` is for course logistics.
The course uses the R `rms` and `Hmisc` packages plus several other R packages to be listed here as the class progresses. Students are expected to turn in their assignments in `html` format created using Markdown with `knitr`. See KnitrHowto for some useful setup as well as here and here. R and `knitr` are most easily run from RStudio. This template is highly recommended.
`knitr` must be used (see above). Assignments must list those who actively participated. `html` files that include code should be emailed to the teaching assistant or sent via a `slack` personal message.
For the final project you will do an in-depth analysis of a dataset you are interested in that contains many predictors of various types (at least one continuous, unless you receive special permission from the instructor) and has a binary, continuous, ordinal, or possibly right-censored response variable. The dataset may not be one used in the course or in any of the texts. It should have a sufficient number of observations, and the meaning of the data should be such that developing a predictive model makes sense. Your analyses should use several of the methods learned in the course; when grading the project, extra weight is given to the selection of appropriate methods. The analysis must include at least one simulation studying the properties of one of the procedures used in developing the model.
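As a small illustration of the kind of simulation the final project calls for, the sketch below studies one model-development procedure: screening many candidate predictors and keeping the one most correlated with the response. All predictors are pure noise, so the true R² is zero; the simulation compares the apparent R² of the screened-in predictor with its R² in fresh data from the same process. The sample size, number of candidates, replications, and seed are arbitrary choices, and Python is used here only for illustration (course work itself is done in R with `knitr`).

```python
import random


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def noise(n, rng):
    """n independent standard normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def simulate(n=50, p=20, reps=200, seed=1):
    """Mean apparent and mean fresh-sample R^2 of the screened-in predictor."""
    rng = random.Random(seed)
    apparent, fresh = [], []
    for _ in range(reps):
        y = noise(n, rng)
        xs = [noise(n, rng) for _ in range(p)]
        # Screening step: keep the candidate with the largest r^2 in this sample
        r2 = [corr(x, y) ** 2 for x in xs]
        best = max(range(p), key=lambda j: r2[j])
        apparent.append(r2[best])
        # Fresh data from the same null process: the chosen predictor is still noise
        y_new, x_new = noise(n, rng), noise(n, rng)
        fresh.append(corr(x_new, y_new) ** 2)
    return sum(apparent) / reps, sum(fresh) / reps


app, val = simulate()
print(f"mean apparent R^2 {app:.3f}, mean fresh-sample R^2 {val:.3f}")
```

The apparent R² comes out several times larger than the fresh-sample value, a simple demonstration that data-driven selection inflates apparent performance and that any validation has to account for the selection step.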
- Assignments 2-3 and 8 are group assignments. Group membership is shown at the top of each assignment. Group members are randomized separately for each group assignment.

`knitr` source files here.
Grading weights:
- Individual projects (n=5): 3
- Group projects (n=4): 1
- Final project (n=1): 8
- Quizzes (n=6): 1/3
- Class participation: 2.5
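If each number above is the relative weight of a single item of that type and class participation counts as one item (an interpretation assumed here, not stated on this page), an overall score is a weighted average. The sketch below shows that arithmetic with exact fractions; the scores in the usage example are hypothetical.

```python
from fractions import Fraction

# (number of items, weight per item), taken from the list above;
# reading the weights as per-item weights is an assumption
COMPONENTS = {
    "individual": (5, Fraction(3)),
    "group": (4, Fraction(1)),
    "final": (1, Fraction(8)),
    "quiz": (6, Fraction(1, 3)),
    "participation": (1, Fraction(5, 2)),
}


def total_weight():
    """Sum of all item weights."""
    return sum(n * w for n, w in COMPONENTS.values())


def overall(scores):
    """Weighted average; scores maps component name -> list of item scores."""
    num = Fraction(0)
    for name, (n, w) in COMPONENTS.items():
        items = scores[name]
        if len(items) != n:
            raise ValueError(f"{name}: expected {n} scores, got {len(items)}")
        num += w * sum(Fraction(s) for s in items)
    return num / total_weight()


example = {  # hypothetical scores on a 0-100 scale
    "individual": [90, 85, 88, 92, 80],
    "group": [95, 90, 85, 100],
    "final": [87],
    "quiz": [70, 80, 90, 100, 60, 80],
    "participation": [100],
}
print(float(total_weight()))            # 31.5
print(round(float(overall(example)), 2))  # 88.29
```

Under this reading the final project alone carries 8 of 31.5 total weight units, while all six quizzes together carry 2.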

`slack`.

See also this excellent resource on splines.

- By 2018-01-17: relaxLinear: smi79spl, gia14opt, col16qua
- By 2018-01-21: multivar: giu11spe, gra91eff
- By 2018-02-03: datasetsCaseStudies: nic99reg spa89dif
- By 2014-02-04: multivar: gre00whe, smi92pro; accuracy (all 4 papers), validation (all papers), logistCal
- modelUncertainty: bor15vie
- Added for MLE: mle/jen86jud

Document | Last Revision | What Changed |
---|---|---|
Syllabus | 2018-01-11 | |
Handouts | | |
R Scripts in Book | | |
Assignments | 2018-02-19 | Assignment 5 |
Solutions | 2018-02-19 | Assignment 4 |
Solution knitr source | | |
Readings | | |
Final due date | | |

- Cosma Shalizi's Undergraduate Advanced Data Analysis course

- TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration

Attachment | Size | Date | Who | Comment |
---|---|---|---|---|
hw4.Rnw | 7.2 K | 08 Mar 2015 - 09:41 | FrankHarrell | knitr source file for Assignment 4 |
sat.r | 0.5 K | 12 Jan 2013 - 19:19 | FrankHarrell | R code to create SAT dataset in RMS Chapter 2 |


Topic revision: r176 - 18 Feb 2018, FrankHarrell

Copyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback
