
Model and Discovery Validation for a Clinical Audience

Development of a Manuscript with Possible Co-Authors Ewout Steyerberg, Karel Moons, Frank Harrell, Dean Billheimer, David Ransohoff

Goals of Paper

To explain in non-technical terms and to demonstrate with a simulation (a sketch of one such simulation appears below) the following concepts:
  • internal and external validation; bootstrapping, cross-validation, and data-splitting
  • why data-splitting results in low-precision accuracy estimates and wastes training data
  • bias and optimism, overfitting
  • need for estimates of the precision of accuracy, and why a statistical index of model performance computed in a validation study is only an estimate
  • how to do validation incorrectly
    • contamination of test sample with training data
    • failure to freeze model, e.g. not repeating variable selection for each resample
    • inadequate test sample size (or for resampling methods, inadequate total sample size)
    • validating more than a few models or markers and choosing the one that validates the best
Target Journal: JAMA, Annals, NEJM
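
As a first pass at the simulation mentioned above, here is a minimal sketch in Python/scikit-learn (not part of the manuscript; the sample sizes, number of predictors, and effect sizes are arbitrary choices for illustration). It contrasts the apparent (resubstitution) AUC with repeated split-sample estimates and a bootstrap optimism-corrected estimate, using a large independent sample to stand in for the "true" performance of the model fitted to all the data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

def simulate(n, n_predictors=20, n_informative=3):
    """Binary outcome driven by a few weak predictors; the rest are noise."""
    X = rng.normal(size=(n, n_predictors))
    beta = np.zeros(n_predictors)
    beta[:n_informative] = 0.5
    p = 1 / (1 + np.exp(-(X @ beta)))
    y = rng.binomial(1, p)
    return X, y

X, y = simulate(200)                # the "study" data
X_pop, y_pop = simulate(20000)      # a large sample standing in for the population

# 1. Apparent (resubstitution) performance: fit and evaluate on the same data.
full_model = LogisticRegression(max_iter=1000).fit(X, y)
apparent_auc = roc_auc_score(y, full_model.predict_proba(X)[:, 1])

# 2. Performance of the fully fitted model on new subjects (close to the "truth").
true_auc = roc_auc_score(y_pop, full_model.predict_proba(X_pop)[:, 1])

# 3. Split-sample validation, repeated over many random splits, to show how
#    variable (imprecise) the test-set AUC is and that each model sees only half the data.
split_aucs = []
for seed in range(200):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=seed)
    m = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    split_aucs.append(roc_auc_score(y_te, m.predict_proba(X_te)[:, 1]))

# 4. Bootstrap optimism correction: refit the entire modelling process in each
#    bootstrap sample, measure how much better it looks on its own sample than on
#    the original data, and subtract the average optimism from the apparent AUC.
optimisms = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimisms.append(auc_boot - auc_orig)
corrected_auc = apparent_auc - np.mean(optimisms)

# Typically: apparent AUC > true AUC (optimism), the split-sample AUCs scatter widely
# around the truth (low precision), and the bootstrap-corrected AUC sits near the truth
# while using all of the data for model fitting.
print(f"apparent AUC            : {apparent_auc:.3f}")
print(f"true AUC (large sample) : {true_auc:.3f}")
print(f"split-sample AUCs       : mean {np.mean(split_aucs):.3f}, SD {np.std(split_aucs):.3f}")
print(f"bootstrap-corrected AUC : {corrected_auc:.3f}")
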

Outline


Manuscript


Background Papers to Review

  • Rich Simon / Lisa McShane, JNCI 2003, where they describe these pitfalls and how the Netherlands breast-cancer prognosis group did it wrong in Nature 2002 (see p. 16, second column, top paragraph).
  • Keith Baggerly, Jeff Morris and Kevin Coombes, Bioinformatics 2004
  • Papers by Steyerberg and Moons
  • General paper on validation by Amy Justice

Discussions Not Yet Incorporated into Manuscript

Baggerly, Morris and Coombes (BMC, Bioinformatics 2004)

BMC assess the data and analysis approach used by Petricoin et al. (2002). Petricoin and colleagues developed an ovarian cancer classification algorithm based on 100 SELDI serum spectra (training set: 50 cancers, 50 normals). Their resulting algorithm correctly classified 50 of 50 cancer cases and 47 of 50 normals in a test set. Further, the algorithm correctly identified 16 of 16 cases of benign disease as "other" than normal or cancer. BMC's subsequent evaluation of the Petricoin data (as well as two supporting data sets) indicates that the structural features found to distinguish cancer from normal appear to be attributable to artifacts caused by the measurement technology or to differential sample handling/processing. They argue convincingly that the classification results are driven by systematic differences other than biology, and are not useful for cancer detection in a novel sample. BMC propose that better use of (statistical) experimental design and better external validation would "help". (They don't say precisely how it would help.)

My interpretation of this is as follows. I agree with BMC that Petricoin et al. inadvertently introduced systematic bias into their SELDI spectra. Further, this bias was introduced in such a way that no form of internal validation could have detected it: both the training and test sets were contaminated with it. (Another way of saying this is that the bias factor is completely confounded with the cancer/normal classification.) It is possible that better experimental design (e.g., randomization - statistical design; protocol standardization - logical design?) would have reduced the effect of confounding.
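
A toy version of this point (Python/scikit-learn; the "spectra" and all numbers below are made up purely for illustration): when a processing artifact is completely confounded with the cancer/normal labels, the held-out portion of any internal split carries the same artifact, so the classifier appears to validate; only an external sample that was handled consistently exposes the failure.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def spectra(n, labels, artifact):
    """Fake 'spectra': pure noise plus, when artifact=True, a shift in a few
    features applied to the cancer samples only (handling confounded with class)."""
    X = rng.normal(size=(n, 50))
    if artifact:
        X[labels == 1, :5] += 1.5   # systematic shift unrelated to biology
    return X

y = rng.binomial(1, 0.5, 200)
X = spectra(200, y, artifact=True)   # original study: artifact confounded with class

# Internal validation: the held-out half carries the same artifact, so it "validates".
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("internal test accuracy:", accuracy_score(y_te, model.predict(X_te)))   # near 1.0

# External validation with consistently handled samples: no real signal remains.
y_ext = rng.binomial(1, 0.5, 200)
X_ext = spectra(200, y_ext, artifact=False)
print("external accuracy     :", accuracy_score(y_ext, model.predict(X_ext)))  # near 0.5
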

DeanBillheimer 01 Apr 04

Model Validation vs. Model Assessment

This is a picky idiosyncrasy of mine. I want to mention it, and if I'm voted down, I'll live with it. I find the term model validation to be seriously misleading. I think this is because I don't know what it means for a model to be valid. Non-technically, we tend to equate "valid" with "true", but there is no such thing as a "true model" (except as a thought experiment or simulation). Further, we know that no amount of observed data can confirm a model's truth (= validity). Is Newton's model for gravity "valid"? At the level of predicting planetary motion, it seems so; at other levels (e.g., where quantum or relativistic effects interfere) it's not so good (I think?). When trying to predict a falling body's velocity profile, the gravity model is incomplete; it's not really valid or invalid. Instead of describing a model as valid or not, I think it would be better to describe the situations under which a model predicts well.

Operationally, I'm not sure what it means for a model to be valid. For example, what level of predictive performance is required for validation? Is 90% correct classification good enough to claim a valid model? Clearly, if we change the population to which the modeling results are applied, we can change the predictive performance. Also, in this setting we already have sensitivity, specificity, predictive value positive, and predictive value negative to describe different aspects of a model's predictive behavior. These seem to me to be reasonable measures for assessing its performance. I don't understand how they relate to validity.

Relatedly, is predictive performance the only characteristic of a model relevant for validation? We (statisticians, at least) don't use the term "validity" when evaluating the effect of a regression variable.

Clearly, I'm a neophyte at model validation.

So why the rant? I am wary of anyone who claims that their model has been "validated", without providing specific performance details. More importantly, I think that non-technical (non-statistical?, non-critical?) consumers of models can be lulled into false security by using "validated" models (an aside: Why aren't they called "valid" models?). A claim of validity confuses the mathematical model with the phenomenon being described by mathematics. Finally, the term "validation" tends to subvert critical evaluation. To me, "assessment" implies critical evaluation.

As a working title, let me propose a start: The Assessment of Predictive Performance of Clinical Models.

DeanBillheimer 01 Apr 04


Links to Other Sites

Ewout: The slides are nice. You may want to incorporate some of the points there into the outline at the top -FH
Topic attachments
  • Validationofpredictivemodels.pdf (EwoutSteyerberg, 30 Mar 2004) - ES: educational material on validation
  • Validationscheme.ppt (EwoutSteyerberg, 01 Apr 2004) - A scheme for internal and external validation