Homework 3 (due on Monday, February 20)

Please note:
  • Problems will be added until the last class before the due date.
  • No electronic submission unless explicitly allowed.
  • Use your own words when answering questions. Copying from other sources (including textbooks, handouts, my blog) is strongly discouraged, because it often indicates you don't understand your answer.

  1. The hospital data (see the accompanying variable descriptions) are a sample from a larger data set collected on people discharged from a selected Pennsylvania hospital as part of a retrospective chart review of antibiotic usage in hospitals.
    • Find the best-fitting linear relationship between ln(duration of hospitalization) and age.
    • Test for the significance of this relationship. State any underlying assumptions you have used.
    • What is R² for this regression?
    • Assess the goodness of fit of the regression line.
  2. The birthweight-estriol data are from a study to relate birthweight to the estriol level of pregnant women. If you draw a scatter plot of birthweight versus estriol level, you will see a roughly linear relationship between them, although the relationship is not tight and considerable scatter exists throughout the plot.
    • How can this relationship be quantified?
    • What is the estimated average birthweight if a pregnant woman has an estriol level of 15 mg/24 hr?
    • Low birthweight is defined here as ≤2500 g. For what estriol level would the predicted birthweight be 2500 g?
    • Interpret the slope of the regression line.
  3. The vital lung data (available in Stata and text formats) look at mine workers’ vital lung capacity (a continuous measure of lung health), exposure to cadmium, and age.
    • Let's ignore the age variable for this question. Carry out a one-way ANOVA (or a t-test) to see if vital capacity differs between mine workers with >10 years of cadmium exposure and those without exposure. Is the effect of cadmium exposure on vital lung capacity significant? What assumptions do we make in this analysis? How can we check the validity of the assumptions?
    • Do you think age should be taken into consideration? Provide the rationale for your answer.
    • Suppose we think age should be taken into account, and we carry out a linear regression of vital lung capacity on age and cadmium. Is the effect of cadmium exposure on vital lung capacity significant? Compare the result with that of the first question. If the results are different, what are the reasons for the difference?
    • Do you want to carry out further analyses? Provide the rationale for your answer. If your answer is yes, carry out the analyses you propose to do.
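The simple linear regression computations in problems 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the expected solution, assuming NumPy and SciPy are available; the function names (`simple_linear_regression`, `predict`, `inverse_predict`) are hypothetical, and the data must be loaded separately. For problem 1, pass age as `x` and `np.log(duration)` as `y`; for problem 2, pass estriol level as `x` and birthweight as `y`. The slope test assumes independent observations and normally distributed residuals with constant variance around the line, so the returned residuals should still be plotted against `x` when assessing the fit.

```python
import numpy as np
from scipy import stats


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x, with a two-sided t-test of H0: b1 = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    sxy = np.sum((x - x_bar) * (y - y_bar))
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept
    resid = y - (b0 + b1 * x)
    sse = np.sum(resid ** 2)             # residual sum of squares
    sst = np.sum((y - y_bar) ** 2)       # total sum of squares
    s2 = sse / (n - 2)                   # residual variance, n - 2 degrees of freedom
    se_b1 = np.sqrt(s2 / sxx)
    t_stat = b1 / se_b1
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    return {
        "b0": b0, "b1": b1, "se_b1": se_b1,
        "t": t_stat, "p": p_value,
        "r2": 1 - sse / sst,             # proportion of variance in y explained by x
        "resid": resid,                  # plot against x to assess goodness of fit
    }


def predict(fit, x_new):
    """Estimated mean of y at x_new, e.g. birthweight at an estriol level of 15."""
    return fit["b0"] + fit["b1"] * x_new


def inverse_predict(fit, y_target):
    """x value at which the fitted line equals y_target, e.g. 2500 g (requires b1 != 0)."""
    return (y_target - fit["b0"]) / fit["b1"]
```

`scipy.stats.linregress(x, y)` reports the same slope, intercept, slope standard error and p-value, and is a convenient cross-check on hand computations.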
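For problem 3, the crude and age-adjusted comparisons can be sketched as below, again assuming NumPy and SciPy. The names `ols` and `cadmium_effect` are hypothetical, and `exposed` is assumed to be coded 1 for more than 10 years of cadmium exposure and 0 for no exposure. With only two groups, the pooled-variance t-test and the one-way ANOVA give the same p-value (F = t²). The adjusted estimate comes from a regression of vital capacity on exposure and age; if age differs between the exposure groups and is also related to vital capacity, the crude and adjusted estimates can disagree.

```python
import numpy as np
from scipy import stats


def ols(X, y):
    """OLS with an intercept; returns coefficients (intercept first), SEs, t and two-sided p."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(X, dtype=float)])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                 # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), df=n - k)
    return beta, se, t, p


def cadmium_effect(capacity, exposed, age):
    """Crude (two-group) and age-adjusted estimates of the exposure effect."""
    capacity = np.asarray(capacity, dtype=float)
    exposed = np.asarray(exposed, dtype=int)     # 1 = >10 years exposure, 0 = unexposed
    age = np.asarray(age, dtype=float)
    grp1, grp0 = capacity[exposed == 1], capacity[exposed == 0]
    # Pooled-variance t-test; with two groups this matches a one-way ANOVA
    crude = stats.ttest_ind(grp1, grp0)
    # Regression of vital capacity on exposure and age
    beta, se, t, p = ols(np.column_stack([exposed, age]), capacity)
    return {
        "crude_diff": grp1.mean() - grp0.mean(), "crude_p": crude.pvalue,
        "adj_diff": beta[1], "adj_p": p[1],
    }
```

The equal-variance and normality assumptions can be examined with residual plots, or more formally with `scipy.stats.levene` and `scipy.stats.shapiro`.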

Topic revision: r5 - 30 Oct 2006, ChunLi
