Homework 1 (due on Monday, January 23)

Please note:
  • Problems will be added until the last class before the due date.
  • No electronic submission unless explicitly allowed.
  • Use your own words when answering questions. Copying from other sources (including textbooks, handouts, and my blog) is strongly discouraged, because it often indicates you don't understand your answer.

  1. List 8 or more major problems associated with the spreadsheet from hell.
  2. Describe the difference between a continuous variable and a categorical variable, and the advantages and/or disadvantages of categorizing a continuous variable.
  3. Describe what a sensitivity analysis is and the scenarios it can lead to.
  4. Describe Simpson's paradox. A helpful reading is here.
  5. Nashville's daily mean temperatures in December 2005 had a mean of 37.7 degrees Fahrenheit and a standard deviation (SD) of 7.0 degrees. The conversion formula between Fahrenheit and Celsius is F = C * 9/5 + 32. Do you have enough information to get the mean and SD in Celsius? If yes, what are they? If no, what else do you need?
  6. A binary outcome can take only two possible values. Examples include coin flips, the sex of newborn babies, and having or not having a particular type of cancer. We can always code one outcome as "1" and the other as "0". Let p be the probability of "1"; then 0 < p < 1 and q = 1 - p is the probability of "0". Suppose there are n independent outcomes, each with the same p. The number x of "1" outcomes can range from 0 to n, with varying probabilities. The possible values of x, together with their associated probabilities, form a binomial distribution. The parameter p can be estimated by x/n, and the coefficient of variation (CV) of this estimator is √[(1 - p)/(np)] × 100%.
    • Calculate the CV for n = 10, 100, 1000 and p = .01, .05, .1, .3, .5.
    • Comment on how CV changes as n increases with p fixed and as p changes with n fixed.
    • Suppose you want to estimate a cancer rate with accuracy defined as CV < 10%. What sample size do you need if the true rate is about 10%? What sample size do you need if the true rate is about 1%?
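
For Problem 4, Simpson's paradox can be seen in a small table of counts. The sketch below uses made-up recovery counts for two hypothetical treatments, A and B, in two hypothetical severity groups. None of these numbers come from a real study. The helper names rate, subgroup_rates and pooled_rates are also illustrative.

```python
# Hypothetical (recovered, total) counts, invented for illustration only.
data = {
    "A": {"mild": (18, 20), "severe": (48, 80)},
    "B": {"mild": (68, 80), "severe": (11, 20)},
}

def rate(recovered, total):
    """Proportion of patients who recovered."""
    return recovered / total

def subgroup_rates(table):
    """Recovery rate for each treatment within each severity group."""
    return {trt: {grp: rate(*counts) for grp, counts in groups.items()}
            for trt, groups in table.items()}

def pooled_rates(table):
    """Recovery rate for each treatment after ignoring severity (pooling groups)."""
    pooled = {}
    for trt, groups in table.items():
        recovered = sum(r for r, _ in groups.values())
        total = sum(n for _, n in groups.values())
        pooled[trt] = rate(recovered, total)
    return pooled

if __name__ == "__main__":
    for trt, rates in subgroup_rates(data).items():
        print(trt, {g: f"{r:.0%}" for g, r in rates.items()})
    print("pooled", {t: f"{r:.0%}" for t, r in pooled_rates(data).items()})
```

In these made-up counts, A has the higher recovery rate in both groups (90% vs 85%, 60% vs 55%). Yet A has the lower pooled rate (66% vs 79%). This happens because A was given mostly to severe cases.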
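
For Problem 5, one way to check your reasoning is to test numerically how a linear change of units affects the mean and SD. The sketch below uses randomly generated Fahrenheit values, which are an assumption and not the Nashville data. The hypothetical helper f_to_c inverts the given formula F = C * 9/5 + 32.

```python
import random
import statistics

def f_to_c(f):
    """Convert Fahrenheit to Celsius by inverting F = C * 9/5 + 32."""
    return (f - 32) * 5 / 9

# Simulated daily temperatures in Fahrenheit; NOT the real Nashville data.
random.seed(1)
temps_f = [random.gauss(40, 8) for _ in range(31)]
temps_c = [f_to_c(t) for t in temps_f]

mean_f, sd_f = statistics.mean(temps_f), statistics.stdev(temps_f)
mean_c, sd_c = statistics.mean(temps_c), statistics.stdev(temps_c)

# Compare summaries of the converted data with converted summaries.
print(f"mean: {mean_c:.4f} vs f_to_c(mean_F) = {f_to_c(mean_f):.4f}")
print(f"SD:   {sd_c:.4f} vs SD_F * 5/9      = {sd_f * 5 / 9:.4f}")
```

The mean follows the full conversion (shift and scale). The SD is only rescaled by 5/9, because subtracting 32 moves every value by the same amount and leaves the spread unchanged.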
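
For Problem 6, the CV formula √[(1 - p)/(np)] × 100% can be tabulated directly. The sketch below does this and also checks the formula by simulation. It draws many samples of n independent outcomes with a common p, then divides the SD of x/n across samples by p. The helper names cv_percent and simulated_cv_percent are hypothetical, and the number of simulated samples is an arbitrary choice.

```python
import math
import random
import statistics

def cv_percent(n, p):
    """CV of the estimator x/n, in percent: sqrt((1 - p) / (n * p)) * 100."""
    return math.sqrt((1 - p) / (n * p)) * 100

def simulated_cv_percent(n, p, reps=5000, seed=0):
    """Monte Carlo check: SD of x/n over simulated samples, divided by the true p."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # one binomial(n, p) draw
        estimates.append(x / n)
    return statistics.stdev(estimates) / p * 100

if __name__ == "__main__":
    ps = [0.01, 0.05, 0.1, 0.3, 0.5]
    print("n".rjust(6) + "".join(f"p={p}".rjust(10) for p in ps))
    for n in [10, 100, 1000]:
        print(str(n).rjust(6) + "".join(f"{cv_percent(n, p):9.1f}%" for p in ps))
    # The simulated value should be close to the formula value for the same n and p.
    print("simulated CV (%), n=100, p=0.1:", round(simulated_cv_percent(100, 0.1), 1))
```

The simulation only approximates the formula. Its agreement improves as reps grows, at the cost of run time.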
