Bios311Syllabus2014 (25 Aug 2015, RobertGreevy)

- Get a sense of the class; determine if it is a good fit for you. See Bios 311 Class Details and Bios 311 Syllabus 2013 for more information.
- Learn some of your classmates' names. Get to know them a little.
- Learn all of the pig rolls, terminology, and scoring values. See roll values.

- Read Rosner Ch 1 and 2. By "read", skim to remind yourself of the parts you already know (should be most of it) and read to learn the parts you don't know. I'll lecture very briefly on these chapters Tuesday.
- Borrow a pig to take home for the weekend.
- Roll your pig 100 times and collect the outcomes, i.e. the number of times it landed on each of the six possibilities.
- dot side up
- dot side down
- razorback
- trotter
- snouter
- leaning jowler

- On Tuesday, you'll enter your data into a spreadsheet and return your pig.
- Install R and RStudio on your laptop.
- On Tuesday, **bring your laptops and books** for class and lab. You'll be working problems from the text and a group quiz I'll hand out.

- Sampling Distributions!!!
- Exploring R.
- Chapter 2 concepts, especially summarizing and describing data/distributions.

Sampling distributions are the most important idea for understanding the "why" of statistical methods in the traditional paradigm, aka the frequentist paradigm.

pigSides <- c( 'dot', 'nodot', 'razorback', 'trotter', 'snouter', 'leaningjowler' )
pigSides
sample( pigSides, replace = F )
sample( pigSides, replace = T )
sample( pigSides, 20, replace = T )

ClassRandomizer <- function(){
  # function that splits the class into random groups
  # this function has no inputs, e.g. you can't specify the number of groups
  class <- c( 'Alex', 'Alice C.', 'Alice T.', 'Andrew', 'Christopher', 'Derek', 'Jea Young', 'Jie', 'Jonathan', 'Lauren', 'Linda', 'Ryan', 'Sam', 'Svetlana', 'Travis', 'Ying' )
  # randomly shuffle the class
  classSample <- sample( class, replace=F )
  # print out the groups
  # do I need to add Meredith?
  # printing would be prettier with commas between names
  cat( c( "\nGroup A:", classSample[1:4]), "\n\n" )
  cat( c( "Group B:", classSample[5:8]), "\n\n" )
  cat( c( "Group C:", classSample[9:12]), "\n\n" )
  cat( c( "Group D:", classSample[13:16]), "\n\n" )
}

- Let R = one roll of a pig
- R = 1 if roll is a razorback
- R = 0 if roll is anything other than a razorback

- Let X = sum of the results for 100 rolls
- X = R1 + R2 + R3 + ... + R100

- Let theta = the true probability of rolling a razorback
- Let theta.hat = your estimate for the probability of rolling a razorback
- theta.hat = X/100
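
The definitions above can be sketched in a few lines of R. This is only an illustration: the value 0.22 used for `theta` is an assumption chosen for the example, not a measured property of the pigs, and the seed is arbitrary.

```r
set.seed(311)                      # arbitrary seed so the sketch is reproducible
theta  <- 0.22                     # hypothetical true probability of a razorback
nRolls <- 100

# R_i = 1 if roll i is a razorback, 0 otherwise
R <- sample( c(1, 0), nRolls, replace = TRUE, prob = c(theta, 1 - theta) )

X <- sum(R)                        # X = R1 + R2 + ... + R100
theta.hat <- X / nRolls            # estimate of theta
theta.hat
```

Rerunning without the `set.seed()` line gives a different `theta.hat` each time, which is the variability a sampling distribution describes.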

SamplingDistribution <- function( nExperiments = 10^4, nPerExperiment = 100, theta = 0.40 ){
  # simulate a bunch of experiments of size nPerExperiment and where theta is known (a bunch = nExperiments)
  # think about how you'd write this function using just the sample function
  # let this next step just be magic for now, but in short, R already has a function to do exactly what we want
  rbinom( nExperiments, nPerExperiment, theta )
}
dist1 <- SamplingDistribution()
print( summary(dist1) )
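
One possible way to write the same simulation using only `sample()` is sketched below. The function name `SamplingDistributionBySample` is hypothetical and not part of the course materials; it is slower than `rbinom()` but makes each simulated roll explicit.

```r
SamplingDistributionBySample <- function( nExperiments = 10^4, nPerExperiment = 100, theta = 0.40 ){
  # hypothetical helper: each experiment is nPerExperiment draws of 1 (razorback)
  # or 0 (anything else); the sum is the number of razorbacks in that experiment
  replicate( nExperiments,
             sum( sample( c(1, 0), nPerExperiment, replace = TRUE,
                          prob = c(theta, 1 - theta) ) ) )
}

set.seed(1)                        # arbitrary seed
dist2 <- SamplingDistributionBySample( nExperiments = 2000 )
summary( dist2 )
```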

data <- read.csv( "http://biostat.mc.vanderbilt.edu/wiki/pub/Main/Bios311Syllabus2014/20140826-ClassPigData.csv" )

- Get a vision of where we are going -- we want to carefully describe the uncertainty of our estimates.
- Introduction to probability
- S = Sample space, A = event, Pr( A ) or simply P( A ) = probability.
- Venn Diagrams. Unions and intersections.
- Complements of sets and the Null set.
- Mutually exclusive events. P(A or B) = P(A U B) = P(A) + P(B).
- The addition law of probability. P(A or B) = P(A U B) = P(A) + P(B) - P(AB).
- Independence. Multiplication law of probability. P(AB) = P(A)*P(B). What this looks like in a Venn diagram.
- Conditional probability. P(AB)/P(B) = the conditional probability of A given B. What this looks like in a Venn diagram.
- Probability tree diagrams. Not in the book, but can help thinking through some problems of modest scale.
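
A small worked example may help make these laws concrete. It assumes one roll of a fair six-sided die, which is an illustration chosen here and not an example from Rosner; the events `A` and `B` are likewise made up.

```r
S <- 1:6                                         # sample space: one roll of a fair die
P <- function(event) length(event) / length(S)   # equally likely outcomes

A <- c(2, 4, 6)                  # event: roll is even
B <- c(4, 5, 6)                  # event: roll is greater than 3
AB   <- intersect(A, B)          # A and B
AorB <- union(A, B)              # A or B

P(AorB)                          # direct calculation
P(A) + P(B) - P(AB)              # addition law gives the same value
P(AB) / P(B)                     # conditional probability of A given B
P(AB) == P(A) * P(B)             # FALSE: A and B are not independent
```

Because `A` and `B` overlap, dropping the `- P(AB)` term would double count the outcomes 4 and 6, which is why the simpler sum only holds for mutually exclusive events.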

- Read Rosner Ch 3. Reading guidance:
- 3.1-3.6 Intro, Definitions, Notation, Multiplication law, Addition law, Conditional probability - these are key ideas we need to move our discussion forward.
- 3.7-3.10 Bayes' rule and screening tests, Bayesian inference, ROC curves, Prevalence and incidence - we don't need these ideas for the immediate discussion, but the ideas are very important (and tend to show up on Biostat comps)
- 3.11 Summary - usually a better intro than the intro.
- Work the problems from the beginning through the Genetics groups of problems 3.30 etc. The Genetics and Mental Health group of problems are especially nice for practicing the basic probability skills.

- Start Reading Rosner Ch 4. Reading guidance:
- 4.1-4.3 and 4.8 Intro, RV's, PMF (PDF for discrete distributions), Binomial Distribution - this is the other set of key ideas we need to move our discussion forward.

- Binomial distribution
- Confidence intervals for proportions
- Operating characteristics of confidence intervals (performance metrics)

- Quiz01 feedback.
- 030_Bernoulli_lecture.pdf

- Group A: Lauren Alex Travis Sam
- Group B: Jonathan Jea Young Svetlana Alice T.
- Group C: Christopher Jie Alice C. Derek
- Group D: Linda Ying Ryan Andrew

- argmin_UB[ P(X <= 22 | theta = UB) <= 0.025 ]
- argmax_LB[ P(X >= 22 | theta = LB) <= 0.025 ]
- Note the pbinom() function and some trial and error may come in handy here.

- Confidence intervals for proportions
- Operating characteristics of confidence intervals (performance metrics)

- Quiz 02 feedback.
- Quiz 02 follow-up exercise.
- Compare the coverage of the exact 95% CI with the Bootstrap Credible Interval (see below) over a range of thetas, i.e. compare the "correct" answer to Q6 to the "incorrect" answer.
- Discuss how the answers compare. Is one method clearly preferable? Which would you choose and why?
- Look at another performance metric, CI widths. Discuss how the answers compare. Is one method clearly preferable? Which would you choose and why?
- Time permitting, try a few different N's, say 40, 500, and 3000. How does this impact your decision of which method to choose?

- We rolled 22 razorbacks out of 100 rolls.
- Suppose the true theta was 0.22. What is a reasonable range for the number of razorbacks we could roll if theta=0.22? Specifically, solve for the 2.5th and 97.5th percentiles of Binom(100, 0.22). The command qbinom() may be helpful here. This yields a set of bounds on the scale of X, the number of razorbacks rolled.
- Convert that range of razorback rolls into an upper and lower bound for theta, i.e. divide by 100.
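
The three steps above can be sketched as follows. The variable names are chosen here for illustration; the only inputs are the 22 razorbacks and 100 rolls stated above.

```r
x <- 22; n <- 100
theta.hat <- x / n                                   # plug-in value for theta

# 2.5th and 97.5th percentiles of Binom(100, 0.22), on the scale of X
boundsX <- qbinom( c(0.025, 0.975), n, theta.hat )
boundsX

# Convert to the scale of theta
boundsX / n
```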

- We rolled 22 razorbacks out of 100 rolls.
- Sequentially suppose a whole bunch of thetas going from 0 to 1. Which of those thetas are plausible given that we rolled 22 razorbacks?
- Eliminate any thetas where the probability of rolling a 22 or anything even more extreme for that theta is < 0.025.
- One way of expressing that is the two equations given in the quiz. Another is equation 6.20 in Rosner.
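
As an alternative to trial and error, the two boundary thetas can be found with a root finder. This sketch swaps in `uniroot()` from base R; it is not the method used in the quiz, and the helper names `lowerTail` and `upperTail` are hypothetical.

```r
x <- 22; n <- 100; alpha <- 0.05

# Lower bound: the theta at which P(X >= 22 | theta) equals 0.025
lowerTail <- function(theta) pbinom( x - 1, n, theta, lower.tail = FALSE ) - alpha/2
LB <- uniroot( lowerTail, interval = c(0.001, x/n), tol = 1e-8 )$root

# Upper bound: the theta at which P(X <= 22 | theta) equals 0.025
upperTail <- function(theta) pbinom( x, n, theta ) - alpha/2
UB <- uniroot( upperTail, interval = c(x/n, 0.999), tol = 1e-8 )$root

round( c(LB, UB), 4 )
```

Thetas below `LB` make 22 or more razorbacks too unlikely, and thetas above `UB` make 22 or fewer too unlikely, so the plausible thetas are the ones in between.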

**Q01)** Using the rbinom() and hist() commands, graphically display an approximate pdf for X ~ Binom(100, 0.25). Comment on why this figure is an approximation and what influences the accuracy of the approximation.

```{r}
hist( rbinom(10^5,100,0.25) )
```

**Q02)** Using the dbinom() and plot() commands, graphically display an exact pdf for X ~ Binom(100, 0.25).

```{r}
x <- 0:100
plot( x, dbinom(x,100,0.25), type='s' )
```

**Q03)** Suppose you had rolled 22 razorbacks out of 100 rolls. Find a lower and upper bound (LB, UB) for the true probability of rolling a razorback using the following criteria.

- argmin_UB[ P(X <= 22 | theta = UB) <= 0.025 ]
- argmax_LB[ P(X >= 22 | theta = LB) <= 0.025 ]

Note the pbinom() function and some trial and error may come in handy here.

```{r}
# solve for lower bound
# tricky part: you need to make sure 22 is included in your sum of the upper tail,
# i.e. sum from 0 to 21 and subtract that prob from 1

# First, get a broad sense of where the threshold is.
1-pbinom(22-1,100,0.14,lower.tail=T)
1-pbinom(22-1,100,0.15,lower.tail=T)

# Then, iteratively zoom in ...
1-pbinom(22-1,100,0.140,lower.tail=T)
1-pbinom(22-1,100,0.145,lower.tail=T)

# until you reach your desired level of precision.
1-pbinom(22-1,100,0.1433,lower.tail=T)
1-pbinom(22-1,100,0.1434,lower.tail=T)

# Likewise, solve for upper bound.
pbinom(22,100,0.3140,lower.tail=T)
pbinom(22,100,0.3139,lower.tail=T)
```

Answer: (0.143, 0.314)

**Q04)** Check your answer for Q03 using the binconf() command in the Hmisc package for R.

```{r}
library(Hmisc)
binconf(x=22,n=100,method="exact")
```

Answer: (0.1433, 0.3139)

**Q05)** Let X ~ Binom(100, 0.25). Let C = 1 if the exact 95% CI (akin to Q03 and Q04) contains the true value of theta, 0.25. Let C = 0 if it doesn't. Calculate E[ C ], i.e. calculate the true coverage probability for the 95% exact CI given X is Binom(100, 0.25).

```{r}
# Create a vector of the sample space for X,
# the number of razorbacks rolled.
x <- 0:100

# Calculate P(x) for each x
Px <- dbinom(x,100,0.25)

# Data check: make sure Px has the correct mode
# Prevent scientific notation when displaying numbers
# and look at Px rounded to four decimals
options( scipen= 100 )
round(Px,4)

# Create vectors for the 95% exact CI LB and UB
# Note, I couldn't get binconf to handle the vector,
# so I ran this through a for loop.
LB = rep(NA,101)
for(i in 0:100){ LB[i+1] <- binconf(x=i,n=100,method="exact")[2] }
UB = rep(NA,101)
for(i in 0:100){ UB[i+1] <- binconf(x=i,n=100,method="exact")[3] }

# Data check - look at the result
cbind(LB,UB)
round( cbind(x,LB,UB), 4 )

# Calculate C with a logical expression
C = (LB <= 0.25 & UB >= 0.25)

# Data check - look at the result
C
round( cbind(x,LB,UB,C), 4 )
round( cbind(x,LB,UB,C,Px), 4 )

# Calculate E[C]
sum( C*Px )

# This is a very precise calculation; however,
# reporting to four decimals is plenty.
round( sum( C*Px ), 4 )
```

So for a Binom(100, 0.25), the 95% Exact Confidence Interval actually has 96.25% coverage.

**Q06)** Let theta vary. Plot E[ C ] for a bunch of thetas ranging from 0 to 1.

```{r}
# First, let's turn Q05 into a function.
CoverageCalc <- function(theta){
  # Function to calculate the true coverage of
  # the exact 95% CI for a Binom(100,theta).
  # Note, R functions return the last item calculated,
  # unless specified otherwise with return().
  x <- 0:100
  Px <- dbinom(x,100,theta)
  LB = rep(NA,101)
  for(i in 0:100){ LB[i+1] <- binconf(x=i,n=100,method="exact")[2] }
  UB = rep(NA,101)
  for(i in 0:100){ UB[i+1] <- binconf(x=i,n=100,method="exact")[3] }
  C = (LB <= theta & UB >= theta)
  sum( C*Px )
}

# Data check - test the function at 0.25
CoverageCalc(0.25)

# Second, calculate the coverage for
# a bunch of thetas.
N.thetas <- 100
theta <- seq(from=0,to=1,by=1/N.thetas)
theta
Coverage <- rep( NA, length(theta) )
for(i in 1:length(theta)){ Coverage[i] <- CoverageCalc(theta[i]) }
plot( theta, Coverage, xlim = c(0,1), ylim = c(0.9,1), type = 'l', lwd = 2 )
lines( c(0,1),c(0.95,0.95) )
```

**Extension:** Now compare this to the bootstrap credible interval.

```{r}
# Notice the CIs are fixed for any given X
# so I will run this outside the function to save CPU
x <- 0:100
LB <- qbinom(0.025,100,x/100)/100
UB <- qbinom(0.975,100,x/100)/100

CoverageCalcBoot <- function(theta){
  # depends on x, LB, and UB existing
  Px <- dbinom(x,100,theta)
  C = (LB <= theta & UB >= theta)
  sum( C*Px )
}

# Data check - test the function at 0.25
CoverageCalcBoot(0.25)

# Second, calculate the coverage for
# a bunch of thetas.
N.thetas <- 100
theta <- seq(from=0,to=1,by=1/N.thetas)
theta
CoverageBoot <- rep( NA, length(theta) )
for(i in 1:length(theta)){ CoverageBoot[i] <- CoverageCalcBoot(theta[i]) }
plot( theta, CoverageBoot, xlim = c(0,1), ylim = c(0.9,1), type = 'l', lwd = 2 )
lines( c(0,1),c(0.95,0.95) )

# plot both on the same graph
plot( theta, Coverage, xlim = c(0,1), ylim = c(0.85,1), type = 'l', lwd = 2, col='red' )
lines( c(0,1),c(0.95,0.95) )
# par new=T means plot on top of the current plot
# yes, it's a completely counter-intuitive command
par(new=T)
plot( theta, CoverageBoot, xlim = c(0,1), ylim = c(0.85,1), type = 'l', lwd = 2, col='blue', xlab="", ylab="", axes=F )
```

- Review the logic and performance of the exact and bootstrap confidence intervals for a proportion. Can you express these in words?
- Lay the foundation for the traditional confidence interval for a proportion, better called the asymptotic Normal confidence interval.
- Continuous distributions and the Normal distribution.
- Formal definitions of expectation and variance.
- Properties of expectation and variance, especially **linear transformations vs. nonlinear transformations**.
- Properties of sums of random variables.
- Touch upon the Central Limit Theorem.
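
The contrast between linear and nonlinear transformations can be checked exactly for a discrete distribution. The sketch below uses X ~ Binom(100, 0.25) from the earlier quiz; the transformations 2X + 3 and X^2 are arbitrary choices made here for illustration.

```r
x  <- 0:100
px <- dbinom( x, 100, 0.25 )

EX <- sum( x * px )                 # E[X] = 100 * 0.25 = 25
VX <- sum( (x - EX)^2 * px )        # V[X] = 100 * 0.25 * 0.75 = 18.75

# Linear transformation: E[2X + 3] = 2*E[X] + 3 and V[2X + 3] = 4*V[X]
EY <- sum( (2*x + 3) * px )
VY <- sum( (2*x + 3 - EY)^2 * px )
c( EY, 2*EX + 3, VY, 4*VX )

# Nonlinear transformation: E[X^2] is not (E[X])^2; they differ by V[X]
EX2 <- sum( x^2 * px )
c( EX2, EX^2, EX2 - EX^2 )
```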

- Rosner 4.4 and 4.5. E[X] and V[X] for a discrete distribution.
- Rosner 4.9. E[X] and V[X] for Binomial.
- Rosner 5.1 - 5.7. Continuous distributions, the Normal distribution, sums of RVs, and the Normal approximation of the Binomial.

Group A: Jonathan, Ying, Christopher, Derek.

Group B: Travis, Alice T., Jea Young, Svetlana.

Group C: Lauren, Ryan, Alice C., Andrew.

Group D: Sam, Jie, Linda, Alex.

- P( X > E[X] + 1*sqrt(V[X]) )
- P( Y > E[X] + 1*sqrt(V[X]) )
- P( X > E[X] + 2*sqrt(V[X]) )
- P( Y > E[X] + 2*sqrt(V[X]) )
- P( X > E[X] + 2.5*sqrt(V[X]) )
- P( Y > E[X] + 2.5*sqrt(V[X]) )
- P( X > E[X] + 3*sqrt(V[X]) )
- P( Y > E[X] + 3*sqrt(V[X]) )

- Review Quiz 03.
- Sums of **independent** random variables.
- Law of large numbers and central limit theorem.
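
A quick simulation can illustrate what sums of independent rolls imply for the estimator X/n. This is a sketch under assumed values: theta = 0.22 and the three sample sizes are chosen here only for illustration.

```r
set.seed(42)                       # arbitrary seed
theta <- 0.22                      # hypothetical true probability

for( n in c(10, 100, 1000) ){
  # 10^4 simulated estimates, each based on n independent rolls
  theta.hat <- rbinom( 10^4, n, theta ) / n
  cat( "n =", n,
       " mean =", round( mean(theta.hat), 4 ),
       " sd =", round( sd(theta.hat), 4 ),
       " theory sd =", round( sqrt( theta*(1 - theta)/n ), 4 ), "\n" )
}
```

The mean of the estimates stays near theta while their spread shrinks like 1/sqrt(n), which follows from the variance of a sum of independent random variables. Adding `hist( theta.hat )` inside the loop shows the shape becoming more Normal as n grows.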

- Rosner 4.4 and 4.5. E[X] and V[X] for a discrete distribution.
- Rosner 4.9. E[X] and V[X] for Binomial.
- Rosner 5.1 - 5.7. Continuous distributions, the Normal distribution, sums of RVs, and the Normal approximation of the Binomial.

# Split up X by the half-period of the cosine.
# These are ordered to emphasize the symmetry, which will help when I need to sum them up.
# delta = the rectangle width for the numeric integration. This will let me control my level of accuracy for E[C].
delta <- 0.001

# Think about why going from -2*pi to 2*pi is sufficient for this problem.
x1pos <- seq(0, pi, delta)
x1neg <- seq(0, -pi, -delta)
x2pos <- seq(2*pi, pi, -delta)
x2neg <- seq(-2*pi, -pi, delta)

# Evaluate C over this range.
c1pos <- cos(x1pos)
c1neg <- cos(x1neg)
c2pos <- cos(x2pos)
c2neg <- cos(x2neg)

# Eyeball check that I've ordered them by the value of cos(x).
w <- sample( 1:length(c2neg), 10, replace = F )
round( c2neg[w], 3 )
round( c1neg[w], 3 )
round( c2pos[w], 3 )
round( c1pos[w], 3 )

# Look at the whole plot of C by X
par( mfrow = c(1,1) )
plot( c(x2neg,x1neg,x1pos,x2pos), c(c2neg,c1neg,c1pos,c2pos) )

# Look at the sections of C by X
par( mfrow = c(2,2) )
plot( x2neg, c2neg )
plot( x1neg, c1neg )
plot( x1pos, c1pos )
plot( x2pos, c2pos )

- Quiz 04, understanding expectation and variance a little better and thinking about the precision of estimators vs. the precision of calculations.
- Law of large numbers and Central limit theorem.
- Asymptotic Normal (Wald) interval for a proportion.
- Bring it all back to operating characteristics.

# First, let's do a quick and dirty solution
# to get our bearings.
# Let's just take a big sample from X, calculate C,
# and compute the mean and variance.
set.seed(7)
x <- rnorm(10^6)
c <- cos(x)
par(mfrow=c(1,2))
hist(x)
hist(c)
round( c( mean(x), var(x) ), 4 )
round( c( mean(c), var(c) ), 4 )

# Now let's see if we can get a more precise answer.
# Split up X by the half-period of the cosine.
# These are ordered to emphasize the symmetry, which will help when I need to sum them up.
# delta = the rectangle width for the numeric integration. This will let me control my level of accuracy for E[C].
delta <- 0.001

# Think about why going from -2*pi to 2*pi is sufficient for this problem.
x1pos <- seq(0, pi, delta)
x1neg <- seq(0, -pi, -delta)
x2pos <- seq(2*pi, pi, -delta)
x2neg <- seq(-2*pi, -pi, delta)

# Evaluate C over this range.
c1pos <- cos(x1pos)
c1neg <- cos(x1neg)
c2pos <- cos(x2pos)
c2neg <- cos(x2neg)

# Eyeball check that I've ordered them by the value of cos(x).
w <- sample( 1:length(c2neg), 10, replace = F )
round( c2neg[w], 3 )
round( c1neg[w], 3 )
round( c2pos[w], 3 )
round( c1pos[w], 3 )

# Look at the whole plot of C by X
par( mfrow = c(1,1) )
plot( c(x2neg,x1neg,x1pos,x2pos), c(c2neg,c1neg,c1pos,c2pos) )

# Look at the sections of C by X
par( mfrow = c(2,2) )
plot( x2neg, c2neg )
plot( x1neg, c1neg )
plot( x1pos, c1pos )
plot( x2pos, c2pos )

# The pdf of C is related to the pdf of X.
f1pos <- dnorm(x1pos)
f1neg <- dnorm(x1neg)
f2pos <- dnorm(x2pos)
f2neg <- dnorm(x2neg)

# Now look at the sections of f(C) by X
par( mfrow = c(2,2) )
ylims <- c(0,0.4)
plot( x2neg, f2neg, ylim=ylims )
plot( x1neg, f1neg, ylim=ylims )
plot( x1pos, f1pos, ylim=ylims )
plot( x2pos, f2pos, ylim=ylims )

# Here's the trick, we can sum these up
# to get the pdf for f(c). Remember the
# values of c are the same at each position
# in the arrays c2neg, ..., c2pos.
f <- f2neg + f1neg + f1pos + f2pos

# We can use any of the segments to plot the pdf.
par( mfrow = c(2,2) )
plot( c2neg, f )
plot( c1neg, f )
plot( c1pos, f )
plot( c2pos, f )

# Let's pick one set of c values for the
# next steps.
par( mfrow = c(1,1) )
plot( c2pos, f )

# The expectation is the integral of c*f(c)
# over the span of c, [-1, 1].
plot( c2pos, c2pos*f )

# We can integrate numerically.
Ec <- sum( c2pos*f )*delta
round( Ec, 4 )

# The variance is the integral
# of (c-E[c])^2*f(c)
# over the span of c, [-1, 1].
plot( c2pos, (c2pos-Ec)^2*f )

# We can integrate numerically.
Vc <- sum( (c2pos-Ec)^2*f )*delta
round( Vc, 4 )

# Finally, let's clean this up and try
# different deltas until it stabilizes
# at four decimal places.
EVc <- function(delta=0.001){
  # Segment x by half-period
  x1pos <- seq(0, pi, delta)
  x1neg <- seq(0, -pi, -delta)
  x2pos <- seq(2*pi, pi, -delta)
  x2neg <- seq(-2*pi, -pi, delta)
  # Evaluate C over these ranges.
  c1pos <- cos(x1pos)
  c1neg <- cos(x1neg)
  c2pos <- cos(x2pos)
  c2neg <- cos(x2neg)
  # The pdf of C is related to the pdf of X.
  f1pos <- dnorm(x1pos)
  f1neg <- dnorm(x1neg)
  f2pos <- dnorm(x2pos)
  f2neg <- dnorm(x2neg)
  # Sum to get the pdf.
  f <- f2neg + f1neg + f1pos + f2pos
  # Estimate E[C] and V[C]
  Ec <- sum( c2pos*f )*delta
  Vc <- sum( (c2pos-Ec)^2*f )*delta
  cat( round( c(Ec,Vc), 4 ) )
}
EVc( delta = 0.1 )
EVc( delta = 0.01 )
EVc( delta = 0.001 )
EVc( delta = 0.0001 )
EVc( delta = 0.00001 )

# These next two take a while to run,
# but both yield 0.6065 0.1998
EVc( delta = 0.000001 )
EVc( delta = 0.0000001 )

- Quiz_4.pdf: Quiz 4 Solutions

- Group A: Alex, Travis, Ying, Alice C.
- Group B: Jonathan, Sam, Lauren, Svetlana
- Group C: Christopher, Andrew, Alice T., Jie
- Group D: Derek, Jea Young, Ryan, Linda

- theta_hat -- Hint: what nice property does this estimator have?
- z_alpha/2 -- Hint: what is this and what important theorem justifies its use?
- theta_hat*(1-theta_hat) -- Hint: what is this estimating?
- sqrt( theta_hat*(1-theta_hat)/n ) -- Hint: what is this estimating and how would you derive it from the properties of sums of random variables?
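
Putting the pieces in the list above together gives the asymptotic Normal (Wald) interval, theta_hat plus or minus z_alpha/2 times the estimated standard error. The sketch below applies it to the 22 razorbacks in 100 rolls used earlier; the variable names are chosen here for illustration.

```r
x <- 22; n <- 100
theta.hat <- x / n                                   # point estimate
z  <- qnorm( 1 - 0.05/2 )                            # z_alpha/2 for a 95% interval
se <- sqrt( theta.hat * (1 - theta.hat) / n )        # estimated standard error of theta.hat

wald <- theta.hat + c(-1, 1) * z * se
round( wald, 4 )
```

Unlike the exact interval, this one is symmetric around theta.hat, which is one reason its coverage can suffer when theta is near 0 or 1 or when n is small.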

- Generic descriptions of asymptotically Normal CIs, "Exact" CIs, and bootstrap CIs.
- Applying those three approaches to data coming from other distribution/estimator dyads.
- Normal, Sample_mean with variance known (see Rosner Eq 6.6, 6.7).
- Normal, Sample_mean with variance estimated (see Rosner Eq 6.6, 6.7).
- Normal, Sample_variance (see Rosner Eq 6.15).
- Poisson, Lambda_hat (see Rosner Eq 6.23).

- Evaluating the performance of those approaches.
- Practice some basics (problems from Rosner).

- Pulmonary Disease 6.5 - 6.14 (Normal, Sample_mean with variance estimated)
- Obstetrics, Serology 6.36 - 6.39 (Normal under a transformation)
- Microbiology 6.18 - 6.22 (Normal, Sample_variance)
- Environmental Health 6.33 - 6.35 (Use Poisson to answer)

- 112_Poisson_lecture.pdf
- 113_Poisson_code.R
- There will be an overview discussion done purely on the board with lots of hand waving and jumping up and down.

Group A: Travis, Ryan, Svetlana, Linda

Group B: Derek, Ying, Jea Young, Alice C.

Group C: Sam, Lauren, Andrew, Christopher

Group D: Alex, Jonathan, Alice T.

- Confidence interval methods in the specific, e.g. the percentile bootstrap CI for the sample standard deviation.
- Confidence interval methods in the abstract, e.g. the percentile bootstrap approach.
- Examining the performance of confidence interval methods in the *very* specific, e.g. the true coverage of the percentile bootstrap 95% CI for the sample standard deviation for X ~ N(120, sigma=10) and N = 5.
- Examining the performance of confidence interval methods in the *moderately* specific, e.g. the true coverage of the percentile bootstrap 95% CI for the sample standard deviation for X ~ N(120, sigma=10) and N varies between 3 and 200.
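
A rough sketch of how such a coverage study might be set up is given below. The function names `PercentileBootSD` and `CoverageBootSD` are hypothetical, and the numbers of bootstrap resamples and simulated data sets are kept small so it runs quickly; they would need to be much larger for a precise coverage estimate.

```r
PercentileBootSD <- function( y, B = 500, level = 0.95 ){
  # hypothetical helper: percentile bootstrap CI for the standard deviation
  bootSD <- replicate( B, sd( sample( y, replace = TRUE ) ) )
  as.numeric( quantile( bootSD, c( (1 - level)/2, 1 - (1 - level)/2 ) ) )
}

CoverageBootSD <- function( N, nSim = 200, mu = 120, sigma = 10 ){
  # proportion of simulated data sets whose CI contains the true sigma
  covered <- replicate( nSim, {
    ci <- PercentileBootSD( rnorm( N, mu, sigma ) )
    ci[1] <= sigma && sigma <= ci[2]
  } )
  mean( covered )
}

set.seed(2014)                     # arbitrary seed
cov5  <- CoverageBootSD( N = 5 )   # the "very specific" case
cov50 <- CoverageBootSD( N = 50 )  # one point on the "moderately specific" range
c( cov5, cov50 )
```

Looping `CoverageBootSD()` over a grid of N values between 3 and 200 and plotting the result would mirror the coverage plots made earlier for the binomial intervals.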

- Phase 1) Groups A + D discuss/collaborate and B + C discuss/collaborate
- Phase 2) Groups A + B discuss/collaborate and C + D discuss/collaborate
- Phase 3) Groups A + C discuss/collaborate and B + D discuss/collaborate

Group A: Travis, Ryan, Svetlana, Linda

Group B: Derek, Ying, Jea Young, Alice C.

Group C: Sam, Lauren, Andrew, Christopher

Group D: Alex, Jonathan, Alice T.

- Overview of confidence intervals
- Introduction to one-sample hypothesis testing

- Read Rosner Ch 7.

- On the board lecture.

- Midterm exam prep

- On the board lecture.
- Walkthrough of 2013 midterm.

- Confidence intervals for the difference between two sample means.
- One- and two-sample hypothesis testing for sample means.
- When the null hypothesis informs the variance.

- Read Rosner Chapter 8 and practice problems from chapters 7 and 8.

- 140_Confidence_Intervals.pdf
- 160_Hypothesis_Testing.pdf
- 190_When_mu_o_gives_more_than_just_the_mean.pdf

- Continuum of operating characteristics: Type I error vs. Type II error (Power) vs. Robustness vs. Persuasiveness.
- One-sample exact test for a proportion, Fisher's Exact Test.
- Thinking through picking a method to use, case study of equal variance vs. unequal variance two-sample t-test.
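
To make the equal vs. unequal variance case study concrete, the sketch below runs both two-sample t-tests and the F test on simulated data. The group sizes, means, and standard deviations are assumptions made up for the example.

```r
set.seed(8)                                      # arbitrary seed
g1 <- rnorm( 15, mean = 120, sd = 10 )           # hypothetical group 1
g2 <- rnorm( 25, mean = 125, sd = 20 )           # hypothetical group 2

var.test( g1, g2 )$p.value                       # F test for equality of variances
var( g2 ) / var( g1 )                            # ratio of the sample variances

pooled <- t.test( g1, g2, var.equal = TRUE )     # equal variance t-test
welch  <- t.test( g1, g2, var.equal = FALSE )    # unequal variance (Welch) t-test

c( pooled = pooled$p.value, welch = welch$p.value )
c( pooled = unname(pooled$parameter), welch = unname(welch$parameter) )   # degrees of freedom
```

Wrapping this in a simulation over many data sets, with the null hypothesis true, is one way to compare the Type I error of the two tests under different variance ratios and sample sizes.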

- Read Rosner Ch 8 and practice problems for 7 and 8. Make sure to try each method once, e.g. do at least one problem that uses Fisher's Exact Test, do at least one that uses the asymptotic Normal test for a proportion, etc. Focus repetition on the methods you find hard, not the ones you've mastered.

Group A: Andrew, Jea Young, Christopher, Alice C.

Group B: Sam, Travis, Jonathan, Lauren

Group C: Ying, Alex, Alice T., Svetlana

Group D: Derek, Ryan, Linda

Q1) When choosing between the equal and unequal variance two-sample t-test, Rosner suggests performing an F test on the sample variances and using the unequal variance test only if the F test is significant (presumably at a 5% level). Other introductory texts will commonly give the rule of thumb to use the unequal variance test only if the ratio of the sample variances is outside of (1/2, 2), i.e. the larger is more than twice the size of the smaller. Which rule do you recommend, or would you recommend a different rule? This is intentionally a very open-ended question designed to let you think through what would lead you to prefer one rule over another and explore all the different factors that could impact your answer.

Q2) Prepare a brief presentation (

- Continuum of operating characteristics
- Power
- Nonparametric methods
- Sign test
- Wilcoxon rank sum test (Mann-Whitney)
- Wilcoxon signed rank test
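
The three nonparametric tests listed above are all available in base R. The sketch below uses made-up before/after measurements; the values were chosen to avoid ties so that the exact p-values can be computed without warnings.

```r
before <- c(142, 138, 150, 145, 160, 155, 148, 152, 139, 147)                      # hypothetical
after  <- c(135.5, 140.2, 141.8, 139.9, 151.3, 149.1, 151.6, 140.7, 134.8, 137.4)  # hypothetical
d <- before - after

# Sign test: under H0 the number of positive differences is Binom(n, 0.5)
signTest <- binom.test( sum(d > 0), sum(d != 0), p = 0.5 )

# Wilcoxon signed rank test for the paired differences
signedRank <- wilcox.test( before, after, paired = TRUE )

# Wilcoxon rank sum (Mann-Whitney) test, treating the two samples as independent
rankSum <- wilcox.test( before, after )

c( sign = signTest$p.value, signedRank = signedRank$p.value, rankSum = rankSum$p.value )
```

The sign test uses only the direction of each difference, the signed rank test also uses the ranks of their magnitudes, and the rank sum test ignores the pairing altogether.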

- On the board lecture of a power example for a one-sample Z-test of an anti-hypertension medication
- 250_nonparametric_tests.pdf

- Continuum of operating characteristics
- Two-sample t-test (equal vs unequal variance)
- F test for the equality of two sample variances
- Application of statistical methods

- Quiz 07 presentations
- Consulting role playing; project introduction

- Read Rosner Chapter 9 with focus on:
- Sign test
- Wilcoxon rank sum test (Mann-Whitney)
- Wilcoxon signed rank test

- Categorical data

- Categorical data

- Chapters 7-10 review
- Delta Method
- Common Odds Ratios

- Paired two-sample data, continuous and dichotomous outcomes
- Omnibus testing, Chi-sq Ho revisited, comment on ANOVA
- Multiple comparisons, Bonferroni correction, alternative adjustments
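
As a small illustration of the multiple comparisons adjustments, base R's `p.adjust()` implements the Bonferroni correction and several alternatives. The p-values below are made up for the example, and Holm is shown as just one of the alternative adjustments.

```r
p <- c(0.004, 0.012, 0.030, 0.200)               # hypothetical unadjusted p-values

bonf <- p.adjust( p, method = "bonferroni" )     # multiply by the number of tests, cap at 1
holm <- p.adjust( p, method = "holm" )           # a step-down alternative

rbind( p, bonf, holm )
```

Holm's adjusted p-values are never larger than Bonferroni's, so it rejects at least as often while still controlling the family-wise error rate.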

- Likelihood Inference

- Practice Exam

- Bayesian Inference


Topic revision: r45 - 25 Aug 2015, RobertGreevy

Copyright © 2013-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback