You are here: Vanderbilt Biostatistics Wiki>Main Web>HowToBootstrapCorrelatedData (23 Jul 2015, WikiGuest)EditAttach

resample <- function(dat, cluster, replace) { # exit early for trivial data if(nrow(dat) == 1 || all(replace==FALSE)) return(dat) # sample the clustering factor cls <- sample(unique(dat[[cluster[1]]]), replace=replace[1]) # subset on the sampled clustering factors sub <- lapply(cls, function(b) subset(dat, dat[[cluster[1]]]==b)) # sample lower levels of hierarchy (if any) if(length(cluster) > 1) sub <- lapply(sub, resample, cluster=cluster[-1], replace=replace[-1]) # join and return samples do.call(rbind, sub) }

**R**

code simulates a dataset with 5 correlated (rho=0.4) repeat measurements on each of 10 patients from each of 5 hospitals; 250 measurements total, 50 patients total. In these data, patients are totally independent. The functions **covimage**

and **datimage**

generate a levelplot representation of the covariance matrix and data matrix, respectively, for the simulated data.
# simulate correlated data rho <- 0.4 dat <- expand.grid( measurement=factor(1:5), patient=factor(1:10), hospital=factor(1:5)) sig <- rho * tcrossprod(model.matrix(~ 0 + patient:hospital, dat)) diag(sig) <- 1 dat$value <- chol(sig) %*% rnorm(250, 0, 1) library("lattice") covimage <- function(x) levelplot(as.matrix(x), aspect="fill", scales=list(draw=FALSE), xlab="", ylab="", colorkey=FALSE, col.regions=rev(gray.colors(100, end=1.0)), par.settings=list(axis.line=list(col=NA,lty=1,lwd=1))) datimage <- function(x) { mat <- as.data.frame(lapply(x, as.numeric)) levelplot(t(as.matrix(mat)), aspect="fill", scales=list(cex=1.2, y=list(draw=FALSE)), ylab="", xlab="", colorkey=FALSE, col.regions=gray.colors(100), par.settings=list(axis.line=list(col=NA,lty=1,lwd=1))) } datimage(dat) covimage(sig)

**R**

code generates various boostrap distributions for the sample mean, and approximates the 'true' sampling distribution by Monte Carlo. The final boxplot illustrates that the bootstrap strategy has a significant impact on the bootstrap distribution of sample statistics. Hence, it's important to think carefully about the strategy, and ensure that it most closely reflects the 'true' data generating mechanism.
# bootstrap ignoring hospital and patient levels cluster <- c("measurement") system.time(mF <- replicate(200, mean(resample(dat, cluster, c(F))$val))) system.time(mT <- replicate(200, mean(resample(dat, cluster, c(T))$val))) #boxplot(list("F" = mF, "T" = mT)) # bootstrap ignoring hospital level cluster <- c("patient","measurement") system.time(mFF <- replicate(200, mean(resample(dat, cluster, c(F,F))$val))) system.time(mTF <- replicate(200, mean(resample(dat, cluster, c(T,F))$val))) system.time(mTT <- replicate(200, mean(resample(dat, cluster, c(T,T))$val))) #boxplot(list("FF" = mFF, "TF" = mTF, "TT" = mTT)) # bootstrap accounting for full hierarchy cluster <- c("hospital","patient","measurement") system.time(mFFF <- replicate(200, mean(resample(dat, cluster, c(F,F,F))$val))) system.time(mTFF <- replicate(200, mean(resample(dat, cluster, c(T,F,F))$val))) system.time(mTTF <- replicate(200, mean(resample(dat, cluster, c(T,T,F))$val))) system.time(mTTT <- replicate(200, mean(resample(dat, cluster, c(T,T,T))$val))) #boxplot(list("FFF" = mFFF, "TFF" = mTFF, "TTF" = mTTF, "TTT" = mTTT)) # Monte Carlo for the true sampling distribution system.time(mMC <- replicate(200, mean(chol(sig) %*% rnorm(250, 0, 1)))) #boxplot(list("MC" = mMC)) boxplot(list("MC" = mMC, "F" = mF, "T" = mT, "FF" = mFF, "TF" = mTF, "TT" = mTT, "FFF" = mFFF, "TFF" = mTFF, "TTF" = mTTF, "TTT" = mTTT))

Edit | Attach | Print version | History: r35 < r34 < r33 < r32 | Backlinks | View wiki text | Edit wiki text | More topic actions

Topic revision: r35 - 23 Jul 2015, WikiGuest

Copyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback

Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback