Bootstrapping the Proportional Hazards Check

The following R function fits a Cox proportional hazards (PH) model, checks the PH assumption (using Schoenfeld residuals and the R function cox.zph), if PH violated corrects with stratification, and then bootstraps this entire model selection process to allow computation of valid confidence intervals of the predicted survival.

This function is a work in progress, only contains the bare minimums, and may have to be altered for a specific analysis.

Any questions or comments can be sent to Bryan Shepherd, bryan.shepherd@vanderbilt.edu. I would like to thank Thomas Dupont, Terri Scott, and Frank Harrell for their programming advice (although, of course, any bugs are my fault, not theirs).

Arguments
  • formula: a character expression (given in quotations) that is converted to a formula object (to be called by coxph() ), with the response on the left of a '~' operator, and the terms on the right. The response must be a survival object as returned by the 'Surv' function.
  • number.strata: an integer of at least 2 which will determine how many strata to divide continuous predictors into if they violate the PH assumption.
  • ph.crit.val: if the GLOBAL test of PH in cox.zph is less than this value, then the model stratifies over the predictor which contributed most to this violation (as evidenced by the lowest p-value in cox.zph).
  • surv.time: the time at which to estimate the survival probability
  • surv.x: the levels of predictor variables
  • Nboots: number of bootstrap replications

Details All categorical variables need to be specified as factor variables. If stratification occurs over a factor variable, then the levels of this variable are chosen as strata. If stratification occurs over a non-factor variable, then strata are created by dividing observations into strata based on the ordered value of the predictor and the number of strata specified by the user (argument number.strata). For example, if number.strata=5, then 5 strata are used based on quintiles of the non-factor variable. The non-factor variable which was used for stratification is left in the model to account for any residual information. This function returns surv.est, the estimated survival at a given time (surv.time) for given predictor variables (surv.x), as well as surv.estb, the estimate for all Nboots bootstrap replications. A confidence interval can be computed using quantiles of surv.estb. The function also returns badcovb, which specifies which predictors were used for stratification in each bootstrap replication, 0 if no stratification. The function create.strata is used for making strata out of the non-factor predictors. In its current state, this function requires the libraries survival and Design.

This function does not compute a confidence interval for a hazard ratio. Such a confidence interval could be constructed by altering the function -- it is just hard to make this general because the hazard ratio can really only be computed if the variable of interest is pre-specified and stratification is not allowed to occur over that variable, or if specific values of the variable are specified over which to compute all hazard ratios. Also of note, in its current state, predictors must be modeled as linear functions (cannot expand with splines). This feature should change sometime soon, but it is currently above my R programming ability.

create.strata<-function(a,number){
  floor(((rank(a,ties.method="first")-1)/length(a))*number)
}
  
coxphboot<-function(formula,number.strata,ph.crit.val,surv.time,surv.x,Nboots) {

  formula<-as.formula(formula)
  mod<-coxph(formula)
  express1<-as.character(formula)
  S<-eval(parse(text=express1[2]))
  z<-predict(mod, type='terms')
  f.short<-coxph(S ~ z)
  phtest<-survival::cox.zph(f.short, transform='identity')$table 

  badcov<-0
  if(phtest[ncol(z)+1,3]<ph.crit.val) {
    badcov<-which(phtest[,3]==min(phtest[1:ncol(z),3]))
  }

  m<-Design(model.frame(formula))
  X<-m[,-1]
  classes<-attributes(m)$Design$assume.code
  Xnew<-X
  Xnew[1,]<-surv.x
  
  if (badcov>0) {
    if (classes[badcov]==5) {
      stratnum<-X[,badcov]
      tmp1<-strsplit(express1, split=" ")
      tmp1[[3]][1+(badcov-1)*2]<-"strata(stratnum)"
      express3<-formula(paste(express1[2],express1[1],paste(tmp1[[3]],collapse="")))
      mod<-coxph(express3)
    }
    if (classes[badcov]==1) {
      stratnum<-create.strata(X[,badcov],number.strata) 
      express2<-"+strata(stratnum)"
      express3<-as.formula(paste(express1[2],express1[1],express1[3],express2))     
      mod<-coxph(express3)
    }
  }

  
  if (badcov==0) {
    stuff<-summary(survival::survfit(mod,newdata=Xnew[1,]))  
    surv.est<-stuff$surv[stuff$time==max(stuff$time[stuff$time<surv.time])]
  }
  if (badcov>0) {
    if (classes[badcov]==5){
      stuff<-summary(survival::survfit(mod,newdata=Xnew[1,][-badcov]))  
      surv.strat<-surv.x[badcov]
    }
    if (classes[badcov]==1){
      stuff<-summary(survival::survfit(mod,newdata=Xnew[1,]))  
      surv.strat<-create.strata(c(X[,badcov],surv.x[badcov]),number.strata)[length(X[,badcov])+1]
    }
    surv.est<-stuff$surv[stuff$time==max(stuff$time[stuff$time<surv.time&
                                         stuff$strata==paste("stratnum=",as.character(surv.strat),sep="")])&
                         stuff$strata==paste("stratnum=",as.character(surv.strat),sep="")]
  }

  surv.estb<-badcovb<-NULL
  for (j in 1:Nboots) {
    cat("  Bootstrap: ", j, "\n")

    N<-length(X[,1])
    bootsamp<-sample(c(1:N),N,replace=TRUE)
                   
    yb<-S[,1][bootsamp]
    db<-S[,2][bootsamp]
    Sb<-Surv(yb,db)
    Xb<-X[bootsamp,1:ncol(X)]
    rownames(Xb)<-1:N
    Xb2 <- data.frame(cbind(S = Sb, Xb))

    formula1<-as.formula(paste("Sb~",express1[3]))
    modb <- coxph(formula1, data=Xb2)
    z<-predict(modb, type='terms')
    f.short<-coxph(Sb ~ z)
    phtest<-survival::cox.zph(f.short, transform='identity')$table  

    badcov<-0
    if(phtest[ncol(z)+1,3]<ph.crit.val) {
      badcov<-which(phtest[,3]==min(phtest[1:ncol(z),3]))
    }

    if (badcov>0) {
      if (classes[badcov]==5) {
        Xb2$stratnum<-Xb[,badcov]
        tmp1<-strsplit(express1, split=" ")
        tmp1[[3]][1+(badcov-1)*2]<-"strata(stratnum)"
        express3<-formula(paste("Sb~",paste(tmp1[[3]],collapse="")))
        modb<-coxph(express3, data=Xb2)
      }
      if (classes[badcov]==1) {
        Xb2$stratnum<-create.strata(Xb[,badcov],number.strata) 
        express2<-"+strata(stratnum)"
        express3<-as.formula(paste("Sb~", express1[3], express2))     
        modb<-coxph(express3, data=Xb2)
      }
    }

    if (badcov==0) {
      stuff<-summary(survival::survfit(modb,newdata=Xnew[1,]))  
      surv.estb[j]<-stuff$surv[stuff$time==max(stuff$time[stuff$time<surv.time])]
    }
    if (badcov>0) {
      if (classes[badcov]==5){
        stuff<-summary(survival::survfit(modb,newdata=Xnew[1,][-badcov]))
        surv.strat<-surv.x[badcov]
      }
      if (classes[badcov]==1){
        stuff<-summary(survival::survfit(modb,newdata=Xnew[1,]))
        surv.strat<-create.strata(c(Xb[,badcov],surv.x[badcov]),number.strata)[length(Xb[,badcov])+1]
      }
      surv.estb[j]<-stuff$surv[stuff$time==max(stuff$time[stuff$time<surv.time&
                                                          stuff$strata==paste("stratnum=",as.character(surv.strat),sep="")])&
                    stuff$strata==paste("stratnum=",as.character(surv.strat),sep="")]
    }

    tmp1<-strsplit(express1, " ")
    badcovb[j]<-badcov
    if (badcov>0) {
      badcovb[j]<-tmp1[[3]][1+(badcov-1)*2]
    }
  }
  return(surv.est,surv.estb,badcovb)
}

Example 1. I simulate data under the assumption of proportional hazards and then perform the analysis. This data generating scheme is Scenario A used in the Simulation section of my manuscript, "The Cost of Checking Proportional Hazards." (It uses the R library mvtnorm to generate multivariate normal data. This library can be installed with install.packages("mvtnorm").)

library(Design)
library(survival)
library(mvtnorm)

N<-2000

beta0<- -0.18
beta1<- -0.04
beta2<- 0.006
beta3<- 0.53
v<-0.693
lambda<-0.0312
yvar<-matrix(c(1.7943670,  7.475836, -0.7879104, 7.4758359, 43.013806, -1.2723471, -0.7879104, -1.272347, 64.7582702),3,3)
ybar<-c(3.898058, 15.022914, 38.746296) 

desmatrix <- rmvnorm(N, mean=ybar, sigma=yvar)
v1<-desmatrix[,1]
v2<-desmatrix[,2]
v3<-desmatrix[,3]
v4<-rbinom(N,1,1/(1+exp(.49+.143*v1-.00159*v2-.029*v3)))

u<-runif(N,0,1)
t<-(-log(u)/(lambda*exp(beta0*v1+beta1*v2+beta2*v3+beta3*v4)))^(1/v)
c<-rbeta(N,1.3,1)*120
y<-ifelse(t<c,t,c)
d<-ifelse(t<c,1,0)

S<-Surv(y,d)
v4<-factor(v4)

modstuff<-coxphboot(formula="S~v1+v2+v3+v4",number.strata=4,ph.crit.val=.1,surv.time=60,surv.x=c(4,15,38,1),Nboots=200)
                                        
modstuff$surv.est     ## Estimate of survival at t=60, v1=4, v2=15, v3=38, and v4=1.
quantile(modstuff$surv.estb,c(.025,.975))   ##  95% confidence interval
table(modstuff$badcovb)   ## Table of the frequency of stratification for each variable
Topic revision: r4 - 23 May 2007, BryanShepherd
 

This site is powered by FoswikiCopyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback