Survival analysis with the survival and Design packages

The following are some examples of how to perform survival analysis using the survival and Design packages.

The survival package

Let's go through some examples using the survival package. Therefore, let's load the necessary packages using the library() function -- make sure all of the packages have already been installed. The ISwR package contains the data sets and R code for text examples and exercises in Peter Dalgaard's '_Introductory Statistics with R_' book. We need to load the ISwR package because we'll be using the melanom built-in data set in all of the examples, which we load with the data() function. We'll also load the Hmisc package, which contains some useful functions (e.g., the contents() and upData() functions) that we'll want to use.



The melanom data frame contains the following columns:
  • no: patient ID
  • status: survival status (1: dead from melanoma, 2: alive, 3: dead from other cause)
  • days: observation time in days
  • ulc: ulceration (1: present, 2: absent)
  • thick: tumor thickness (1/100 mm).
  • sex: gender (1: female, 2: male)


Let's update the melanom data frame by adding appropriate labels and units of measurements to the variables. Let's also modify the categorical variables currently coded with numeric values to be factors with character level labels. REMEMBER, order matters when defining the level labels.

melanom <- upData(melanom,
   labels=c(status="Patient's Status",
      days="Observation time",
      ulc="Tumor Ulceration",
      thick="Tumor Thickness",
   levels=list(status=c("Dead from Malignant Melanoma",
      "Alive on Jan 1, 1978", "Dead from Other Causes"),
      ulc=c("Present", "Absent"), sex=c("Female", "Male")),
   units=c(days="days", thick="1/100 mm"))

Now let's generate Kaplan-Meier (K-M) estimates from our melanom data frame. The Surv() function defines our survival outcome, based on the time-to-event and the event of interest (in this case, dead from malignant melanoma). We can generate overall K-M estimates, and estimates across gender.

# Overall estimates
surv.all <- with(melanom, survfit(Surv(days, status=="Dead from Malignant Melanoma")))
summary(surv.all, censored=TRUE)
plot(surv.all, xlab = "Days", ylab = "Survival Probability",
   main = "K-M plot for melanoma data")

# Estimates across gender
surv.bysex <- with(melanom, 
   survfit(Surv(days, status=="Dead from Malignant Melanoma") ~ sex))
summary(surv.bysex, censored=TRUE)
plot(surv.bysex,, col=c("black", "gray"), 
   xlab = "Days", ylab = "Survival Probability",
   main = "K-M plot for melanoma data, grouped by gender")
legend(250, 0, lty = 1, col = c("black", "gray"), legend = levels(melanom$gender),
   xjust = 0, yjust = 0)
Notice that tick-marks are shown on the K-M plots, which represent censored observation times. The bands give approximate confidence intervals. Also, you noticed that, by default, the different curves are not labeled.

Now let's generate a Cox proportional hazards (PH) model using the coxph() function. The strata() identifies stratification variables in the model. In this example, we stratify across the levels of tumor ulceration (ulc).

coxph.m1 <- coxph(Surv(days, status=="Dead from Malignant Melanoma") ~ 
   log(thick) + sex + strata(ulc), data=melanom)
plot(survfit(coxph.m1),, col=c("black", "gray"), 
   xlab = "Days", ylab = "Survival Probability",
   main = paste("Baseline suvival curves (ulcerated and nonulcerated tumors)",
      "in stratified Cox regression", sep = "\n"))
legend(100, 0, lty = 1, col = c("black", "gray"), legend = rev(levels(melanom$ulc)),
   xjust = 0, yjust = 0)

The Design package

BE AWARE, when loaded, the Design package overwrites the Surv() and survfit() (and cox.zph()) functions from the survival package.


In turn, we build our expression to calculate our K-M in the same way as we did with the survival package. We do have to modify where we place the with() function, though.

# Overall estimates
surv.all <- survfit(with(melanom, Surv(days, status=="Dead from Malignant Melanoma")))
# Estimates across gender
surv.bysex <- survfit(with(melanom, 
   Surv(days, status=="Dead from Malignant Melanoma") ~ sex))

The main difference between the Design and survival packages for K-M estimates is the function you use to plot them. With the survival package, you are actually using the class-specific plot.survfit() function. On the other hand, with the Design package, you use the survplot() function, which actually calls the class-specific survplot.survfit() function when you plot a survfit() object. By default, confidence intervals are displayed using bars (i.e., conf = "bars"), but they can be modified to the bands we saw with the plot.survfit() function by specifying conf = "bands", and they can be suppressed by specifying conf = "none". Unlike the output of a plot.survfit() funciton invocation, the survplot.survfit() function does not display censored times using tick-marks, but it does automatically label the individual survival curves if the survfit() object contains a stratifying variable.

survplot(surv.all, conf = "bands",
   xlab = "Days", ylab = "Survival Probability")
title(main = "K-M plot for melanoma data")

survplot(surv.bysex, conf = "none",
   xlab = "Days", ylab = "Survival Probability")
title(main = "K-M plot for melanoma data, grouped by gender")
Two other useful arguments of the survplot.survfit() function to know about are the n.risk= and arguments. Specify n.risk = TRUE to add the number of subjects at risk for each curve to the plot. The argument allows you to modify the time increment used for labeling the x-axis.

Now let's generate a Cox PH model using the Design package's cph() function. Notice, that we use the strat() function instead of the strata() function to define the stratifying variable. We alsp specify = 2*365.25, surv = TRUE, and y = TRUE in order to specify that the survival estimates should be displayed every two years, to (invisibly) return the calculated survival estimates.

f <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + 
   strat(ulc), data=melanom,*365.25, surv=TRUE)

Unlike the survival package, in order to graphically display the results of the Cox PH model that has been fit using the Design package's cph() function, we have to calculate and define the '_data distribution_' of the variables in the melanom data frame. The Design package's model plotting functions use these defined '_data distrutions_' (i.e., variable labels, units of measurement, summary statistics, and/or level labels) to automatically define things such as plotting ranges and labels.


Now we can plot the results of our Cox regression. As before, we use the generic survplot() function, but this time we will be actually using the class-specific survplot.Design() function. With the survplot.Design() function, we specify that we want to produce two survival curves, one for each level of tumor ulceration, using ulc = NA.

survplot(f, ulc=NA)

# A lot more fancy graph
survplot(f, ulc=NA, ylim=c(0.5, 1.0), 
   xlab="Observation Time (days)",
   ylab="Cumulative Proportion Without Death from Malignant Melanoma",
   n.risk=TRUE, cex.n.risk=0.85, adj.n.risk=0.5)

We can also generate a plot of the calculated hazard ratios with 95% confidence intervals from our Cox regression model, using the class-specific plot.summary.Design(). In order to do this, we need to redefine our model by removing the strat() function from the tumor ulceration. Like before, we need to make sure that the 'data distribution' of our data frame has been calculated and defined.

op <- par(read.only = TRUE)

f1 <- cph(Surv(days, status=="Dead from Malignant Melanoma") ~ log(thick) + sex + ulc,
par(mar=c(5, 4, 4, 2)+0.1)
plot(summary(f1), q = 0.95, col = "gray")

Topic revision: r2 - 10 Nov 2006, TheresaScott

This site is powered by FoswikiCopyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback