Generating multiple regression models in a for loop

Often, we wish to generate multiple regression models that are all similar, but all different. For example, all of the models have the same outcome and main covariate, but each has a different second covariate.

We could assign each individual model, copying and pasting the model expression and appropriately change the name of the second covariate, but this could take a lot of time and typing.

Instead, consider assigning the models using a for loop, where the loop iterates over the values of the second covariate.

For this example, let's use our old standby -- the samplefile.txt data file.

Let's first load the necessary packages, read in our data file, and make some changes to our data frame -- add some variable labels, level labels, and units.

library(Hmisc)
library(Design)

x<-read.table("samplefile.txt", header=TRUE)
x<-upData(x,
   labels=c(age="Age", race="Race", sex="Sex",
      weight="Weight", visits="No. of Visits",
      tx="Treatment"),
   levels=list(sex=c("Female", "Male"),
      race=c("Black", "Caucasian", "Other"),
      tx=c("Drug X", "Placebo")),
   units=c(age="years", weight="lbs."))
contents(x)

Now let's create a binary outcome based on the number of visits --- we're interested in what predicts a patient will have more than 10 visits.

x$gt10visits <- factor(ifelse(x$visits <= 10, 0, 1),
   labels = c("<= 10 visits", "> 10 visits"))
describe(x$gt10visits)

In this example, each model will have the same basic form: lrm(gt10visits ~ tx + covariate2, data = x), where covariate2 can have the value age, race, sex, and weight. We would also like to keep the model separate by incorporating the value of covariate2 into the name.

We can do this using the following code:

varnames <- Cs(age, race, sex, weight)
for(i in varnames) {
   modelformula <- paste("gt10visits ~ tx +", i)
   assign(x = paste("m", i, sep = "."), value = lrm(as.formula(modelformula), data = x))
}
In this code, we supply the name of the second covariate as a text string (e.g., "age"). Because of this, we can use the paste() function to past together our formula, which is in the form of a character string. Unfortunately, we can not supply this character representation into the Design package's lrm() function; it won't recognize it. Instead, we use the as.formula() function to coerce the pasted together formula character string (e.g., "gt10visits ~ tx + age") to a formula class, which the lrm() function will recognize.

Another problem with character strings arises when we try to assign the resulting model to a name. Normally, we assign a name to an object using the assignment operator <-. For example, m1 <- lrm(gt10visits ~ tx + age, data = x). However, in this example, we want to incorporate the value of the second covariate into the name of the object (e.g., m.age). We can generate these unique model names using paste("m", i, sep = "."), but if you tried to evaluate the following code, you would get an error: paste("m", i, sep = ".")  <- lrm(as.formula(modelformula), data = x)

The reason you would get an error is because the assignment operator <- will not accept a character string on its left hand side. An alternative is to use the assign() function, which the assignment operator is a shortcut for. With the assign() function, the name of the object is given as a character string.

Now that we've figured out how to correctly generate and assign each of the models, let's think ahead to what we might want to do with them. Most likely, we'll want to generate a summary and/or plot of each model. Based on the for loop we used above, we now have N defined lrm objects (model fits), where N is the number of variables in varnames. We can see this using the ls() function. So, if we want to manipulate all of the defined lrm objects, we'll need to reference them all, most likely, in another for loop.

An alternative would be to generate an object that contains each of the model fits. In R, the easiest way to do this would be to generate a list, where each component of the list would be one of the fitted models --- that is, we would have a list of lists (the result of the lrm function is a list). Having a list of lists would allow us to use one the apply functions to generate a summary and/or plot of each component of the list (i.e., each fitted model).

To do this, we can use the following code as an alternative to the code above:

  
varnames <- Cs(age, race, sex, weight)
modelfits <- vector(length(varnames), mode = "list")
names(modelfits) <- varnames
for(i in varnames) {
   modelformula <- paste("gt10visits ~ tx +", i)
   modelfits[[i]] <- lrm(as.formula(modelformula), data = x)
}
In this code, we generate a list with the correct number of components with the vector() function. We then name each component of the list, so we can assign the corect model fit to each slot in the for loop. We can use the names() function and str() (structure) function to look at what modelfits contains --- BE AWARE, the str() function returns a lot of output.

  
names(modelfits)
str(modelfits[["age"]])
str(modelfits)
Topic revision: r1 - 16 Feb 2007, TheresaScott
 

This site is powered by FoswikiCopyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback