SAS Macros for Assisting with Survival and Risk Analysis, and Some SAS Procedures Useful for Multivariable Modeling

Several unsupported SAS macros written by Harrell that are helpful for survival analysis and logistic regression are available here. Two of these macros generate constructed restricted cubic spline variables for use in any regression procedure. If the analyst knows enough about the distribution of a predictor to choose knots in advance, the RCSPLINE macro can be used. This macro generates SAS formulas for the constructed variables for k=3 to 10 knots. In the following example, new variables age1, age2, and age3 are defined using knots at age=20, 30, 40, 50, and 60 years.
    DATA;
    INPUT age sex y; . . .
    %RCSPLINE(age,20,30,40,50,60) *At this point age1-age3 exist;
    PROC REG;
       MODEL y=age age1-age3;
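To show what such constructed variables look like, the following DATA step is a minimal sketch that builds age1-age3 by hand from the standard restricted cubic spline formula, using the five knots from the example above. It is not the RCSPLINE source: the data set names main and splined are hypothetical, and dividing by the squared distance between the outer knots is an assumed normalization that may differ from what the macro does.
    DATA splined; SET main;       *main is a hypothetical input data set;
       ARRAY t{5} _TEMPORARY_ (20 30 40 50 60);   *knots, as in the example;
       ARRAY agec{3} age1-age3;                   *k-2 = 3 constructed variables;
       IF age NE . THEN DO j=1 TO 3;
          *Truncated cubics combined so the fit is linear beyond the outer knots;
          agec{j} = ( MAX(age-t{j},0)**3
                    - MAX(age-t{4},0)**3 * (t{5}-t{j})/(t{5}-t{4})
                    + MAX(age-t{5},0)**3 * (t{4}-t{j})/(t{5}-t{4}) )
                    / (t{5}-t{1})**2;             *assumed normalization;
       END;
       DROP j;
    RUN;
The IF age NE . guard matters because MAX ignores missing arguments: without it, a missing age would produce spline components equal to 0 rather than missing.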
If the analyst wants SAS to compute the knots automatically from fixed percentiles of the predictor's distribution, the DASPLINE macro can be used. This macro invokes the UNIVARIATE procedure to compute percentiles and then creates a symbolic formula for the constructed spline variables, which can be used in many DATA steps. Spline variables for multiple predictors can be generated simultaneously, and up to k=7 knots can be used. DASPLINE for SAS Version 6 can use the default quantiles shown in Harrell's book.
    %DASPLINE(age bp)    *Use DASPLINE(age bp,NK=5) to use 5 knots
                          instead of the default of 4;
    DATA ; SET ; . . . .
    &_age    *_age is a macro variable containing formulas for age1, age2;
    &_bp     *_bp likewise contains formulas for bp1, bp2;
    PROC GLM;
       MODEL y=age age1-age2 bp bp1-bp2;
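The knot-finding step can be imitated directly with PROC UNIVARIATE. The sketch below is not DASPLINE itself. It assumes a hypothetical data set main and takes the 5th, 35th, 65th, and 95th percentiles of age as the four knots; these are assumed here to correspond to the default quantiles mentioned above, and the normalization is likewise an assumption.
    PROC UNIVARIATE DATA=main NOPRINT;
       VAR age;
       OUTPUT OUT=knots PCTLPTS=5 35 65 95 PCTLPRE=k;  *creates k5 k35 k65 k95;
    DATA splined;
       IF _N_=1 THEN SET knots;    *knot values are retained on every row;
       SET main;
       ARRAY t{4} k5 k35 k65 k95;
       ARRAY agec{2} age1-age2;    *4 knots give 2 constructed variables;
       IF age NE . THEN DO j=1 TO 2;
          agec{j} = ( MAX(age-t{j},0)**3
                    - MAX(age-t{3},0)**3 * (t{4}-t{j})/(t{4}-t{3})
                    + MAX(age-t{4},0)**3 * (t{3}-t{j})/(t{4}-t{3}) )
                    / (t{4}-t{1})**2;   *assumed normalization;
       END;
       DROP j k5 k35 k65 k95;
    RUN;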
There is another SAS macro called PSPLINET (Plot Spline Transformation) for plotting the restricted cubic spline transformation for a single predictor in binary and ordinal logistic models and Cox proportional hazards models, with 95% confidence bands. The fit can adjust for other variables that are assumed to be linear or correctly transformed. PSPLINET uses DASPLINE to automatically compute knots and derive spline component variables, the LOGISTIC or PHREG procedure to fit the model, a macro called EMPTREND to optionally add subgroup estimates to the graph, and the SAS procedures PLOT or GPLOT to make the graph. It can also test for linearity. A typical call to PSPLINET would be:
    %PSPLINET(x, y, RANGE=0 TO 200, ADJ=age sex, TESTLIN=1,
              MODEL=logistic)   *Use MODEL=cox for a Cox model;
                                 logistic is the default;
For the Cox model (PROC PHREG), the SAS macros SRVTREND and PLOTHR provide many types of displays of estimated survival probabilities and log hazards.
Some SAS procedures (e.g., GLM, LIFEREG) allow a CLASS statement to specify that dummy variables are to be generated from the list of unique values of a variable. These procedures treat the generated dummy variables as a group when computing pooled F and Wald tests. PROC LIFEREG allows only main effects for CLASS variables. Some SAS procedures (e.g., GLM, REG, PHREG) allow a TEST statement to get pooled F or Wald tests. For example, PROC REG can be used to test the linearity of spline-expanded age as well as the overall importance of age:
    PROC REG;
       MODEL y=age age1-age3 sex;
       agelin: TEST age1,age2,age3;
       agetot: TEST age,age1,age2,age3;
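The same labeled TEST statements carry over to PROC PHREG, where they yield Wald tests. A minimal sketch, assuming a hypothetical data set main with follow-up time in the variable time and an event indicator status in which 0 marks a censored observation:
    PROC PHREG DATA=main;
       MODEL time*status(0) = age age1-age3 sex;
       agelin: TEST age1, age2, age3;         *linearity of age, 3 d.f.;
       agetot: TEST age, age1, age2, age3;    *overall effect of age, 4 d.f.;
    RUN;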
The REG procedure also allows grouping of variables during variable selection. The following program considers all age variables together and allows selection of either all or none of them:
    PROC REG;
       MODEL y={age age1-age3} sex / SELECTION=adjrsq   AIC
             GROUPNAMES="age" "sex";
With the exception of PROC PHREG, predicted values are obtained in SAS by adding new observations to the dataset used in fitting the model. For these new observations the response variable is left undefined, so it is missing, and the missing response values cause the new observations to be ignored when the model is fitted. Note that all derived variables (e.g., spline components, squares, interaction terms) must be recomputed inside the loop that creates the new points, as in the following program:
    DATA main; . . .
    %DASPLINE(age)           *Omit to use RCSPLINE;
    DATA main; SET main;
       &_age                 *Compute age1, age2 if using DASPLINE;
       sexage = sex*age;
       sexage1= sex*age1;
       sexage2= sex*age2;
    DATA est;                *Note y is unspecified;
       DO sex=0,1;
          DO age=20 TO 80 BY 2;
          &_age              *Re-evaluate nonlinear terms;
                             *Could have used RCSPLINE(age,20,50,etc.);
          sexage = sex*age;
          sexage1= sex*age1;
          sexage2= sex*age2;
          OUTPUT;
          END;
       END;
    DATA both; SET main est;
    PROC REG;
       MODEL y=age age1-age2 sex sexage sexage1 sexage2;
       OUTPUT OUT=est P=yhat R=residual;
    DATA est; SET est; IF y=.;
    PROC PLOT;
       PLOT yhat*age=sex;
Linear splines can be fitted in SAS as in the following example, which places knots at ages 30 and 70:
    DATA;
       age1=max(age-30,0);  age2=max(age-70,0);
      *Note: age1, age2 will never be missing, but missing age will
       cause the observations to be deleted;
    PROC REG;
       MODEL y=age age1 age2;
       Linear: TEST age1, age2;

Some SAS Procedures Useful in Multivariable Modeling

The VARCLUS and PRINQUAL procedures are invaluable for data reduction and scaling. The user may wish to use SAS's CORR procedure to produce a matrix of squared Spearman rank correlation coefficients (compute the coefficients, then square them) or of Hoeffding D statistics, using pairwise deletion of missing values, and use this matrix as input to VARCLUS. One disadvantage of PRINQUAL is that if a transformation is allowed to be nonmonotonic, the ordinary cubic splines without linear tail restrictions can result in illogical fits in the tails. There are also excellent SAS procedures for correspondence analysis (CORRESP), principal components analysis (PRINCOMP), cluster analysis (CLUSTER), and canonical correlation (CANCORR).

The TRANSREG procedure can also be useful in estimating transformations. As a special case of the maximum generalized variance (MGV) method, one can use TRANSREG to find the optimum transformation of X1 that is most highly correlated with a linear combination of the optimum transformations of X2, X3, ..., Xp. Like PRINQUAL, TRANSREG does not support restricted cubic splines.

A recent addition to SAS is the GENMOD procedure for generalized linear models. GENMOD implements a general family of distributions and link functions, as does S's glm function. GENMOD allows a CLASS statement to automatically generate dummy variables, and it allows interactions of any order. It does not provide multiple d.f. tests for variables unless the user manually provides a CONTRAST statement.
  • VARCLUS: variable clustering
  • PRINCOMP: principal components
  • PRINQUAL: qualitative principal components
  • TRANSREG: simultaneous transformations of X and Y
  • CORRESP: correspondence analysis
  • CLUSTER: cluster analysis
  • CANCORR: canonical correlation
  • REG: OLS with many options, diagnostics
  • GLM: OLS with CLASS variables and more
  • NLIN: nonlinear regression
  • LOGISTIC: binary and ordinal logistic regression
  • LIFEREG: parametric survival models
  • PHREG: Cox regression
  • GENMOD: generalized linear models
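The manual CONTRAST statement that GENMOD needs for a multiple d.f. test can be sketched as follows. The data set main, the binary response y, and the three-level CLASS variable race are hypothetical; the two rows of the contrast together test the 2 d.f. hypothesis that race has no effect.
    PROC GENMOD DATA=main;
       CLASS race;
       MODEL y = age race / DIST=BINOMIAL LINK=LOGIT;
       CONTRAST 'race, 2 d.f.' race 1 0 -1,
                               race 0 1 -1;
    RUN;
By default the CONTRAST statement reports a likelihood ratio statistic; adding the WALD option after a slash requests a Wald statistic instead.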

Last Modified:21 Sep 1999

-- FrankHarrell - 23 Jan 2004
Topic revision: r3 - 31 Jan 2004, FrankHarrell
 
