SAS Macros for Assisting with Survival and Risk Analysis, and Some SAS Procedures Useful for Multivariable Modeling
Several unsupported SAS macros written by Harrell that are helpful for survival
analysis and logistic regression are available
here. Two of these
macros generate constructed restricted cubic spline variables for use
in any regression procedure. If the analyst has a good idea about the
distribution of a predictor so that knots can be chosen in advance,
the
RCSPLINE macro can be used. This macro generates SAS formulas for
the constructed variables for k=3-10. In the following example, new
variables age1, age2, and age3 are defined using knots at age=20, 30,
40, 50, and 60 years.
DATA;
INPUT age sex y; . . .
%RCSPLINE(age,20,30,40,50,60) *At this point age1-age3 exist;
PROC REG;
MODEL y=age age1-age3;
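To show what the constructed variables look like, the following DATA step is a minimal sketch that computes restricted cubic spline terms directly from the standard truncated-power formula, using the five knots from the example above. It is not the code that RCSPLINE generates: the macro may scale its variables differently, and the data set name demo and the grid of ages are made up for illustration.

```sas
DATA demo;
  ARRAY t{5} _TEMPORARY_ (20 30 40 50 60);  /* knot locations */
  ARRAY agex{3} age1-age3;                   /* k-2 = 3 constructed variables */
  DO age = 15 TO 75 BY 5;                    /* illustrative grid of ages */
    DO j = 1 TO 3;
      /* Truncated cubic at knot j, minus terms at the last two knots
         that force the fitted function to be linear beyond knot 5 */
      agex{j} = MAX(age - t{j}, 0)**3
              - MAX(age - t{4}, 0)**3 * (t{5} - t{j}) / (t{5} - t{4})
              + MAX(age - t{5}, 0)**3 * (t{4} - t{j}) / (t{5} - t{4});
    END;
    OUTPUT;
  END;
  DROP j;
RUN;

PROC PRINT DATA=demo;
RUN;
```

All three constructed variables are zero for ages at or below the first knot, so the fit is linear in age there as well.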
If the analyst wants to have SAS automatically compute the knots based
on fixed percentiles of the predictor's distribution, the
DASPLINE
macro can be used. This macro invokes the
UNIVARIATE procedure to
compute percentiles and then creates a symbolic formula for the
constructed spline variables which can be used in many
DATA
steps. Spline variables for multiple predictors can be generated
simultaneously and up to k=7 knots can be used.
DASPLINE for SAS
Version 6 can use the default quantiles shown in Harrell's
book.
DASPLINE macro (Harrell): automatic knot selection using
PROC UNIVARIATE.
%DASPLINE(age bp) *Use DASPLINE(age bp,NK=5) to use 5 knots
instead of the default of 4;
DATA ; SET ; . . . .
&_age *_age is a macro variable containing formulas for age1, age2;
&_bp *_bp likewise contains formulas for bp1, bp2;
PROC GLM;
MODEL y=age age1-age2 bp bp1-bp2;
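The knot-selection step can also be carried out by hand. The sketch below assumes a data set named main that contains age (both names are illustrative) and asks PROC UNIVARIATE for the 5th, 35th, 65th, and 95th percentiles, which are the four-knot quantiles recommended in Harrell's book. Whether a given version of DASPLINE uses exactly these defaults should be checked against the macro source.

```sas
PROC UNIVARIATE DATA=main NOPRINT;
  VAR age;
  /* One output row holding age_p5, age_p35, age_p65, age_p95 */
  OUTPUT OUT=knots PCTLPTS=5 35 65 95 PCTLPRE=age_p;
RUN;

PROC PRINT DATA=knots;
RUN;
```

The printed values could then be passed as the knot arguments of RCSPLINE.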
There is another SAS macro called
PSPLINET (Plot Spline
Transformation) for plotting the restricted cubic spline
transformation for a single predictor in binary and ordinal logistic
models and Cox proportional hazards models, with 95% confidence
bands. The fit can adjust for other variables that are assumed to be
linear or transformed correctly.
PSPLINET uses
DASPLINE to
automatically compute knots and derive spline component variables, the
LOGISTIC or
PHREG procedures to fit the model, a macro called
EMPTREND
to optionally add subgroup estimates to the graph, and SAS procedures
PLOT or
GPLOT to make the graph. A typical call to
PSPLINET would be
PSPLINET macro (Harrell): uses
DASPLINE, LOGISTIC, PHREG, PLOT,
GPLOT to fit and display spline functions and test for linearity:
%PSPLINET(x, y, RANGE=0 TO 200, ADJ=age sex, TESTLIN=1,
MODEL=cox or logistic (default) )
For the Cox model (
PROC PHREG), SAS macros
SRVTREND and
PLOTHR
provide many types of displays of estimated survival probabilities and
log hazards. Some SAS procedures (e.g.,
GLM,
LIFEREG) allow a
CLASS
statement to be used to specify that dummy variables are to be
generated from the list of unique values of a variable. These
procedures use these dummy variables to generate pooled F and
Wald tests.
PROC LIFEREG only allows main effects for
CLASS
variables. Some SAS procedures (e.g.,
GLM, REG, PHREG) allow a
TEST
statement to get pooled F or Wald tests. For example,
PROC REG can be
used to test the linearity of spline-expanded age as well as the
overall importance of age:
GLM, LIFEREG allow
CLASS variables (
LIFEREG
-- main effects only)
GLM, REG, PHREG allow
TEST statement for pooled
tests:
PROC REG;
MODEL y=age age1-age3 sex;
agelin: TEST age1,age2,age3;
agetot: TEST age,age1,age2,age3;
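The same pooled tests can be requested for a Cox model, since PHREG also accepts a TEST statement. In this sketch the data set name surv, the follow-up time variable time, and the event indicator event (with 0 marking censored observations) are assumptions made for illustration, and age1-age3 are assumed to have been created beforehand, for example with RCSPLINE.

```sas
PROC PHREG DATA=surv;
  MODEL time*event(0) = age age1 age2 age3 sex;
  agelin: TEST age1=0, age2=0, age3=0;          /* nonlinearity of age, 3 d.f. */
  agetot: TEST age=0, age1=0, age2=0, age3=0;   /* overall age effect, 4 d.f. */
RUN;
```

PHREG reports these as Wald chi-square tests rather than the F tests produced by REG.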
The
REG procedure also allows grouping of variables during stepwise
variable selection. The following program will consider all age
variables together, and will only allow selection of all or none of
them.
REG allows grouping of var. during variable selection:
PROC REG;
MODEL y={age age1-age3} sex / SELECTION=adjrsq AIC
GROUPNAMES="age" "sex";
With the exception of
PROC PHREG, predicted values are obtained in
SAS by adding new observations to the dataset used in fitting the
model. For these new observations, the response variable is not
defined (resulting in missing values). The missing response values
cause the new observations to be ignored in the fit of the model. Note
that all derived variables (e.g., spline components, squares,
interaction terms) must be recomputed in the inner loop when creating
the new points.
DATA main; . . .
%DASPLINE(age) *Omit to use RCSPLINE;
DATA main; SET main;
&_age *Compute age1, age2 if using DASPLINE;
sexage = sex*age;
sexage1= sex*age1;
sexage2= sex*age2;
DATA est; *Note y is unspecified;
  DO sex=0,1;
    DO age=20 TO 80 BY 2;
      &_age *Re-evaluate nonlinear terms;
      *Could have used RCSPLINE(age,20,50,etc.);
      sexage = sex*age;
      sexage1= sex*age1;
      sexage2= sex*age2;
      OUTPUT;
    END;
  END;
DATA both; SET main est;
PROC REG;
MODEL y=age age1-age2 sex sexage sexage1 sexage2;
OUTPUT OUT=est P=yhat R=residual;
DATA est; SET est; IF y=.;
PROC PLOT;
PLOT yhat*age=sex;
Linear splines can be fitted in SAS as in the following example.
DATA;
age1=max(age-30,0); age2=max(age-70,0);
*Note: age1, age2 will never be missing, but missing age will
cause the observations to be deleted;
PROC REG;
MODEL y=age age1 age2;
Linear: TEST age1, age2;
Some SAS Procedures Useful in Multivariable Modeling
The
VARCLUS and
PRINQUAL procedures are
invaluable for data reduction and scaling. The user may wish to use
SAS's
CORR procedure to produce a matrix of Spearman rank correlation
coefficients (and then square them) or Hoeffding D statistics using
pairwise deletion, and use this matrix as input to
VARCLUS. One
disadvantage of PRINQUAL is that if a transformation is allowed to be
non-monotonic, the ordinary cubic splines without linear tail
restrictions can result in illogical fits in the tails. There are also
excellent SAS procedures for correspondence analysis (CORRESP),
principal components analysis (PRINCOMP), cluster analysis (CLUSTER),
and canonical correlation (CANCORR). The
TRANSREG procedure can also be
useful in estimating transformations. As a special case of the
maximum generalized variance (MGV) method, one
can use
TRANSREG to find the optimum transformation of
X1 that is most
highly correlated with a linear combination of the optimum
transformations of
X2, X3, ..., Xp. Like
PRINQUAL,
TRANSREG does not
support restricted cubic splines. A recent addition to SAS is the
GENMOD procedure for generalized linear models.
GENMOD implements a
general family of distributions/link functions as does S's
glm
function.
GENMOD allows a
CLASS statement to automatically generate
dummy variables, and it allows any order interactions. It does not
provide multiple d.f. tests for variables without the user manually
providing a
CONTRAST statement.
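As an illustration of the last point, the sketch below requests a pooled 2 d.f. test of the nonlinear age terms in a binary logistic model fitted with GENMOD. The data set name main, the 0/1 response y, and the spline variables age1 and age2 (assumed to have been created earlier, for example with DASPLINE) are illustrative assumptions.

```sas
PROC GENMOD DATA=main;
  CLASS sex;
  MODEL y = age age1 age2 sex / DIST=BINOMIAL LINK=LOGIT;
  /* Each comma-separated row is one linear hypothesis; together the
     two rows form a single 2 d.f. test that age1 = age2 = 0 */
  CONTRAST 'age nonlinear' age1 1, age2 1;
RUN;
```

Adding the row `age 1` in front of the other two would turn this into a 3 d.f. test of the overall age effect.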
- VARCLUS: variable clustering
- PRINCOMP: principal components
- PRINQUAL: qualitative principal components
- TRANSREG: simultaneous transformations of X and Y
- CORRESP: correspondence analysis
- CLUSTER: cluster analysis
- CANCORR: canonical correlation
- REG: OLS with many options, diagnostics
- GLM: OLS with CLASS variables and more
- NLIN: nonlinear regression
- LOGISTIC: binary and ordinal logistic regression
- LIFEREG: parametric survival models
- PHREG: Cox regression
- GENMOD: generalized linear models
Last Modified: 21 Sep 1999
--
FrankHarrell - 23 Jan 2004