Vanderbilt Biostatistics Wiki: SasMacros (31 Jan 2004, FrankHarrell)

When the analyst specifies the knot locations, the `RCSPLINE` macro can be used. This macro generates SAS formulas for the constructed variables for k = 3 to 10 knots. In the following example, new variables age1, age2, and age3 are defined using knots at age = 20, 30, 40, 50, and 60 years.
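To show what such formulas look like, the DATA step below computes the three nonlinear terms for knots 20, 30, 40, 50, and 60 by hand. It uses the standard restricted cubic spline form from Harrell's book: for knots t1 < ... < tk, term j (j = 1, ..., k-2) is (x-tj)+**3 - (x-t(k-1))+**3 (tk-tj)/(tk-t(k-1)) + (x-tk)+**3 (t(k-1)-tj)/(tk-t(k-1)). This is a sketch, not the macro's own code: `RCSPLINE` may also divide each term by a normalizing constant such as (tk-t1)**2, which rescales the coefficients but leaves the fitted curve unchanged. The data set name `rcs_sketch` and the grid of ages are hypothetical.

```sas
/* Hand-coded restricted cubic spline terms, knots 20 30 40 50 60.      */
/* Assumption: unscaled terms; RCSPLINE itself may apply a scaling.     */
DATA rcs_sketch;
  ARRAY t{5} _TEMPORARY_ (20 30 40 50 60);   /* knots t1-t5           */
  ARRAY a{3} age1-age3;                      /* k-2 = 3 spline terms  */
  DO age = 10 TO 80 BY 5;
    DO j = 1 TO 3;
      a{j} = MAX(age - t{j}, 0)**3
           - MAX(age - t{4}, 0)**3 * (t{5} - t{j}) / (t{5} - t{4})
           + MAX(age - t{5}, 0)**3 * (t{4} - t{j}) / (t{5} - t{4});
    END;
    OUTPUT;
  END;
  DROP j;
RUN;
```

Below the first knot every term is zero, and beyond the last knot (age > 60) each term is linear in age; these linear tails are the restriction that distinguishes restricted cubic splines from ordinary cubic splines.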
```sas
DATA;
  INPUT age sex y;
  * . . . ;
  %RCSPLINE(age,20,30,40,50,60)
  * At this point age1-age3 exist;
PROC REG;
  MODEL y=age age1-age3;
RUN;
QUIT;
```

If the analyst wants SAS to compute the knots automatically from fixed percentiles of the predictor's distribution, the `DASPLINE` macro can be used. This macro invokes the `UNIVARIATE` procedure to compute percentiles and then creates a symbolic formula for the constructed spline variables, which can be reused in subsequent `DATA` steps. Spline variables for several predictors can be generated simultaneously, and up to k = 7 knots can be used. `DASPLINE` for SAS Version 6 can use the default knot quantiles given in Harrell's book.

`DASPLINE` macro (Harrell): automatic knot selection using `PROC UNIVARIATE`.
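The knot-placement step can be approximated without the macro. The sketch below assumes the book's default of 4 knots at the 5th, 35th, 65th, and 95th percentiles, computes them with the `OUTPUT` statement of `PROC UNIVARIATE`, and stores them in macro variables with `CALL SYMPUTX` (a SAS 9 routine, not available in Version 6). The data set `demo` and the macro variable names `k1`-`k4` are hypothetical; `DASPLINE` itself goes further and builds the full spline formula.

```sas
/* Hypothetical data: ages 1 to 100 */
DATA demo;
  DO age = 1 TO 100;
    OUTPUT;
  END;
RUN;

/* Percentiles used as knots; variables are named k5 k35 k65 k95 */
PROC UNIVARIATE DATA=demo NOPRINT;
  VAR age;
  OUTPUT OUT=knots PCTLPTS=5 35 65 95 PCTLPRE=k;
RUN;

/* Copy the knots into macro variables k1-k4 */
DATA _NULL_;
  SET knots;
  CALL SYMPUTX('k1', k5);
  CALL SYMPUTX('k2', k35);
  CALL SYMPUTX('k3', k65);
  CALL SYMPUTX('k4', k95);
RUN;

%PUT Knots for age: &k1 &k2 &k3 &k4;
```

The knots could then be passed to `RCSPLINE` inside a DATA step, e.g. `%RCSPLINE(age,&k1,&k2,&k3,&k4)`, which with 4 knots creates age1 and age2.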
```sas
%DASPLINE(age bp)  * Use DASPLINE(age bp,NK=5) to use 5 knots instead of the default of 4;
DATA; SET;
  * . . . . ;
  &_age  * _age is a macro variable containing formulas for age1, age2;
  &_bp   * _bp likewise contains formulas for bp1, bp2;
PROC GLM;
  MODEL y=age age1-age2 bp bp1-bp2;
RUN;
QUIT;
```

There is another SAS macro, `PSPLINET` (Plot Spline Transformation), for plotting the restricted cubic spline transformation of a single predictor in binary and ordinal logistic models and Cox proportional hazards models, with 95% confidence bands. The fit can adjust for other variables that are assumed to be linear or already correctly transformed. `PSPLINET` uses `DASPLINE` to compute knots and derive the spline component variables automatically, the `LOGISTIC` or `PHREG` procedure to fit the model, a macro called `EMPTREND` to optionally add subgroup estimates to the graph, and the `PLOT` or `GPLOT` procedure to draw the graph.

`PSPLINET` macro (Harrell): uses `DASPLINE`, `LOGISTIC`, `PHREG`, `PLOT`, and `GPLOT` to fit and display spline functions and test for linearity. A typical call to `PSPLINET` would be:
```sas
/* Add MODEL=cox for a Cox model; logistic is the default */
%PSPLINET(x, y, RANGE=0 TO 200, ADJ=age sex, TESTLIN=1)
```

For the Cox model (`PROC PHREG`), the SAS macros `SRVTREND` and `PLOTHR` provide many types of displays of estimated survival probabilities and log hazards.

Some SAS procedures (e.g., `GLM`, `LIFEREG`) allow a `CLASS` statement to specify that dummy variables are to be generated from the list of unique values of a variable. These procedures treat the generated dummy variables as a group, so pooled (multiple d.f.) F or Wald tests are produced for the variable as a whole. `PROC LIFEREG` allows only main effects for `CLASS` variables. Some SAS procedures (e.g., `GLM`, `REG`, `PHREG`) allow a `TEST` statement for obtaining pooled F or Wald tests.

`GLM` and `LIFEREG` allow `CLASS` variables (`LIFEREG`: main effects only); `GLM`, `REG`, and `PHREG` allow a `TEST` statement for pooled tests.

For example, `PROC REG` can be used to test the linearity of spline-expanded age as well as the overall importance of age:
```sas
PROC REG;
  MODEL y=age age1-age3 sex;
  agelin: TEST age1, age2, age3;       * linearity: all nonlinear terms zero;
  agetot: TEST age, age1, age2, age3;  * overall association with age;
RUN;
QUIT;
```

The `REG` procedure also allows variables to be grouped during variable selection. The following program considers all the age variables together and will select either all of them or none of them.

`REG` allows grouping of variables during variable selection:

```sas
PROC REG;
  MODEL y={age age1-age3} sex / SELECTION=adjrsq AIC GROUPNAMES="age" "sex";
RUN;
QUIT;
```

With the exception of `PROC PHREG`, predicted values are obtained in SAS by adding new observations to the data set used to fit the model. For these new observations the response variable is left undefined (missing). The missing response values cause the new observations to be ignored when the model is fitted, but predicted values are still computed for them. Note that all derived variables (e.g., spline components, squares, interaction terms) must be recomputed in the inner loop when the new points are created.

```sas
DATA main;
  * . . . ;
RUN;
%DASPLINE(age)             * Omit to use RCSPLINE;
DATA main; SET main;
  &_age                    * Compute age1, age2 if using DASPLINE;
  sexage  = sex*age;
  sexage1 = sex*age1;
  sexage2 = sex*age2;
DATA est;                  * Note y is unspecified;
  DO sex=0,1;
    DO age=20 TO 80 BY 2;
      &_age                * Re-evaluate nonlinear terms;
                           * Could have used RCSPLINE(age,20,50,etc.);
      sexage  = sex*age;
      sexage1 = sex*age1;
      sexage2 = sex*age2;
      OUTPUT;
    END;
  END;
DATA both; SET main est;
PROC REG;
  MODEL y=age age1-age2 sex sexage sexage1 sexage2;
  OUTPUT OUT=est P=yhat R=residual;
RUN;
QUIT;
DATA est; SET est; IF y=.;  * Keep only the added points;
PROC PLOT; PLOT yhat*age=sex;
RUN;
```

Linear splines can be fitted in SAS using the following example.

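```sas
DATA; SET;     * SET with no data set name reads the most recently created data set;
  age1=max(age-30,0);
  age2=max(age-70,0);
  * Note: age1, age2 will never be missing (MAX ignores missing arguments), but missing age will cause the observations to be deleted from the fit;
PROC REG;
  MODEL y=age age1 age2;
  Linear: TEST age1, age2;
RUN;
QUIT;
```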
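With this coding the coefficient of `age` is the slope below 30, the coefficient of `age1` is the change in slope at 30, and the coefficient of `age2` is the change in slope at 70, so the slope within each segment is a sum of coefficients. The sketch below estimates the three segment slopes with the `ESTIMATE` statement of `PROC GLM`, swapped in for `PROC REG` because `REG` has no `ESTIMATE` statement. The simulated data set `linspl` and its true slopes (0.5, 1.5, and -0.5) are made up for illustration.

```sas
/* Hypothetical simulated data: true slopes 0.5 (age < 30), 1.5 (30-70), -0.5 (age > 70) */
DATA linspl;
  CALL STREAMINIT(2004);
  DO age = 10 TO 100;
    age1 = MAX(age - 30, 0);
    age2 = MAX(age - 70, 0);
    y = 2 + 0.5*age + 1.0*age1 - 2.0*age2 + 0.01*RAND('NORMAL');
    OUTPUT;
  END;
RUN;

PROC GLM DATA=linspl;
  MODEL y = age age1 age2;
  ESTIMATE 'slope, age < 30'      age 1;                  /* b(age)                 */
  ESTIMATE 'slope, 30 < age < 70' age 1 age1 1;           /* b(age)+b(age1)         */
  ESTIMATE 'slope, age > 70'      age 1 age1 1 age2 1;    /* b(age)+b(age1)+b(age2) */
  ODS OUTPUT Estimates=slopes;                            /* save ESTIMATE results  */
RUN;
QUIT;
```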

The `VARCLUS` and `PRINQUAL` procedures are invaluable for data reduction and scaling. The user may wish to use SAS's `CORR` procedure to produce a matrix of Spearman rank correlation coefficients (and then square them) or of Hoeffding D statistics using pairwise deletion, and use this matrix as input to `VARCLUS`. One disadvantage of `PRINQUAL` is that if a transformation is allowed to be non-monotonic, its ordinary cubic splines, which lack linear tail restrictions, can produce illogical fits in the tails. There are also excellent SAS procedures for correspondence analysis (`CORRESP`), principal components analysis (`PRINCOMP`), cluster analysis (`CLUSTER`), and canonical correlation (`CANCORR`). The `TRANSREG` procedure can also be useful in estimating transformations. As a special case of `MGV` (the minimum generalized variance method of `PRINQUAL`), one can use `TRANSREG` to find the optimum transformation of `X1` that is most highly correlated with a linear combination of the optimum transformations of `X2, X3, ..., Xp`. Like `PRINQUAL`, `TRANSREG` does not support restricted cubic splines.

A recent addition to SAS is the `GENMOD` procedure for generalized linear models. `GENMOD` implements a general family of distributions and link functions, as does S's `glm` function. `GENMOD` allows a `CLASS` statement to generate dummy variables automatically, and it allows interactions of any order. It does not, however, provide multiple d.f. tests for variables unless the user manually supplies a `CONTRAST` statement.
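As an illustration of the `CONTRAST` requirement, the sketch below fits a logistic model with `GENMOD` and requests a 2 d.f. test that the nonlinear spline terms are zero and a 3 d.f. test of any association with age; each comma-separated row of a `CONTRAST` statement adds one degree of freedom. The simulated data set `sim`, the linear spline knots at 40 and 60, and the coefficients are all hypothetical.

```sas
/* Hypothetical simulated binary-outcome data */
DATA sim;
  CALL STREAMINIT(2004);
  DO i = 1 TO 500;
    age  = 20 + 60*RAND('UNIFORM');
    age1 = MAX(age - 40, 0);          /* linear spline terms, knots 40 and 60 */
    age2 = MAX(age - 60, 0);
    p    = 1 / (1 + EXP(-(-3 + 0.05*age + 0.05*age1)));
    y    = RAND('BERNOULLI', p);
    OUTPUT;
  END;
  DROP i p;
RUN;

PROC GENMOD DATA=sim DESCENDING;      /* model P(y=1) */
  MODEL y = age age1 age2 / DIST=BINOMIAL LINK=LOGIT;
  CONTRAST 'age nonlinear' age1 1, age2 1;          /* 2 d.f. */
  CONTRAST 'age overall'   age 1, age1 1, age2 1;   /* 3 d.f. */
  ODS OUTPUT Contrasts=con;                         /* save the contrast table */
RUN;
```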
- `VARCLUS`: variable clustering
- `PRINCOMP`: principal components
- `PRINQUAL`: qualitative principal components
- `TRANSREG`: simultaneous transformations of X and Y
- `CORRESP`: correspondence analysis
- `CLUSTER`: cluster analysis
- `CANCORR`: canonical correlation
- `REG`: OLS with many options and diagnostics
- `GLM`: OLS with `CLASS` variables and more
- `NLIN`: nonlinear regression
- `LOGISTIC`: binary and ordinal logistic regression
- `LIFEREG`: parametric survival models
- `PHREG`: Cox regression
- `GENMOD`: generalized linear models
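The suggestion above of feeding squared Spearman correlations to `VARCLUS` can be sketched as follows: `PROC CORR` writes the Spearman matrix to an `OUTS=` data set (by default each coefficient uses all observations that are nonmissing for that pair), a DATA step squares the correlation rows while keeping `TYPE=CORR`, and `VARCLUS` clusters the variables from that matrix. The simulated variables `x1`-`x5` and data set names are hypothetical; Hoeffding D statistics could be obtained similarly with the `HOEFFDING` and `OUTH=` options of `PROC CORR`.

```sas
/* Hypothetical data: x1-x3 driven by z1, x4-x5 driven by z2 */
DATA vars;
  CALL STREAMINIT(1);
  DO i = 1 TO 200;
    z1 = RAND('NORMAL');
    z2 = RAND('NORMAL');
    x1 = z1 + 0.5*RAND('NORMAL');
    x2 = z1 + 0.5*RAND('NORMAL');
    x3 = EXP(z1) + 0.5*RAND('NORMAL');
    x4 = z2 + 0.5*RAND('NORMAL');
    x5 = z2**3 + 0.5*RAND('NORMAL');
    OUTPUT;
  END;
  KEEP x1-x5;
RUN;

/* Spearman rank correlations written to a TYPE=CORR data set */
PROC CORR DATA=vars SPEARMAN OUTS=rho NOPRINT;
  VAR x1-x5;
RUN;

/* Square the correlation rows, keeping TYPE=CORR so VARCLUS reads a matrix */
DATA rho2(TYPE=CORR);
  SET rho;
  ARRAY r{5} x1-x5;
  IF _TYPE_ = 'CORR' THEN DO j = 1 TO 5;
    r{j} = r{j}**2;
  END;
  DROP j;
RUN;

PROC VARCLUS DATA=rho2;
  VAR x1-x5;
RUN;
```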

| Attachment | Size | Date | Who |
|---|---|---|---|
| survrisk.txt | 113.3 K | 24 Jan 2004 - 07:34 | WikiGuest |


Topic revision: r3 - 31 Jan 2004, FrankHarrell

Copyright © 2013-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback
