# Plotting Model Results

## Problem:

Often times, you would like to generate graphics based on a model you fit in R. These plots may include a number of diagnostic plots, or just plotting the fitted line with prediction and confidence bands.

## "Solution":

NOTE: I use the term "Solution" loosely; in R, there is never just one solution. This is just the solution I have found.

NOTE: The code distinguished with a ">" is what you would type at the R command line. The R output, if any, and any of my comments have been commented out using a "#".

This solution is a nearly exact excerpt from specific sections in the following book:
• Introductory Statistics with R by Peter Dalgaard (Springer, 2002).

Specifically, I have excerpted paragraphs and have used and modified the R code from Chapter 5: Regression and Correlation Section 1, 2, and 3 and Chapter 10: Linear Models Section 8.

All of the data sets used in Peter Dalgaard's book can be found in the `ISwR` add-on package, which is available for download/installation on the CRAN website (http://cran.r-project.org/).

For this solution, we will be using the `thuesen` built-in data set from the `ISwR` add-on package. According to the `ISwR` description, the `thuesen` data set "contains ventricular shortening velocity and blood glucose for type 1 diabetic patients." `thuesen` contains two columns:
• blood.glucose - fasting blood glucose (mmol/l) (numeric)
• short.velocity - mean circumferential shortening velocity (%/s) (numeric)

In case some of you have problems installing the `ISwR` add-on package, I decided to generate the `thuesen` data set "from scratch" and proceed from there.

```> thuesen<-data.frame(
blood.glucose=c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3,
11.1, 7.5, 12.2, 6.7, 5.2, 19.0, 15.1, 6.7, 8.6, 4.2, 10.3,
12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
short.velocity=c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09,
1.18, 1.22, 1.25, 1.19, 1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19,
1.05, 1.32, 1.03, 1.12, 1.70))
> thuesen
#    blood.glucose short.velocity
# 1           15.3           1.76
# 2           10.8           1.34
# 3            8.1           1.27
# 4           19.5           1.47
# 5            7.2           1.27
# 6            5.3           1.49
# 7            9.3           1.31
# 8           11.1           1.09
# 9            7.5           1.18
# 10          12.2           1.22
# 11           6.7           1.25
# 12           5.2           1.19
# 13          19.0           1.95
# 14          15.1           1.28
# 15           6.7           1.52
# 16           8.6             NA
# 17           4.2           1.12
# 18          10.3           1.37
# 19          12.5           1.19
# 20          16.1           1.05
# 21          13.3           1.32
# 22           4.9           1.03
# 23           8.8           1.12
# 24           9.5           1.70
```

For those of you who are able to properly install the `ISwR` package, you can load the library and read in the `thuesen` data set as follows:
```library(ISwR)
data(thuesen)
```

This will then add the `thuesen` data set as an object to your current workspace:
```ls()
#  "last.warning" "thuesen"
```

Either way, you will end up with a data set called `thuesen`.

Let's fit a simple linear regression model using the `thuesen` data. For convenience, let's assign the value(s) returned by the `lm` function under the name `lm.velo`:
```lm.velo<-lm(short.velocity ~ blood.glucose, data=thuesen)
lm.velo
#
# Call:
# lm(formula = short.velocity ~ blood.glucose, data = thuesen)
#
# Coefficients:
#   (Intercept)  blood.glucose
#       1.09781        0.02196
```

As Peter Daalgard states on page 97: "The result of `lm` is a model object. This is a distinctive concept of the S language (of which R is a dialect). Where other statistical systems focus on generating printed output that can be controlled by setting options, you get instead the result of a model fit encapsulated in an object from which the desired quantities can be obtained using extractor functions. An `lm` object does in fact contain much more information than you see when it is printed."

A basic extractor function is the `summary` function:
```> summary(lm.velo)
#
# Call:
# lm(formula = short.velocity ~ blood.glucose, data = thuesen)
#
# Residuals:
#      Min       1Q   Median       3Q      Max
# -0.40141 -0.14760 -0.02202  0.03001  0.43490
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)    1.09781    0.11748   9.345 6.26e-09 ***
# blood.glucose  0.02196    0.01045   2.101   0.0479 *
# ---
# Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
#
# Residual standard error: 0.2167 on 21 degrees of freedom
# Multiple R-Squared: 0.1737,     Adjusted R-squared: 0.1343
# F-statistic: 4.414 on 1 and 21 DF,  p-value: 0.0479
```

Let's first plot just the points and the fitted line of the model fit :
```plot(thuesen\$blood.glucose, thuesen\$short.velocity,
xlab="Blood Glucose", ylab="Short Velocity",
main="Plot of Thuesen Data")
abline(lm.velo)
```

Now let's consider residuals and fitted values. Two further extractor functions are `fitted` and `resid`.

As Peter Daalgard states on page 100 - 101:
• "The fitted function returns fitted values - the y-values that you would expect for the given x-values according to the best fitting straight line", which was found by `lm`.
• "The residuals shown by the `resid` function is the difference between this and the observed *short.velocity*".
```> fitted(lm.velo)
#        1        2        3        4        5        6        7        8
# 1.433841 1.335010 1.275711 1.526084 1.255945 1.214216 1.302066 1.341599
#        9       10       11       12       13       14       15       17
# 1.262534 1.365758 1.244964 1.212020 1.515103 1.429449 1.244964 1.190057
#       18       19       20       21       22       23       24
# 1.324029 1.372346 1.451411 1.389916 1.205431 1.291085 1.306459
> resid(lm.velo)
#            1            2            3            4            5            6
#  0.326158532  0.004989882 -0.005711308 -0.056084062  0.014054962  0.275783754
#            7            8            9           10           11           12
#  0.007933665 -0.251598875 -0.082533795 -0.145757649  0.005036223 -0.022019994
#           13           14           15           17           18           19
#  0.434897199 -0.149448964  0.275036223 -0.070057471  0.045971143 -0.182346406
#           20           21           22           23           24
# -0.401411486 -0.069916424 -0.175431237 -0.171085074  0.393541161
```

Notice, observation 16 has a missing short.velocity, so no fitted value or residual is calculated. For ease of plotting, we will want to have calculated fitted values and residuals for all observations. A easy way to handle missing values is by using the `na.exclude` method for missing value (i.e. `NA`) handling. We do this using the `options` function:
```> options(na.action="na.exclude")
```

This overrides the default option of `na.action = "na.omit"`. `na.omit` removes any incomplete cases from the data set before calculating the fitted values and residuals. On the other hand, `na.exclude` inset `NA=s for cases omitted by =na.omit` in order to pad the residuals and predictions to the correct length.

In the future, you can determine the default of a particular option by using the `getOption` function in a similar fashion to this:
```# To see default na.action option:
> getOption("na.action")
#  "na.omit"
```

Peter Dalgaard discuss more ways to handle missing values on page 101.

We now have to refit `lm.velo` after changing the `na.action` option, and if we look at the fitted values, we see that observation 16 has a missing fitted value (`NA`):
```> lm.velo<-lm(short.velocity ~ blood.glucose, data=thuesen)
> fitted(lm.velo)
#        1        2        3        4        5        6        7        8
# 1.433841 1.335010 1.275711 1.526084 1.255945 1.214216 1.302066 1.341599
#        9       10       11       12       13       14       15       16
# 1.262534 1.365758 1.244964 1.212020 1.515103 1.429449 1.244964       NA
#       17       18       19       20       21       22       23       24
# 1.190057 1.324029 1.372346 1.451411 1.389916 1.205431 1.291085 1.306459
```

Now let's create a plot where residuals are displayed by connecting observations to corresponding points on the fitted line :
```> plot(thuesen\$blood.glucose, thuesen\$short.velocity,
xlab="Blood Glucose", ylab="Short Velocity",
main="Plot of Thuesen Data")
> abline(lm.velo)
> segments(x0=thuesen\$blood.glucose,
y0=fitted(lm.velo),
x1=thuesen\$blood.glucose,
y1=thuesen\$short.velocity)
```

We can also generate a simple plot of residuals versus fitted values :
```> plot(fitted(lm.velo), resid(lm.velo),
xlab="Fitted Values from lm.velo",
ylab="Residuals from lm.velo",
main="Residuals versus Fitted values from lm.velo")
```

We can also get an indication of whether the residuals might have come from a normal distribution by checking for a straight line on a %BLUR% Q-Q plot :
```> qqnorm(resid(lm.velo))
> qqline(resid(lm.velo))
```

Illustration and explanation of further regression diagnostics is given in section 10.8. A basic set of four regression diagnostic plots is available via the plot method for the `lm` objects:
1. residual versus fitted values
2. Q-Q normal distribution plot of standardized residuals
3. plot of the square root of the absolute value of the standardized residuals
4. plot of "Cook's distance", which is a measure of the influence of each observation on the regression coefficients
```> par(mfro=c(2,2))
> plot(lm.velo)
# To return to the default of one graph:
> par(mfrow=c(1,1))
```

The last thing I am going to demonstrate how to plot the fitted line of your model including the prediction and confidence bands . There is a complete explanation of the procress to produce this plot in section 5.3.

As Peter Dalgaard explains on page 106: "First we create a new data frame in which the blood.glucose variable contains the values at which we want predictions to be made":
```> pred.frame<-data.frame(blood.glucose=4:20)
```

Next we generate two vectors which "contain the result of `predict` for the new data in =pred.frame=", specifically (1) the prediction limits, and (2) the confidence limits:
```> pp<-predict(lm.velo, int="p", newdata=pred.frame)
> pc<-predict(lm.velo, int="c", newdata=pred.frame)
```

"Now for the plotting: first we create a standard scatterplot, except we ensure that it has enough room for the prediction limits", by incorporating the `ylim` argument.
```> plot(thuesen\$blood.glucose, thuesen\$short.velocity,
ylim=range(thuesen\$short.velocity, pp, na.rm=T),
xlab="Blood glucose", ylab="Short Velocity",
main="Plot with Confidence and Prediction Bands")
```

"Finally the curves are added, using as c-values the blood.glucose used for the prediction, and setting the line types and colours to more sensible values."
``` matlines(pred.frame\$blood.glucose, pc, lty=c(1,2,2), col="black")
matlines(pred.frame\$blood.glucose, pp, lty=c(1,3,3), col="black")
```

We now probably want to set the `na.action` option back to `"na.omit"` instead of ="na.exclude"=":
```options(na.action="na.omit")
```

## Acknowledgements:

I would like to thank Richard Urbano for posing this problem.
Edit | Attach | Print version |  | Backlinks | View wiki text | Edit WikiText | More topic actions...
Topic revision: r3 - 10 Nov 2006, TheresaScott

• Biostatistics Webs Copyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback