# Plotting linear model results

Often times, you would like to generate graphics based on a model you fit in R. These plots may include a number of diagnostic plots, or just plotting the fitted line with prediction and confidence bands.

This solution is a nearly exact excerpt from specific sections in the following book:
• Introductory Statistics with R by Peter Dalgaard (Springer, 2002).

Specifically, I have excerpted paragraphs and have used and modified the R code from Chapter 5: Regression and Correlation Section 1, 2, and 3 and Chapter 10: Linear Models Section 8.

All of the data sets used in Peter Dalgaard's book can be found in the `ISwR` add-on package, which is available for download/installation on the CRAN website (http://cran.r-project.org/).

For this solution, we will be using the `thuesen` built-in data set from the `ISwR` add-on package. According to the `ISwR` description, the `thuesen` data set "contains ventricular shortening velocity and blood glucose for type 1 diabetic patients." `thuesen` contains two columns:
• `blood.glucose` - fasting blood glucose (mmol/l) (numeric)
• `short.velocity` - mean circumferential shortening velocity (%/s) (numeric)

In case some of you have problems installing the `ISwR` add-on package, I decided to generate the `thuesen` data set "from scratch" and proceed from there.

```thuesen<-data.frame(
blood.glucose=c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3,
11.1, 7.5, 12.2, 6.7, 5.2, 19.0, 15.1, 6.7, 8.6, 4.2, 10.3,
12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
short.velocity=c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09,
1.18, 1.22, 1.25, 1.19, 1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19,
1.05, 1.32, 1.03, 1.12, 1.70))
```

For those of you who are able to properly install the `ISwR` package, you can load the library and read in the `thuesen` data set as follows:

```library(ISwR)
data(thuesen)
```

Let's fit a simple linear regression model using the `thuesen` data. For convenience, let's assign the value(s) returned by the `lm` function under the name `lm.velo`:

```lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)
lm.velo
```

As Peter Daalgard states on page 97: "The result of `lm` is a model object. This is a distinctive concept of the S language (of which R is a dialect). Where other statistical systems focus on generating printed output that can be controlled by setting options, you get instead the result of a model fit encapsulated in an object from which the desired quantities can be obtained using extractor functions. An `lm` object does in fact contain much more information than you see when it is printed."

A basic extractor function is the `summary()` function:

```summary(lm.velo)
```

Let's first plot just the points and the fitted line of the model fit :

```with(thuesen, plot(blood.glucose, short.velocity,
xlab="Blood Glucose", ylab="Short Velocity",
main="Plot of Thuesen Data"))
abline(lm.velo)
```

Now let's consider residuals and fitted values. Two further extractor functions are the `fitted()` and `resid()` functions.

As Peter Daalgard states on page 100 - 101:
• "The fitted function returns fitted values - the y-values that you would expect for the given x-values according to the best fitting straight line", which was found by `lm`.
• "The residuals shown by the `resid` function is the difference between this and the observed *short.velocity*".

```fitted(lm.velo)
resid(lm.velo)
```

Notice, observation 16 has a missing short.velocity, so no fitted value or residual is calculated. For ease of plotting, we will want to have calculated fitted values and residuals for all observations. A easy way to handle missing values is by using the `na.exclude` method for missing value (i.e. `NA`) handling. We do this using the `options` function:

```options(na.action="na.exclude")
```

This overrides the default option of `na.action = "na.omit"`. `na.omit` removes any incomplete cases from the data set before calculating the fitted values and residuals. On the other hand, `na.exclude` inset `NA=s for cases omitted by =na.omit` in order to pad the residuals and predictions to the correct length.

In the future, you can determine the default of a particular option by using the `getOption()` function in a similar fashion to this:

```# To see default na.action option:
getOption("na.action")
#  "na.omit"
```

Peter Dalgaard discusses more ways to handle missing values on page 101.

We now have to refit `lm.velo` after changing the `na.action` option, and if we look at the fitted values, we see that observation 16 has a missing fitted value (`NA`):

```lm.velo<-lm(short.velocity ~ blood.glucose, data=thuesen)
fitted(lm.velo)
```

Now let's create a plot where residuals are displayed by connecting observations to corresponding points on the fitted line :

```with(thuesen, plot(blood.glucose, short.velocity,
xlab="Blood Glucose", ylab="Short Velocity",
main="Plot of Thuesen Data"))
abline(lm.velo)
with(thuesen, segments(x0=blood.glucose,
y0=fitted(lm.velo),
x1=blood.glucose,
y1=short.velocity))
```

We can also generate a simple plot of residuals versus fitted values :

```plot(fitted(lm.velo), resid(lm.velo),
xlab="Fitted Values from lm.velo",
ylab="Residuals from lm.velo",
main="Residuals versus Fitted values from lm.velo")
```

We can also get an indication of whether the residuals might have come from a normal distribution by checking for a straight line on a Q-Q plot :

```qqnorm(resid(lm.velo))
qqline(resid(lm.velo))
```

Illustration and explanation of further regression diagnostics is given in section 10.8. A basic set of four regression diagnostic plots is available via the plot method for the `lm` objects:
1. residual versus fitted values
2. Q-Q normal distribution plot of standardized residuals
3. plot of the square root of the absolute value of the standardized residuals
4. plot of "Cook's distance", which is a measure of the influence of each observation on the regression coefficients

```par(mfro=c(2,2))
plot(lm.velo)
par(mfrow=c(1,1))
```

The last thing I am going to demonstrate how to plot the fitted line of your model including the prediction and confidence bands . There is a complete explanation of the procress to produce this plot in section 5.3.

As Peter Dalgaard explains on page 106: "First we create a new data frame in which the blood.glucose variable contains the values at which we want predictions to be made":

```pred.frame<-data.frame(blood.glucose=4:20)
```

Next we generate two vectors which "contain the result of `predict` for the new data in =pred.frame=", specifically (1) the prediction limits, and (2) the confidence limits:

```pp<-predict(lm.velo, int="p", newdata=pred.frame)
pc<-predict(lm.velo, int="c", newdata=pred.frame)
```

"Now for the plotting: first we create a standard scatterplot, except we ensure that it has enough room for the prediction limits", by incorporating the `ylim` argument.

```with(thuesen, plot(blood.glucose, short.velocity,
ylim=range(short.velocity, pp, na.rm=T),
xlab="Blood glucose", ylab="Short Velocity",
main="Plot with Confidence and Prediction Bands"))
```

"Finally the curves are added, using as c-values the blood.glucose used for the prediction, and setting the line types and colours to more sensible values."

```matlines(pred.frame\$blood.glucose, pc, lty=c(1,2,2), col="black")
matlines(pred.frame\$blood.glucose, pp, lty=c(1,3,3), col="black")
```

We now probably want to set the `na.action` option back to `"na.omit"` instead of ="na.exclude"=":

```options(na.action="na.omit")
```

## Acknowledgements:

I would like to thank Richard Urbano for posing this problem.
Topic revision: r5 - 14 Nov 2006, TheresaScott

• Biostatistics Webs Copyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback