---
title: "R Exercises"
author: "Cole Beck"
output:
  pdf_document:
    number_sections: no
  html_document:
    number_sections: no
---
```{r setup,echo=FALSE}
require(Hmisc)
knitrSet(lang='markdown')
```
# Manipulating Vectors
1. Modify the following character vector to keep only street names, then sort and remove duplicates.
```{r}
x <- c("120 Main St", "231 Walnut Grove", "374 Central Pk",
       "402 Providence Ln", "555 Central Pk")
```
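A minimal sketch of one approach, assuming every address starts with a house number followed by a single space: `sub()` strips that prefix with a regular expression, then `sort()` and `unique()` order the names and drop repeats.

```{r, eval=FALSE}
x <- c("120 Main St", "231 Walnut Grove", "374 Central Pk",
       "402 Providence Ln", "555 Central Pk")
# remove the leading digits and the single space that follows them
streets <- sub("^[0-9]+ ", "", x)
# sort alphabetically, then keep one copy of each street name
streets <- unique(sort(streets))
streets
# "Central Pk" "Main St" "Providence Ln" "Walnut Grove"
```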
2. How could you sum all of the integers from 1 to 1,000 that are evenly divisible by 3 or 5? What about the integers from 1 to 100,000 that are divisible by 4, 7, or 13?
```{r}
# sum [1,10] divisible by 3 or 5
3 + 5 + 6 + 9 + 10
```
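One vectorised sketch: the modulus operator `%%` flags multiples, and logical indexing keeps only those values before summing. The ranges are treated as inclusive, matching the [1,10] example above. `divisible_by_any()` is a hypothetical helper introduced here so the same test works for any set of divisors.

```{r, eval=FALSE}
# reproduces the hand calculation above
n <- 1:10
s10 <- sum(n[n %% 3 == 0 | n %% 5 == 0])   # 33

n <- 1:1000
s35 <- sum(n[n %% 3 == 0 | n %% 5 == 0])

# hypothetical helper: TRUE where an element of n is a multiple of
# at least one of the divisors
divisible_by_any <- function(n, divisors) {
  # outer() builds a length(n) x length(divisors) matrix of remainders
  rowSums(outer(n, divisors, `%%`) == 0) > 0
}
# doubles rather than integers, so the large sum cannot overflow
n <- as.numeric(1:100000)
s4713 <- sum(n[divisible_by_any(n, c(4, 7, 13))])
```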
# Writing Functions
Celsius to Fahrenheit: $f(x) = \frac{9}{5}x + 32$

Celsius to Kelvin: $f(x) = x + 273.15$
1. Write a temperature conversion function, `temp`. It should take a vector of temperatures, the `from` scale, and the `to` scale, where each scale is one of `'C'`, `'F'`, or `'K'`.
```{r, eval=FALSE}
# test temp function with this data
set.seed(20)
x <- round(rnorm(30, 10, 10))
xf <- temp(x, from='C', to='F')
xk <- temp(x, from='C', to='K')
all.equal(temp(xf, from='F', to='K'), xk)
```
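A minimal sketch of `temp`, assuming the scales are coded `'C'`, `'F'`, and `'K'` as in the test chunk above. Every input is first converted to Celsius and then to the target scale, so each scale needs one formula in each direction rather than one per pair of scales.

```{r, eval=FALSE}
temp <- function(x, from = c('C', 'F', 'K'), to = c('C', 'F', 'K')) {
  # match.arg() stops with an error for any scale not listed above
  from <- match.arg(from)
  to <- match.arg(to)
  # step 1: convert the input to Celsius
  celsius <- switch(from,
                    C = x,
                    F = (x - 32) * 5 / 9,
                    K = x - 273.15)
  # step 2: convert Celsius to the requested scale
  switch(to,
         C = celsius,
         F = celsius * 9 / 5 + 32,
         K = celsius + 273.15)
}

temp(c(0, 100), from = 'C', to = 'F')   # 32 212
```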
# Manipulating Data Frames
1. Read in the CSV file from the following URL
```{r, echo = FALSE}
"https://github.com/fonnesbeck/Bios6301/raw/master/datasets/haart.csv"
```
2. Describe the data set
```{r}
```
3. Create a categorical variable (factor) `gender` from the `male` column
```{r}
```
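A sketch using a small hypothetical stand-in, `toy`, in place of the data read from the CSV. It assumes `male` is coded 1 for male and 0 for female; check this with `table()` on the real column before relying on the labels.

```{r, eval=FALSE}
# hypothetical stand-in rows; substitute the real data frame
toy <- data.frame(male = c(1, 0, 0, 1))
# map the assumed 0/1 codes to labelled factor levels
toy$gender <- factor(toy$male, levels = c(0, 1), labels = c('female', 'male'))
table(toy$gender)
```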
4. Convert `init.date` and `last.visit` into Date variables
```{r}
```
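A sketch with hypothetical date strings. The `format` string assumes month/day/two-digit-year values such as `7/1/03`; inspect the real columns with `head()` and adjust it if the file uses another layout. `lapply()` applies the same conversion to both columns at once.

```{r, eval=FALSE}
# hypothetical stand-in values, stored as text like raw CSV columns
toy <- data.frame(init.date = c('7/1/03', '11/23/04'),
                  last.visit = c('2/26/07', '3/15/05'))
date_cols <- c('init.date', 'last.visit')
# `format` (assumed month/day/two-digit year) is passed on to as.Date()
toy[date_cols] <- lapply(toy[date_cols], as.Date, format = '%m/%d/%y')
str(toy)
```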
5. Create the column `daysbetween` holding the number of days between `init.date` and `last.visit`
```{r}
```
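A sketch assuming `init.date` and `last.visit` are already `Date` columns; the stand-in below is hypothetical. `difftime()` with `units = 'days'` fixes the unit explicitly, and `as.numeric()` drops the `difftime` class so the new column behaves like an ordinary number.

```{r, eval=FALSE}
# hypothetical stand-in with Date columns
toy <- data.frame(init.date = as.Date(c('2003-07-01', '2004-11-23')),
                  last.visit = as.Date(c('2007-02-26', '2005-03-15')))
toy$daysbetween <- as.numeric(difftime(toy$last.visit, toy$init.date,
                                       units = 'days'))
toy
```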
6. Subset the data to rows where `age` is greater than 40 and `death` is zero, keeping only the columns `gender`, `age`, `cd4baseline`, `weight`, and `daysbetween`.
```{r}
```
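A sketch on a hypothetical stand-in; `extra` is a made-up column included only to show that unlisted columns are dropped. `subset()` treats rows whose condition is `NA` as non-matching, which is why the bracket version wraps the condition in `which()`.

```{r, eval=FALSE}
# hypothetical stand-in rows; substitute the real data frame
toy <- data.frame(gender = factor(c('male', 'female', 'male', 'female')),
                  age = c(45, 52, 60, 30),
                  death = c(0, 0, 1, 0),
                  cd4baseline = c(150, 320, 210, 400),
                  weight = c(70, 62, 81, 58),
                  daysbetween = c(1336, 112, 860, 500),
                  extra = c(1, 2, 3, 4))
keep <- c('gender', 'age', 'cd4baseline', 'weight', 'daysbetween')
older <- subset(toy, age > 40 & death == 0, select = keep)
# bracket equivalent; which() drops NA conditions the way subset() does
older2 <- toy[which(toy$age > 40 & toy$death == 0), keep]
older
```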
7. Reorder the data by `age`
```{r}
```
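A sketch on a hypothetical stand-in: `order()` returns the row positions that would sort `age`, and indexing the rows with it reorders every column together.

```{r, eval=FALSE}
# hypothetical stand-in rows; substitute the data frame to be sorted
toy <- data.frame(age = c(52, 41, 67, 45), weight = c(62, 70, 58, 81))
by_age <- toy[order(toy$age), ]
# oldest first instead: toy[order(toy$age, decreasing = TRUE), ]
by_age
```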
# Models
```{r}
gender <- c('M','M','F','M','F','F','M','F','M')
age <- c(34, 64, 38, 63, 40, 73, 27, 51, 47)
smoker <- c('no','yes','no','no','yes','no','no','no','yes')
exercise <- factor(c('moderate','frequent','some','some','moderate','none',
                     'none','moderate','moderate'),
                   levels=c('none','some','moderate','frequent'), ordered=TRUE)
los <- c(4,8,1,10,6,3,9,4,8)
x <- data.frame(gender, age, smoker, exercise, los)
```
1. Create a linear model using `x`, estimating the association between `los` and all remaining variables
```{r}
```
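A sketch that repeats the data from the chunk above so it runs on its own. In an R formula, `.` stands for every column other than the response. Because `exercise` is an ordered factor, R's default polynomial contrasts label its coefficients `exercise.L`, `exercise.Q`, and `exercise.C` rather than one coefficient per level.

```{r, eval=FALSE}
# same data as the chunk above
x <- data.frame(
  gender = c('M','M','F','M','F','F','M','F','M'),
  age = c(34, 64, 38, 63, 40, 73, 27, 51, 47),
  smoker = c('no','yes','no','no','yes','no','no','no','yes'),
  exercise = factor(c('moderate','frequent','some','some','moderate','none',
                      'none','moderate','moderate'),
                    levels = c('none','some','moderate','frequent'),
                    ordered = TRUE),
  los = c(4, 8, 1, 10, 6, 3, 9, 4, 8)
)
# `los ~ .` regresses los on all remaining columns of x
fit_all <- lm(los ~ ., data = x)
summary(fit_all)
```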
2. Create a new model, this time predicting `los` from `gender`, and show the model summary
```{r}
```
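A sketch that copies only the `gender` and `los` columns of `x` so it runs on its own. `lm()` turns the character column into a factor with levels in alphabetical order, so `F` becomes the reference level and the slope is labelled `genderM`.

```{r, eval=FALSE}
# gender and los columns copied from x above
x <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                los = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = x)
summary(fit_gender)
```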
3. What is the estimate for the intercept? What is the estimate for gender?
```{r}
```
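A sketch using `coef()`, with the gender model refit so the chunk is self-contained. With a single binary predictor, the intercept is the mean `los` for the reference group (`F`) and the `genderM` estimate is the difference in means, men minus women; here the group means are 3.5 and 7.8, giving estimates of 3.5 and 4.3.

```{r, eval=FALSE}
x <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                los = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = x)
est <- coef(fit_gender)
est['(Intercept)']   # 3.5, mean los among women
est['genderM']       # 4.3, mean los among men minus mean among women
# the same values from the summary table
coef(summary(fit_gender))[, 'Estimate']
```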
4. Recalculate the standard errors by taking the square root of the diagonal of the variance-covariance matrix from the summary of the linear model
```{r}
```
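A sketch using the components stored by `summary()`: `cov.unscaled` is $(X^\top X)^{-1}$ and `sigma` is the residual standard error, so `sigma^2 * cov.unscaled` is the same variance-covariance matrix that `vcov()` returns for the fitted model. The gender model is refit so the chunk runs on its own.

```{r, eval=FALSE}
x <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                los = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
s <- summary(lm(los ~ gender, data = x))
# variance-covariance matrix of the estimates: sigma^2 * (X'X)^{-1}
V <- s$sigma^2 * s$cov.unscaled
se <- sqrt(diag(V))
# compare with the 'Std. Error' column of the coefficient table
cbind(by_hand = se, reported = coef(s)[, 'Std. Error'])
```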
5. Predict `los` for the following new data set
```{r}
newdat <- data.frame(gender=c('F','M','F'))
```
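A sketch using `predict()` with its `newdata` argument, again on a self-contained refit. Since `gender` is the only predictor, each prediction is simply that group's fitted mean.

```{r, eval=FALSE}
x <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                los = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = x)
newdat <- data.frame(gender = c('F', 'M', 'F'))
pred <- predict(fit_gender, newdata = newdat)
pred   # 3.5 7.8 3.5
```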
6. Sum the squared residuals of the model. Compare the result to the value returned by passing the model to the `deviance` function.
```{r}
```
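A sketch comparing the two on the gender model. For a model fitted with `lm()`, `deviance()` returns the residual sum of squares, so both values should agree (33.8 for these data).

```{r, eval=FALSE}
x <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                los = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = x)
rss <- sum(residuals(fit_gender)^2)
c(by_hand = rss, deviance = deviance(fit_gender))   # both 33.8
```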
7. Create a subset of `x` containing all records where `gender` is `'M'`, and assign it to the variable `men`. Do the same for records where `gender` is `'F'`, assigning them to `women`.
```{r}
```
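A sketch using logical row indexing, on a two-column copy of `x` so it runs on its own; `subset(x, gender == 'M')` is an equivalent alternative.

```{r, eval=FALSE}
x <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                los = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
# keep every column, only the matching rows
men <- x[x$gender == 'M', ]
women <- x[x$gender == 'F', ]
c(men = nrow(men), women = nrow(women))   # 5 and 4 rows
```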
8. Call the `t.test` function with `los` for women as the first argument and `los` for men as the second, and set the argument `var.equal = TRUE`. Does the resulting p-value match the p-value for `gender` in the model summary?
```{r}
```
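A sketch comparing the two p-values. A two-sample t-test with pooled variance (`var.equal = TRUE`) is equivalent to regressing the outcome on a two-level factor, so the p-values should match; the t statistics differ only in sign, because `t.test()` computes women minus men while `genderM` is men minus women.

```{r, eval=FALSE}
x <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                los = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
men <- x[x$gender == 'M', ]
women <- x[x$gender == 'F', ]
tt <- t.test(women$los, men$los, var.equal = TRUE)
fit_gender <- lm(los ~ gender, data = x)
p_lm <- coef(summary(fit_gender))['genderM', 'Pr(>|t|)']
c(t.test = tt$p.value, lm = p_lm)
```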
# Generating Plots
Given the `vlbw` data set, use `ggplot2` and `qplot` to create the following plots.
```{r}
require(ggplot2)
getHdata(vlbw)
```
1. Scatterplot of `gest` vs. `bwt`
```{r}
```
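A sketch showing a `qplot()` call and its `ggplot()` equivalent, with `gest` on the x-axis; swap the two variables if "gest vs. bwt" is meant the other way around. `getHdata()` downloads the data, so a network connection is assumed. `qplot()` has been deprecated since ggplot2 3.4.0, so recent versions print a warning when it is called.

```{r, eval=FALSE}
require(Hmisc)
require(ggplot2)
getHdata(vlbw)   # downloads the data set
p1 <- qplot(gest, bwt, data = vlbw)
# the same scatterplot built layer by layer
g1 <- ggplot(vlbw, aes(x = gest, y = bwt)) + geom_point()
g1
```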
2. Scatterplot of `gest` vs. `bwt`, with color and shape mapped to `sex`
```{r}
```
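A sketch, again with `gest` on the x-axis. Mapping the same variable to both `colour` and `shape` produces a single combined legend and keeps the groups distinguishable in black-and-white output. `shape` needs a discrete variable, so this assumes `sex` is stored as a factor in `vlbw`.

```{r, eval=FALSE}
require(Hmisc)
require(ggplot2)
getHdata(vlbw)
# map sex to both point colour and point shape
p2 <- qplot(gest, bwt, data = vlbw, colour = sex, shape = sex)
g2 <- ggplot(vlbw, aes(x = gest, y = bwt, colour = sex, shape = sex)) +
  geom_point()
g2
```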
3. Boxplot of `bwt` by `sex`
```{r}
```
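A sketch of the boxplot of birth weight by `sex`; in `qplot()` the `geom` argument selects the plot type, while `ggplot()` adds the geom as a layer.

```{r, eval=FALSE}
require(Hmisc)
require(ggplot2)
getHdata(vlbw)
p3 <- qplot(sex, bwt, data = vlbw, geom = 'boxplot')
g3 <- ggplot(vlbw, aes(x = sex, y = bwt)) + geom_boxplot()
g3
```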
4. Scatterplot of `gest` vs. `bwt`, faceted by `race`
```{r}
```
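A sketch of faceting: the one-sided formula `~ race` gives one panel per level of `race`. `qplot()` hands a one-sided `facets` formula to `facet_wrap()`; with `ggplot()` the facet is added explicitly.

```{r, eval=FALSE}
require(Hmisc)
require(ggplot2)
getHdata(vlbw)
p4 <- qplot(gest, bwt, data = vlbw, facets = ~ race)
g4 <- ggplot(vlbw, aes(x = gest, y = bwt)) +
  geom_point() +
  facet_wrap(~ race)
g4
```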
5. Scatterplot of `gest` vs. `bwt` with a regression line
```{r}
```
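A sketch adding a least-squares line with `geom_smooth(method = 'lm')`. Without `method = 'lm'`, `geom_smooth()` fits a flexible smoother rather than a straight line; its default `se = TRUE` also draws a confidence band around the fit.

```{r, eval=FALSE}
require(Hmisc)
require(ggplot2)
getHdata(vlbw)
p5 <- qplot(gest, bwt, data = vlbw) + geom_smooth(method = 'lm')
g5 <- ggplot(vlbw, aes(x = gest, y = bwt)) +
  geom_point() +
  geom_smooth(method = 'lm')
g5
```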