Methods of Retrieving Datasets
Most of the datasets on this site are in the S dumpdata format (file suffix of .sdd) and R compressed save() file format (suffix of .sav). Some datasets are available in Excel or ASCII (csv) formats.
- To manually download and install a dataset, right-click on a file to save it to a temporary disk location, e.g., into a directory such as \windows\temp or /tmp.
- In S-Plus you can import the datasets using the File ... Import ... S-Plus Transport File dialog.
- Alternatively, use the S-Plus or R command data.restore('/mydir/file.sdd').
- In R, data.restore is found in the foreign package, but the binary save files are much better to use.
- If you have issued library(Hmisc) in R or S-Plus, you can download and load() a dataset by just typing getHdata(dataset name). To list available dataset names, just type getHdata(). Type ?getHdata to see other options, including ones to browse a dataset's html(contents()) file or its description file (if available) on our web site. Here's an example:
getHdata(prostate)
attach(prostate) ...
If you are using S-Plus and your system does not have the wget executable, you must install it for getHdata or download.file to work. Windows users may obtain wget.exe here. Download it to a temporary file, unzip it, and put wget.exe in the same directory in which Windows stores ftp.exe.
- In S-Plus 5 or later, if you are not using getHdata, you will need to run imported data frames through the Hmisc library's cleanup.import function, e.g., pbc <- cleanup.import(pbc). This removes object classes that are not allowed in Version 4 of the S language due to its inability to handle multiple inheritance. If using .sdd files in R, you may also want to run the imported data through cleanup.import to store it more efficiently (save files are already stored efficiently). In S-Plus, getHdata runs cleanup.import for you automatically.
- In R you can also load() a dataset directly from the web using load(url('http://biostat.mc.vanderbilt.edu/...foo.sav')).
- It is best to use the datasets with the Hmisc library in effect. Among other things, this will allow you to use the Hmisc describe and contents functions to obtain documentation about the variables.
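
The manual route described above (save a .sav file to a temporary location, then load() it) might look roughly like the following in R. This is only a sketch under stated assumptions: the data frame toy and the file names are hypothetical stand-ins, and a local file:// URL takes the place of a real address on the site so that the example runs without a network connection. The Hmisc call is skipped if the package is not installed.

```r
# 'toy' is a hypothetical stand-in for a dataset from the site.
toy <- data.frame(id = 1:3, age = c(61, 70, 58))

# Pretend this file is the remote copy: write it in R's compressed save() format.
remote_path <- file.path(tempdir(), "toy_remote.sav")
save(toy, file = remote_path)
remote_url <- paste0("file://", normalizePath(remote_path, winslash = "/"))

# Manual route: download to a temporary location, then load().
# mode = "wb" keeps the binary file intact on Windows.
local_path <- file.path(tempdir(), "toy.sav")
download.file(remote_url, destfile = local_path, mode = "wb", quiet = TRUE)

# load() restores objects under their saved names and returns those names;
# a fresh environment keeps them from overwriting workspace objects.
env <- new.env()
loaded <- load(local_path, envir = env)
d <- get(loaded[1], envir = env)

# With Hmisc installed, contents() summarizes the variables.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::contents(d))
}
```

Loading into a separate environment is a design choice rather than a requirement: plain load(local_path) restores the objects directly into the workspace, which is convenient but silently replaces any existing object with the same name.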