Datasets
Most of the datasets on this page are in the S dumpdata and R compressed save() file formats. Some are available in Excel and ASCII (.csv) formats. Methods for retrieving and importing datasets may be found here. If you need one of the datasets we maintain converted to a non-S format, please e-mail charles.dupont@vanderbilt.edu to make a request.
R users of the prostate dataset should load the chron package with library(chron) so that date variables are handled correctly.
Note: To make csv files from R save files, do the following:
R
load(url('http://biostat.mc.vanderbilt.edu/twiki/pub/Main/DataSets/foo.sav'))
ls() # find name of data frame just loaded (here assumed 'foo')
write.table(foo, file='foo.csv', sep=',', col.names=NA)
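The recipe above can be tried without a network connection. The following is a minimal sketch under stated assumptions: the data frame `foo`, its columns, and the temporary file paths are hypothetical stand-ins for a downloaded save file. It relies on the fact that load() invisibly returns the names of the objects it restores, which avoids guessing the data frame's name from ls().

```r
# Build a small stand-in data frame and save it, mimicking a downloaded .sav file.
# 'foo', its columns, and the temporary paths are hypothetical placeholders.
foo <- data.frame(id = 1:3, age = c(54, 61, 47), sex = c("m", "f", "f"))
sav_path <- tempfile(fileext = ".sav")
save(foo, file = sav_path)
rm(foo)

# load() invisibly returns the names of the objects it restored.
# Loading into a separate environment keeps the workspace uncluttered.
env <- new.env()
loaded <- load(sav_path, envir = env)
d <- get(loaded[1], envir = env)

# row.names = FALSE drops the row-name column; the recipe above keeps it
# (with a blank header) by passing col.names = NA to write.table().
csv_path <- tempfile(fileext = ".csv")
write.csv(d, file = csv_path, row.names = FALSE)

# Read the CSV back to check the round trip.
back <- read.csv(csv_path)
```

If a save file holds more than one object, `loaded` has several entries, and each one can be exported in turn.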
Other Datasets Available from the Web
- Data and Story Library from Carnegie Mellon University: a treasure trove of datasets. The data are found inside HTML documents, so you may wish to use File, then Save as, in your browser to save the data into a plain text file. Once inside an editor, select the data documentation and copy it to another file. Edit the resulting .txt file to leave only the data. Many of the datasets delimit the columns of data using tabs. R and S-Plus will readily import such data.
- Australasian Data and Story Library, containing a large number of interesting datasets, many pertaining to Australia
- Other datasets from the StatLib Repository at Carnegie Mellon University. The Plasma_Retinol dataset is available as an annotated R save file or an S-Plus transport format dataset using the getHdata function in the Hmisc package.
- Datasets from the UCI Machine Learning Repository
- Datasets from the Dartmouth Chance data site
- Datasets from the University of Massachusetts Amherst
- Data from the Centers for Disease Control
- Data from the NIH. You have to request the data, but the site is immediately valuable as a source of data collection forms used in clinical (especially cardiovascular) studies.
- Data from the Geospatial and Statistical Data Center of the University of Virginia. See especially the City and County Data Books.
- Data from the Peace Science Society, Penn State University (war casualties, etc.)
- Data from the U.S. Joint Global Ocean Flux Study
- Mike Dowling's Interactive Table of World Nations, containing gross domestic product and several descriptor variables for all countries in the world
- Data from the Consortium for International Earth Science Information Network Dataset Guide
- Data from Arizona Elementary School Districts
- Statistical Society of Canada's archived case studies
- Datasets for research use from the National Heart, Lung, and Blood Institute of the U.S. National Institutes of Health
- A wonderful set of links to various dataset sources from Key Curriculum Press
- Links to other dataset repositories and tips on surfing the web for data, by Robin Lock, Mathematics Dept., St. Lawrence University
- Datasets from Exploring Data from Education Queensland
- Data from Statistical Science Web
- Datasets from Statistical Methods for the Analysis of Repeated Measurements by Charles S. Davis
- Datasets from the UCLA Department of Statistics
- Datasets from Early (and Late) Phases of Drug Research by Thomas E Bradstreet
- Datasets from Interactive and Dynamic Graphics for Data Analysis by Swayne, Cook, Buja, Hofmann, Lang
- Datasets from IBM's Many Eyes visualization project
- Swivel
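
The remark in the Data and Story Library entry, that R readily imports tab-delimited text, can be sketched as follows. This is only an illustration: the file contents and column names are made up, and the example assumes the saved .txt file has already been trimmed to a header row plus data.

```r
# Hypothetical tab-delimited data, as it might look after the documentation
# has been removed from the saved text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("name\theight\tweight",
             "A\t170\t65.5",
             "B\t182\t80.1"), txt_path)

# read.delim() defaults to header = TRUE and sep = "\t".
dat <- read.delim(txt_path, stringsAsFactors = FALSE)

# Inspect the column types that were inferred.
str(dat)
```

For a file without a header row, passing header = FALSE to read.delim() makes R assign default column names.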