Department of Biostatistics Seminar/Workshop Series

Selecting SNPs to Correctly Predict Ethnicity

Joshua Sampson, PhD

Post-Doctoral Fellow, Department of Biostatistics
Yale University

Wednesday, April 1, 1:30-2:30pm, MRBIII Conference Room 1220

Intended Audience: Persons interested in applied statistics, statistical theory, epidemiology, health services research, clinical trials methodology, statistical computing, statistical graphics, R users or potential users

Background: An individual's genotype at a set of Single Nucleotide Polymorphisms (SNPs) can be used to correctly predict that individual's ethnicity, or ancestry. In medical studies, knowledge of a subject's ethnicity can eliminate possible confounding, and in forensic applications, such knowledge can help direct investigations. In these cases, genotyping is often performed for the explicit purpose of identifying ancestry, and the prediction rule, which maps genotype to ancestry, needs to be based on previously collected information.

Results: There are two goals: 1) Given the Human Genome Diversity Project (HGDP), a database with genotypes for hundreds of individuals from 54 populations, and a specific set of SNPs, select a prediction rule that minimizes the expected error rate. 2) Design a method for selecting a set of N SNPs that minimizes the error rate for the chosen prediction rule. Both goals have been addressed previously. Here, we offer ways to improve the currently available methods and greatly increase the accuracy of ancestry prediction. Because both goals require good estimates of population-specific allele frequencies, we show how to use a known phylogenetic tree to improve these estimates. Furthermore, we introduce a new method for estimating the error rate. We demonstrate the performance of these methods on both simulated data and the HGDP data.
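
For readers unfamiliar with the setup, the following is a minimal sketch of one common kind of prediction rule: assign an individual to the population under which the observed genotype is most likely. It is not the speaker's method. It assumes Hardy-Weinberg equilibrium within each population, independence between SNPs, and equal prior weight on every population. The function names, population labels, and allele frequencies are hypothetical and chosen only for illustration.

```python
import math

def genotype_log_likelihood(genotype, freqs, eps=1e-6):
    """Log-likelihood of a genotype given one population's allele frequencies.

    genotype: list of reference-allele counts (0, 1, or 2), one per SNP.
    freqs: list of reference-allele frequencies, one per SNP.
    Assumes Hardy-Weinberg equilibrium and independent SNPs (illustrative only).
    """
    total = 0.0
    for g, p in zip(genotype, freqs):
        # Clip frequencies away from 0 and 1 so the logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        # Binomial(2, p) probability of carrying g copies of the allele.
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

def predict_population(genotype, pop_freqs):
    """Return the population label with the highest genotype log-likelihood."""
    return max(pop_freqs, key=lambda pop: genotype_log_likelihood(genotype, pop_freqs[pop]))

# Hypothetical allele frequencies for two populations at three SNPs.
pop_freqs = {
    "pop_A": [0.9, 0.8, 0.1],
    "pop_B": [0.2, 0.3, 0.7],
}

print(predict_population([2, 2, 0], pop_freqs))
```

In a rule of this kind, accuracy depends directly on how well the per-population allele frequencies are estimated, which is why improved frequency estimates matter for both goals described above.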

Authors: Joshua Sampson, Kenneth K. Kidd, Judith R. Kidd, Hongyu Zhao

Topic revision: r3 - 26 Apr 2013, JohnBock
 
