Department of Biostatistics Seminar/Workshop Series

Improving the Power to Detect Low-Frequency Genetic Variants in Genome-wide Association Studies

Clement Ma

Ph.D. Candidate, Department of Biostatistics, University of Michigan School of Public Health

Genome-wide association studies (GWAS) have identified over 80 common genetic variants associated with type 2 diabetes (T2D), but the etiology of this complex disorder is not fully understood. As part of the Genetics of Type 2 Diabetes (GoT2D) study, we performed whole-genome sequencing of 1,326 T2D cases and 1,331 controls from Central and Northern Europe to identify novel common and low-frequency T2D-associated variants. Low-pass (~4X) sequencing identified ~25M single nucleotide variants, achieving >99% resolution of all genetic variation with frequency >0.1% in these populations. To increase association power, we imputed variant genotypes into an additional 11,645 cases and 32,769 controls from 13 European studies with GWAS data. This comprehensive catalogue of common and low-frequency genetic variants allowed us to evaluate the impact of these variants on T2D risk.

However, new statistical challenges emerge in the analysis of low-frequency variants (minor allele frequency [MAF] < 5%). Logistic regression-based tests, and methods that combine information across multiple studies (e.g., joint analysis and meta-analysis), may be poorly calibrated, of low power, or both. In a simulation study, we evaluated the calibration and power of four logistic regression tests in joint and meta-analysis for low-frequency variants (Ma et al., 2013). We demonstrated that: (a) for joint analysis, the Firth bias-corrected test has the best combination of type I error and power; (b) for meta-analysis of balanced studies (equal numbers of cases and controls), the score test is best, but is less powerful than Firth test-based joint analysis; and (c) for meta-analysis of sufficiently unbalanced studies, all four tests can be anti-conservative, particularly the score test.
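
To make the score-test meta-analysis in (b) concrete, here is a minimal Python sketch of one common way to compute it for a single variant. It is an illustration under stated assumptions, not the implementation used in Ma et al. (2013): the null model has an intercept only (no covariates), genotypes are coded additively as 0/1/2, the data are toy values, and all function names are hypothetical. Each study contributes a score U and its variance V; the meta-analysis sums them across studies.

```python
import math


def score_components(genotypes, phenotypes):
    """Score U and variance V for one variant under an intercept-only null.

    genotypes: minor-allele counts (0, 1, or 2) per individual (assumed coding).
    phenotypes: 1 for case, 0 for control.
    """
    n = len(genotypes)
    y_bar = sum(phenotypes) / n  # case fraction = fitted null probability
    g_bar = sum(genotypes) / n
    u = sum(g * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    v = y_bar * (1.0 - y_bar) * sum((g - g_bar) ** 2 for g in genotypes)
    return u, v


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def score_test(genotypes, phenotypes):
    """Single-study score test: U^2 / V, referred to chi-square(1)."""
    u, v = score_components(genotypes, phenotypes)
    if v == 0.0:
        raise ValueError("variant is monomorphic or all phenotypes are equal")
    stat = u * u / v
    return stat, chi2_1df_pvalue(stat)


def meta_score_test(studies):
    """Score-based meta-analysis: sum U and V across studies, then test."""
    components = [score_components(g, y) for g, y in studies]
    u_total = sum(u for u, _ in components)
    v_total = sum(v for _, v in components)
    if v_total == 0.0:
        raise ValueError("no study carries information about this variant")
    stat = u_total * u_total / v_total
    return stat, chi2_1df_pvalue(stat)


# Toy data (hypothetical): 8 individuals, balanced cases and controls.
toy_genotypes = [0, 0, 0, 1, 0, 1, 0, 2]
toy_phenotypes = [0, 0, 0, 0, 1, 1, 1, 1]

single_stat, single_p = score_test(toy_genotypes, toy_phenotypes)
meta_stat, meta_p = meta_score_test(
    [(toy_genotypes, toy_phenotypes), (toy_genotypes, toy_phenotypes)]
)
```

The chi-square reference distribution is an asymptotic approximation, so a sketch like this is expected to inherit the calibration problems described above when minor allele counts are low or studies are unbalanced.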

By applying our recommended methods to the meta-analysis of the GoT2D sequenced study and the 13 imputed studies, we improved power for detecting low-frequency variants. We identified a novel, genome-wide significant, common variant near the CENPW locus (p = 3×10⁻⁸). By conditioning on known T2D variants, we identified two novel variants near the IRS1 and PPARG loci (p < 1.8×10⁻⁶) that are independent of previously reported common signals; both are low-frequency and non-coding.
Topic revision: r1 - 18 Apr 2014, NanaKwarteng
