Biostatistics Weekly Seminar

A statistical framework for predictive modeling of microbial community data

Li Chen, PhD
Harrison School of Pharmacy

Fueled by the development of next-generation sequencing technology, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could potentially be used in various clinical applications such as disease diagnosis, patient stratification and drug response prediction. One important characteristic of microbial community data is the phylogenetic tree that relates all the microbial taxa based on their evolutionary history. It has frequently been observed that a cluster or clusters of bacteria at varying phylogenetic depths are associated with some clinical or biological outcome due to shared biological function (clustered signal). Furthermore, we observe that community changes can be associated with a small subset of “marker” taxa (sparse signal), and it is also likely that community-level changes exist, in which a large number of functionally interdependent species are associated with the outcome (dense signal). Unfortunately, prediction models for microbial community data that consider both the signal density and the phylogenetic tree remain underdeveloped. To fill this gap, I will first introduce “SICS”, a prediction method based on a phylogeny-regularized sparse regression model, which exploits sparse and clustered microbiome signals by using a novel phylogeny-based smoothness penalty to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree. Second, I will introduce “glmmTree”, a prediction method based on a phylogeny-regularized generalized linear mixed model, for capturing clustered and dense microbiome signals. Additional tuning parameters enable a data-adaptive approach to capturing signals at different phylogenetic depths and abundance levels.
Using simulated and real datasets, I will show that the proposed approaches achieve better prediction performance than competing methods. At the end of this talk, I will briefly introduce my other omics work, including deep learning for predictive modeling of microbiome data, ensemble learning for regulatory variant annotation and prediction, data modeling for ChIP-seq comparison, and single-cell RNA-seq clustering.

2525WE, 10th floor VICTR Conference Room
1 February 2019


Topic revision: r1 - 09 Jan 2019, TawannaPeters
