Developing a statistical toolbox to describe and compare communities using metagenomic sequences
Speaker: Patrick Schloss
University of Massachusetts
Amherst, MA 01003, USA
Metagenomic shotgun sequencing has made it possible to interrogate the genetic diversity of as-yet-uncultured microbes, allowing microbial ecology to be studied at the genomic level. Two common refrains from these studies are the general inability to assemble large genomic fragments and the inability to assign a function to most of the predicted open reading frames (ORFs). Both limit the ability to make important ecological inferences from metagenomic shotgun sequencing. Therefore, we have begun to develop a statistical toolbox that uses individual sequence reads and does not require annotation to describe and compare microbial communities. We have successfully adapted tools developed for the analysis of sequence collections generated from PCR-derived clone libraries to the collection of ORFs extracted from individual sequence reads. These tools enable us to estimate the richness of operationally defined protein families (OPFs) and the richness of the families that are shared between communities. They also allow us to compare the relative abundance of these families. Using these approaches, we have been able to identify OPFs that dominate their community, suggesting their ecological importance; however, these OPFs lack homology to proteins with a known function. We have also been able to predict the number of OPFs that are shared between disparate communities. Advancing statistical tools for describing and comparing microbial communities is essential to the design of future studies and the analysis of the data that are generated.
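
As an illustration of the kind of calculation involved, the sketch below estimates OPF richness from per-family ORF counts and compares two communities. The abstract does not name the estimators used, so the bias-corrected Chao1 estimator here is an assumption chosen because it is a widely used nonparametric richness estimator for clone-library data. The OPF labels, the counts, and the function names `chao1` and `relative_abundance` are hypothetical.

```python
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-family counts.

    S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of families seen exactly once and twice.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def relative_abundance(counts_by_family):
    """Fraction of all ORFs that fall in each family."""
    total = sum(counts_by_family.values())
    return {family: n / total for family, n in counts_by_family.items()}


# Hypothetical data: one OPF label per ORF extracted from a sequence read.
community_a = Counter(["opf1", "opf1", "opf1", "opf2", "opf2", "opf3", "opf4"])
community_b = Counter(["opf1", "opf3", "opf3", "opf5"])

richness_a = chao1(community_a.values())
richness_b = chao1(community_b.values())

# Families observed in both samples. This is only the observed overlap,
# not a statistical estimate of the true shared richness.
shared_observed = set(community_a) & set(community_b)

abundance_a = relative_abundance(community_a)
dominant_a = max(abundance_a, key=abundance_a.get)

print(richness_a, richness_b, sorted(shared_observed), dominant_a)
```

The observed overlap gives only a lower bound on the number of shared families; predicting shared richness, as described above, requires an estimator that also accounts for families missed in one or both samples.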