Predicting gene function from the evolutionary signal in metagenomes
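  • Metagenome phyletic profiles (MPPs) are an encoding of environmental DNA sequencing data that consists of the relative abundances of gene families across metagenomes;
  • Drawing on DNA sequences from human gut microbiomes, ocean metagenomes, and other environments, MPPs can accurately predict 826 Gene Ontology (GO) functional categories;
  • MPPs from different environments infer distinct sets of gene functions, and simulations on >5000 metagenomes indicate that the diversity of sampled environments, rather than the amount of data itself, is the key to improving the predictive accuracy of MPPs;
  • The authors propose that environmental DNA sequencing is a useful additional tool for systematically predicting the biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.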
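To make the first highlight concrete, here is a minimal sketch of what a phyletic profile built from relative abundances might look like, and how similar profiles could be used to transfer an annotation. It is an illustration only, not the authors' pipeline: the counts, the family names (`famA` to `famD`), the labels (`function_X`, `function_Y`), and the nearest-profile rule based on Pearson correlation are all assumptions chosen for the example, since the classifier used in the study is not described in this summary.

```python
from math import sqrt

# Hypothetical toy counts: reads assigned to each gene family (rows)
# in each of four metagenomes (columns). Not data from the study.
COUNTS = {
    "famA": [30, 5, 0, 12],
    "famB": [28, 6, 1, 10],
    "famC": [2, 40, 35, 3],
    "famD": [1, 38, 30, 4],
}

# Hypothetical known annotations; famB and famD are treated as unannotated.
KNOWN = {"famA": "function_X", "famC": "function_Y"}


def relative_abundance_profiles(counts):
    """Scale each metagenome (column) to sum to 1; each row is then one profile."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        family: [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for family, row in counts.items()
    }


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def predict_function(profiles, known, query):
    """Return the label of the annotated family whose profile correlates best with the query."""
    best = max(known, key=lambda family: pearson(profiles[query], profiles[family]))
    return known[best]


profiles = relative_abundance_profiles(COUNTS)
for family in ("famB", "famD"):
    print(family, "->", predict_function(profiles, KNOWN, family))
```

In this toy setting, families that rise and fall together across environments end up with the same label. Adding metagenomes from new kinds of environments adds columns in which otherwise similar profiles can diverge, which is one intuitive reading of why environmental diversity matters more than the sheer number of samples.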
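Editor's Recommendation
高春辉
Even in model microorganisms, the function of many genes remains unknown. A study in Microbiome predicts gene function from the evolutionary signal in metagenomic data and does so with high accuracy. The authors argue that accuracy can improve further as growing metagenomic datasets bring greater environmental diversity. Could this be one of the small, certain joys that biological big data brings us?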
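Further Reading
Original publication details and source links for this study, along with related interpretation and commentary. Reader recommendations are welcome!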
Microbiome [IF:11.607]

The evolutionary signal in metagenome phyletic profiles predicts many gene functions

DOI: 10.1186/s40168-018-0506-4

2018-07-10, Article

Abstract:
Background: The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner.
Results: We evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data that consists of relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, while drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles, derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundance obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, the MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, while the diversity of sampled environments appears to be the critical factor for obtaining robust models.
Conclusions: In past work, metagenomics has provided invaluable insight into ecology of various habitats, into diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.

First Authors:
Vedrana Vidulin

Corresponding Authors:
Fran Supek

All Authors:
Vedrana Vidulin, Tomislav Šmuc, Sašo Džeroski, Fran Supek
