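• MetaLen is a Python-based tool for estimating the biodiversity of metagenomes;
  • It uses de Bruijn graph theory and a rectangle total-edge-length algorithm, so estimating species diversity requires neither preset genome-abundance parameters nor a large reference database;
  • MetaLen uses short reads together with long reads (SLR, Pacific Biosciences, Nanopore), treating each long read as a sub-genome;
  • Short reads are first aligned to the long reads; the resulting coverage is then used to estimate the genome length and frequency of each species;
  • It can detect species whose abundance in the metagenomic sequencing data is below 0.01.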
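The coverage-based idea in the points above can be illustrated with a minimal sketch. This is not MetaLen's actual implementation: the function names, the `min_coverage` parameter, and the toy numbers are all hypothetical, and the estimator assumes that both short and long reads are sampled uniformly from every genome in proportion to abundance times genome length. Under that assumption, a genome's short-read coverage is proportional to its abundance, so averaging the inverse coverage over the long reads and multiplying by the total number of short-read bases gives an estimate of the summed genome length.

```python
from typing import Sequence


def coverage_depth(aligned_short_read_bases: int, long_read_length: int) -> float:
    """Average number of short-read bases covering each position of one long read."""
    if long_read_length <= 0:
        raise ValueError("long_read_length must be positive")
    return aligned_short_read_bases / long_read_length


def estimate_total_genome_length(
    long_read_coverages: Sequence[float],
    total_short_read_bases: int,
    min_coverage: float = 1.0,
) -> float:
    """Estimate the summed length of all genomes with coverage >= min_coverage.

    Each long read is treated as a random sample of its source genome
    (hypothetical simplification). A long read lands in genome i with
    probability proportional to coverage_i * length_i, so weighting every
    sample by 1 / coverage_i cancels the abundance bias.
    """
    if not long_read_coverages:
        raise ValueError("at least one long read is required")
    if min_coverage <= 0:
        raise ValueError("min_coverage must be positive")
    # Long reads below the threshold contribute nothing to the sum, but they
    # still count in the denominator, because they were sampled too.
    inverse_sum = sum(1.0 / c for c in long_read_coverages if c >= min_coverage)
    return total_short_read_bases * inverse_sum / len(long_read_coverages)


# Toy community (made-up numbers): genome A is 1,000 bp at 10x coverage,
# genome B is 4,000 bp at 2x coverage. Long reads fall on A and B in the
# ratio 10,000 : 8,000 = 5 : 4.
coverages = [10.0] * 5 + [2.0] * 4
total_bases = 1000 * 10 + 4000 * 2

all_genomes = estimate_total_genome_length(coverages, total_bases)
abundant_only = estimate_total_genome_length(coverages, total_bases, min_coverage=5.0)
print(round(all_genomes), round(abundant_only))
```

In this toy case the sketch recovers the combined length of both genomes (5,000 bp) and, with the higher threshold, the length of the abundant genome alone (1,000 bp). Real data would also need a policy for long reads with no short-read coverage, which is why the threshold is required to be positive here.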
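Cell Systems recently published a study reporting a new method for accurately estimating metagenomic biodiversity. The method performs well at detecting low-abundance, rare species and can be applied to many kinds of metagenomic samples, including gut and environmental microbiomes.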
Cell Systems [IF:8.673]

Joint Analysis of Long and Short Reads Enables Accurate Estimates of Microbiome Complexity



2018-07-25, Article

Abstract & Authors:

Reduced microbiome diversity has been linked to several diseases. However, estimating the diversity of bacterial communities—the number and the total length of distinct genomes within a metagenome—remains an open problem in microbial ecology. Here, we describe an algorithm for estimating the microbial diversity in a metagenomic sample based on a joint analysis of short and long reads. Unlike previous approaches, the algorithm does not make any assumptions on the distribution of the frequencies of genomes within a metagenome (as in parametric methods) and does not require a large database that covers the total diversity (as in non-parametric methods). We estimate that genomes comprising a human gut metagenome have total length varying from 1.3 to 3.5 billion nucleotides, with genomes responsible for 50% of total abundance having total length varying from only 25 to 61 million nucleotides. In contrast, genomes comprising an aquifer sediment metagenome have more than two orders of magnitude larger total length (≈840 billion nucleotides).

First Authors:
Anton Bankevich

Corresponding Authors:
Anton Bankevich

All Authors:
Anton Bankevich, Pavel A Pevzner