http://www.drive5.com/usearch/manual/cmd_otutab_rare.html
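otutab_rare subsamples (rarefies) an OTU table to a specified number of reads per sample, which makes alpha diversity comparable across samples. In the rarefied table, samples with fewer reads than the target are deleted automatically, and OTUs whose counts are all zero are removed as well.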
usearch11 -otutab_rare otutab.txt -sample_size 10000 -output otutab10k.txt
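There are three main parameters: the input file, the number of reads each sample is rarefied to (-sample_size), and the output file.
The run prints the following: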
00:01 43Mb 100.0% Reading otutab.txt
00:02 43Mb 100.0% Rarefying
Deleted 90 samples size < 10000
Deleted 814 OTUs with size=0 after rarefaction
Deleted 90 samples with size=0 after rarefaction
00:04 48Mb Writing otutab10k.txt ...done.
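The run took about 4 seconds and 43 to 48 Mb of memory. 90 samples had fewer than 10000 reads and were deleted, and 814 OTUs whose counts were all zero after rarefaction were removed as well.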
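To see in advance which samples would be dropped at a given depth, the per-sample read totals can be computed directly from the OTU table. The following is only a minimal sketch: it assumes a tab-separated table with OTU IDs in the first column and one column per sample, and it uses a tiny made-up table (samples S1 to S3) rather than the real otutab.txt.

```shell
# Build a tiny, made-up OTU table (tab-separated; first column is the OTU ID).
tmp=$(mktemp)
printf '#OTU ID\tS1\tS2\tS3\n'     >  "$tmp"
printf 'OTU_1\t6000\t300\t9000\n'  >> "$tmp"
printf 'OTU_2\t5000\t200\t0\n'     >> "$tmp"
printf 'OTU_3\t0\t100\t2000\n'     >> "$tmp"

threshold=10000

# Sum each sample column, then list the samples whose total is below the threshold.
low_samples=$(awk -F'\t' -v min="$threshold" '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    { for (i = 2; i <= NF; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) if (total[i] < min) print name[i] "\t" total[i] }
' "$tmp")

echo "$low_samples"   # S2 with 600 reads is the only sample below 10000
rm -f "$tmp"
```

Running the same awk program on the real table should list the samples that otutab_rare reports as deleted for being smaller than the chosen -sample_size.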
Let's compare the table before and after rarefaction, using otutab_stats to summarize each OTU table.
usearch10 -otutab_stats otutab.txt -output otutab.stat
cat otutab.stat
The result is below: the smallest sample has 5369 reads and the largest has 124817.
38748638 Reads (38.7M)
1182 Samples
4996 OTUs
5905272 Counts
4339163 Count =0 (73.5%)
552758 Count =1 (9.4%)
316760 Count >=10 (5.4%)
54 OTUs found in all samples (1.1%)
370 OTUs found in 90% of samples (7.4%)
1145 OTUs found in 50% of samples (22.9%)
Sample sizes: min 5369, lo 16013, med 25362, mean 32782.3, hi 45569, max 124817
usearch10 -otutab_stats otutab10k.txt -output otutab10k.stat
cat otutab10k.stat
Every sample is now rarefied to 10000 reads, but the number of samples and the number of OTUs have both dropped (through deletion).
10920000 Reads (10.9M)
1092 Samples
4182 OTUs
4566744 Counts
3607823 Count =0 (79.0%)
433338 Count =1 (9.5%)
124757 Count >=10 (2.7%)
28 OTUs found in all samples (0.7%)
210 OTUs found in 90% of samples (5.0%)
649 OTUs found in 50% of samples (15.5%)
Sample sizes: min 10000, lo 10000, med 10000, mean 10000.0, hi 10000, max 10000
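The two reports can be cross-checked against the deletion messages printed by otutab_rare. The short sketch below only redoes the arithmetic with the numbers shown above; the variable names are mine and purely illustrative.

```shell
# Numbers taken from the otutab_stats reports and the otutab_rare log above.
samples_before=1182
otus_before=4996
deleted_samples=90
deleted_otus=814
depth=10000

samples_after=$((samples_before - deleted_samples))   # 1092 samples
otus_after=$((otus_before - deleted_otus))            # 4182 OTUs
reads_after=$((samples_after * depth))                # 10920000 reads
counts_after=$((samples_after * otus_after))          # 4566744 table cells

echo "samples=$samples_after otus=$otus_after reads=$reads_after counts=$counts_after"
```

All four values match the second report, and the "Counts" line is the number of table cells (samples times OTUs), not the number of reads.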
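In summary, this method behaves the same way as QIIME's single_rarefaction.py command. The otutab_norm approach used previously in usearch10 does not delete low-depth samples. As a result, samples with few reads are scaled up, their alpha diversity comes out too low, and the rarefaction curves of samples below the target depth flatten into a straight line in their later part. The otutab_rare command added in usearch11 fills this gap left by the earlier approach.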
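Why scaling up a shallow sample leaves its alpha diversity too low can be illustrated with a toy calculation. The sketch below assumes normalization is simple proportional scaling with rounding. That is my simplification for illustration and is not taken from the usearch source, and the counts are made up.

```shell
# Hypothetical sample with only 600 reads spread over 5 OTUs (made-up counts).
counts="300 200 100 0 0"
target=10000

# Scale every count by target/total and round to the nearest integer.
scaled=$(echo "$counts" | awk -v target="$target" '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    out = ""
    for (i = 1; i <= NF; i++) {
        v = int($i * target / total + 0.5)
        out = out (i > 1 ? " " : "") v
    }
    print out
}')

# Observed richness = number of OTUs with a non-zero count.
richness_before=$(echo "$counts" | awk '{ n = 0; for (i = 1; i <= NF; i++) if ($i > 0) n++; print n }')
richness_after=$(echo "$scaled"  | awk '{ n = 0; for (i = 1; i <= NF; i++) if ($i > 0) n++; print n }')

echo "scaled: $scaled"
echo "richness before=$richness_before after=$richness_after"
```

The scaled sample now appears to have 10000 reads, but it still contains only the 3 OTUs that the original 600 reads detected. Scaling cannot recover OTUs that were never sampled, which is why deleting such samples, as otutab_rare does, is the safer choice for alpha diversity comparisons.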