想试试flexbar切接头和barcode的效果,于是根据文章内容模拟了一个数据集,另外,使用16S测序的示例数据看看分barcode的情况。
1.16S数据的测试
由于我没有什么经验,于是就先按照qiime的流程做,拼接,提取barcode,然后后边切分数据试试。
flexbar -r fastq-join_joined/fastqjoin.join.fastq -f -t 16s -br temp/PE250_barcode/barcodes.fastq
有点不明白为什么只得到了一个文件,或许是在命名上标注了的。
2.模拟数据测试
2.1 使用randomreads.sh模拟
命令是
randomreads.sh ref=../reference_hg38_no_alt/chr6.fa \
out=chr6_simulating.fq length=100 reads=1000000
比较简单的命令,只需要 两三个参数即可解决问题,参考基因组,输出文件名,以及reads长度和数目。ps.本来知道jimmy流程中的华大pirs,可以完成这个任务,那个命令有点复杂,好像是更专业 。
我用6号染色体作参考基因组,运行verbose如下 :
java -ea -Xmx1458m -cp /home/biolinux/miniconda3/opt/bbmap-37.75/current/ align2.RandomReads3 build=1 ref=../reference_hg38_no_alt/chr6.fa out=chr6_simulating.fq length=100 reads=1000000
Executing align2.RandomReads3 [build=1, ref=../reference_hg38_no_alt/chr6.fa, out=chr6_simulating.fq, length=100, reads=1000000]
Writing reference.
Executing dna.FastaToChromArrays2 [../reference_hg38_no_alt/chr6.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=500, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Waiting for writing to finish.
Finished.
snpRate=0.0, max=0, unique=true
insRate=0.0, max=0, len=(0-0)
delRate=0.0, max=0, len=(0-0)
subRate=0.0, max=0, len=(0-0)
nRate =0.0, max=0, len=(0-0)
genome=1
PERFECT_READ_RATIO=0.0
ADD_ERRORS_FROM_QUALITY=true
REPLACE_NOREF=false
paired=false
read length=100
reads=1000000
Wrote chr6_simulating.fq
Time: 34.059 seconds.
2.2 addadapters.sh加接头
也是比较简单的命令:
#illumina接头文件下载:
wget https://raw.githubusercontent.com/vsbuffalo/scythe/master/illumina_adapters.fa
addadapters.sh in=chr6_simulating.fq out=chr6_add_adpater.fq adapters=illumina_adapters.fa
运行过程如下:
java -ea -Xmx200m -cp /home/biolinux/miniconda3/opt/bbmap-37.75/current/ jgi.AddAdapters in=chr6_simulating.fq out=chr6_add_adpater.fq adapters=illumina_adapters.fa
Executing jgi.AddAdapters [in=chr6_simulating.fq, out=chr6_add_adpater.fq, adapters=illumina_adapters.fa]
Adapters Added: 500438 reads (50.04%) 13323422 bases (13.32%)
Valid Output: 994914 reads (99.49%) 74710060 bases (74.71%)
Time: 3.791 seconds.
Reads Processed: 1000k 263.81k reads/sec
Bases Processed: 100m 26.38m bases/sec
2.3 去接头
命令是:
flexbar -r chr6_add_adpater.fq -a illumina_adapters.fa
运行前后文件大小对比:
-rw——- 1 biolinux biolinux 210M 12月 22 14:20 chr6_add_adpater.fq
-rw——- 1 biolinux biolinux 241M 12月 22 14:02 chr6_simulating.fq
-rw——- 1 biolinux biolinux 158M 12月 22 15:13 flexbarOut.fastq
-rw——- 1 biolinux biolinux 2.5K 12月 22 15:13 flexbarOut.log
-rw——- 1 biolinux biolinux 386 12月 22 14:20 illumina_adapters.fa
drwx—— 1 biolinux biolinux 0 12月 22 14:02 ref
运行统计:
Output file statistics
======================
Read file: flexbarOut.fastq
written reads 909535
short reads 90465
Filtering statistics
====================
Processed reads 1000000
skipped due to uncalled bases 0
short prior to adapter removal 0
finally skipped short reads 90465
Discarded reads overall 90465
Remaining reads 909535 (90%)
Processed bases 100000000
Remaining bases 73856012 (73% of input)