flexbar3.0试用笔记

想试试flexbar切接头和barcode的效果,于是根据文章内容模拟了一个数据集,另外,使用16S测序的示例数据看看分barcode的情况。

1.16S数据的测试

由于我没有什么经验,于是就先按照qiime的流程做,拼接,提取barcode,然后后边切分数据试试。

flexbar -r fastq-join_joined/fastqjoin.join.fastq -f -t 16s -br temp/PE250_barcode/barcodes.fastq

有点不明白为什么只得到了一个文件,或许是在命名上标注了的。

2.模拟数据测试

2.1 使用randomreads.sh模拟

命令是

randomreads.sh ref=../reference_hg38_no_alt/chr6.fa \

out=chr6_simulating.fq length=100 reads=1000000

比较简单的命令,只需要 两三个参数即可解决问题,参考基因组,输出文件名,以及reads长度和数目。ps.本来知道jimmy流程中的华大pirs,可以完成这个任务,那个命令有点复杂,好像是更专业 。

我用6号染色体作参考基因组,运行verbose如下 :

java -ea -Xmx1458m -cp /home/biolinux/miniconda3/opt/bbmap-37.75/current/ align2.RandomReads3 build=1 ref=../reference_hg38_no_alt/chr6.fa out=chr6_simulating.fq length=100 reads=1000000
Executing align2.RandomReads3 [build=1, ref=../reference_hg38_no_alt/chr6.fa, out=chr6_simulating.fq, length=100, reads=1000000]
Writing reference.
Executing dna.FastaToChromArrays2 [../reference_hg38_no_alt/chr6.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=500, nodisk=false]
Set genScaffoldInfo=true
Writing chunk 1
Waiting for writing to finish.
Finished.
snpRate=0.0, max=0, unique=true
insRate=0.0, max=0, len=(0-0)
delRate=0.0, max=0, len=(0-0)
subRate=0.0, max=0, len=(0-0)
nRate  =0.0, max=0, len=(0-0)
genome=1
PERFECT_READ_RATIO=0.0
ADD_ERRORS_FROM_QUALITY=true
REPLACE_NOREF=false
paired=false
read length=100
reads=1000000
Wrote chr6_simulating.fq
Time: 34.059 seconds.

2.2 addadapters.sh加接头

也是比较简单的命令:

illumina接头文件下载:

wget https://raw.githubusercontent.com/vsbuffalo/scythe/master/illumina_adapters.fa

addadapters.sh in=chr6_simulating.fq out=chr6_add_adpater.fq adapters=illumina_adapters.fa

运行过程如下:

java -ea -Xmx200m -cp /home/biolinux/miniconda3/opt/bbmap-37.75/current/ jgi.AddAdapters in=chr6_simulating.fq out=chr6_add_adpater.fq adapters=illumina_adapters.fa
Executing jgi.AddAdapters [in=chr6_simulating.fq, out=chr6_add_adpater.fq, adapters=illumina_adapters.fa]
Adapters Added:         500438 reads (50.04%) 13323422 bases (13.32%)
Valid Output:           994914 reads (99.49%) 74710060 bases (74.71%)
Time:                         3.791 seconds.
Reads Processed:       1000k 263.81k reads/sec
Bases Processed:        100m 26.38m bases/sec
2.3 去接头
命令是:
flexbar -r chr6_add_adpater.fq -a illumina_adapters.fa
运行前后文件大小对比:
-rw——- 1 biolinux biolinux 210M 12月 22 14:20 chr6_add_adpater.fq -rw——- 1 biolinux biolinux 241M 12月 22 14:02 chr6_simulating.fq -rw——- 1 biolinux biolinux 158M 12月 22 15:13 flexbarOut.fastq -rw——- 1 biolinux biolinux 2.5K 12月 22 15:13 flexbarOut.log -rw——- 1 biolinux biolinux 386 12月 22 14:20 illumina_adapters.fa drwx—— 1 biolinux biolinux 0 12月 22 14:02 ref
运行统计:
Output file statistics ====================== Read file: flexbarOut.fastq written reads 909535 short reads 90465 Filtering statistics ==================== Processed reads 1000000 skipped due to uncalled bases 0 short prior to adapter removal 0 finally skipped short reads 90465 Discarded reads overall 90465 Remaining reads 909535 (90%) Processed bases 100000000 Remaining bases 73856012 (73% of input)

 

发表评论

电子邮件地址不会被公开。 必填项已用*标注