Exploring HLA typing with HLAscan, HLA-LA, and Optitype

HLAscan

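The software's old official website no longer opens, but it is still being updated on GitHub: https://github.com/SyntekabioTools/HLAscan. Perhaps the author changed jobs? The most recent update was on 2019.12.4, which is fairly recent. I also noticed that WeGene's NGS HLA typing report cites this software's paper as a reference, so it is presumably reasonably authoritative.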

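The way the software is used has also changed a bit. It used to be just a script; now it is compiled into a standalone executable, so it should run much more efficiently, and it also saves the hassle of installation. Even an AMD 4700U (AMD YES!) can run it. Nice!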

Installation and running

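# Download the software and the dataset
wget https://github.com/SyntekabioTools/HLAscan/releases/download/v2.1.4/hla_scan_r_v2.1.4
wget https://github.com/SyntekabioTools/HLAscan/releases/download/v2.0.0/dataset.zip
# Make the downloaded binary executable
chmod +x hla_scan_r_v2.1.4
# Unpack the dataset
unzip dataset.zip
# Run the typing once per gene
# (the gene list is left unquoted so the loop iterates over each name,
#  and the long lines are continued with a trailing backslash)
for gene in HLA-A HLA-B HLA-C HLA-E HLA-F HLA-G \
    MICA MICB HLA-DMA HLA-DMB HLA-DOA HLA-DOB HLA-DPA1 HLA-DPB1 HLA-DQA1 HLA-DQB1 HLA-DRA HLA-DRB1 HLA-DRB5 TAP1 TAP2
do
    ./hla_scan_r_v2.1.4 -l ../read_1.fq -r ../read_2.fq -d db/HLA-ALL.IMGT \
        -t 8 -g "$gene"
done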

Results

And then the results come out.

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:4:15
========================================================
HLA gene : HLA-A
# of considered types : 3182

----------- HLA-Types -----------
[Type 1]    31:01:02:01 EX3_209.094_100 EX2_244.789_100 EX4_291.888_100 EX5_190.632_100 
[Type 2]    03:01:01:03 EX3_166.42_100  EX2_197.259_100 EX4_250.399_100 EX5_169.726_100 

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:13:32
========================================================
HLA gene : HLA-B
# of considered types : 3958

----------- HLA-Types -----------
[Type 1]    48:01:01    EX3_528.214_100 EX2_654.385_100 EX4_984.435_100 EX5_607.077_100 
[Type 2]    15:11:01    EX3_368.938_100 EX2_464.43_100  EX4_692.304_100 EX5_423.094_100 

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:20:25
========================================================
HLA gene : HLA-C
# of considered types : 2735

----------- HLA-Types -----------
[Type 1]    08:01:01    EX3_169.279_100 EX2_194.726_100 EX4_296.558_100 EX5_194.783_100 
[Type 2]    03:03:01    EX3_167.344_100 EX2_171.144_100 EX4_266.931_100 EX5_155.4_100   

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:20:28
========================================================
HLA gene : HLA-E
# of considered types : 17

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:20:32
========================================================
HLA gene : HLA-F
# of considered types : 22

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:20:40
========================================================
HLA gene : HLA-G
# of considered types : 50

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:20:57
========================================================
HLA gene : MICA
# of considered types : 102

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:21:3
========================================================
HLA gene : MICB
# of considered types : 41

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:21:4
========================================================
HLA gene : HLA-DMA
# of considered types : 7

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:21:6
========================================================
HLA gene : HLA-DMB
# of considered types : 13

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:21:8
========================================================
HLA gene : HLA-DOA
# of considered types : 12

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:21:10
========================================================
HLA gene : HLA-DOB
# of considered types : 13

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:21:16
========================================================
HLA gene : HLA-DPA1
# of considered types : 40

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:22:30
========================================================
HLA gene : HLA-DPB1
# of considered types : 550

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:22:37
========================================================
HLA gene : HLA-DQA1
# of considered types : 54

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:24:20
========================================================
HLA gene : HLA-DQB1
# of considered types : 806

----------- HLA-Types -----------
[Type 1]    03:03:02:01 EX2_380.615_100 EX3_638.819_100 EX4_0_0 
[Type 2]    06:02:01    EX2_285.522_100 EX3_589.078_100 EX4_0_0 

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:24:21
========================================================
HLA gene : HLA-DRA
# of considered types : 7

----------- HLA-Types -----------
HLAscan cannot determine proper types

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:28:15
========================================================
HLA gene : HLA-DRB1
# of considered types : 1756

----------- HLA-Types -----------
[Type 1]    09:01:02    EX2_791.144_100 EX3_672.496_100 EX4_0_0 
[Type 2]    15:01:01:04 EX2_707.333_100 EX3_651.83_100  EX4_0_0 

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:28:18
========================================================
HLA gene : HLA-DRB5
# of considered types : 21

----------- HLA-Types -----------
[Type 1]    02:06   EX2_14.9259_50  EX3_0.58156_0   EX4_0_0 
[Type 2]    02:06   EX2_14.9259_50  EX3_0.58156_0   EX4_0_0 

=====================================================
HLAscan v2.1
Report created
2020. 10. 21.   12:28:20
========================================================
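
To pull just the calls out of a long report like the one above, a small awk pass is enough. This is a minimal sketch, not part of HLAscan itself: the function name `summarize_hlascan` is hypothetical, and it assumes the report text has been saved to a file and that every report follows the layout shown above (an `HLA gene :` line, followed by `[Type 1]` and `[Type 2]` lines when a call was made). Genes without a call are printed as NA.

```shell
#!/usr/bin/env bash
# summarize_hlascan: hypothetical helper. Prints "gene<TAB>type1<TAB>type2"
# for every report found in the file given as $1.
summarize_hlascan() {
    awk '
        # Print the gene collected so far (if any).
        function emit() {
            if (gene != "") printf "%s\t%s\t%s\n", gene, t1, t2
        }
        /^HLA gene :/ { emit(); gene = $4; t1 = "NA"; t2 = "NA" }
        /^\[Type 1\]/ { t1 = $3 }
        /^\[Type 2\]/ { t2 = $3 }
        END           { emit() }
    ' "$1"
}

# Demo on a shortened excerpt of the report above, written to a temp file.
report="$(mktemp)"
cat > "$report" <<'EOF'
HLA gene : HLA-A
# of considered types : 3182

----------- HLA-Types -----------
[Type 1]    31:01:02:01 EX3_209.094_100 EX2_244.789_100
[Type 2]    03:01:01:03 EX3_166.42_100  EX2_197.259_100

HLA gene : HLA-E
# of considered types : 17

----------- HLA-Types -----------
HLAscan cannot determine proper types
EOF
summarize_hlascan "$report"
rm -f "$report"
```

The demo prints one tab-separated line per gene, which is easier to compare against other tools than the full report.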


HLA-LA

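1. Installing the software and preparing the database
Sticking with conda here: it takes care of the installation problem, and there is no need to tackle Docker, which has a bit of a learning curve.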

## Install
conda install -c bioconda hla-la
## Download the database into the graphs directory
cd ~/miniconda3/opt/hla-la/
mkdir graphs
cd graphs
wget http://www.well.ox.ac.uk/downloads/PRG_MHC_GRCh38_withIMGT.tar.gz
tar -xvzf PRG_MHC_GRCh38_withIMGT.tar.gz
## Index the graph. This step uses about 30 GB of memory; my laptop with 16 GB of RAM got through it on swap, which made it far slower.
cd ~/miniconda3/opt/hla-la/bin/
./HLA-LA --action prepareGraph --PRG_graph_dir ../graphs/PRG_MHC_GRCh38_withIMGT
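
Since graph preparation needs roughly 30 GB, it can be worth checking how much RAM plus swap the machine has before starting. A minimal sketch, assuming Linux with `/proc/meminfo`; the function name `total_mem_gb` is hypothetical, and the 30 GB threshold is only the rough figure from my own run, not an official requirement.

```shell
#!/usr/bin/env bash
# total_mem_gb: hypothetical helper. Sums MemTotal and SwapTotal (reported in kB)
# from a meminfo-style file and prints the total in whole GiB, rounded down.
# Reads /proc/meminfo unless another file is given as $1.
total_mem_gb() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 }
         END { print int(kb / 1024 / 1024) }' "${1:-/proc/meminfo}"
}

required_gb=30   # rough figure observed above, an assumption rather than a spec
if [ -r /proc/meminfo ]; then
    available_gb="$(total_mem_gb)"
    if [ "$available_gb" -lt "$required_gb" ]; then
        echo "Only ${available_gb} GB of RAM + swap; prepareGraph may fail or crawl." >&2
    fi
fi
```

If the warning appears, adding swap before running the indexing step is one option, at the cost of speed.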

2. Running the typing

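There are only a few simple parameters. With 8 cores it just grinds along slowly, and I was not sure whether it would end in an error.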

HLA-LA.pl --BAM ./2hla_sorted.bam   --graph PRG_MHC_GRCh38_withIMGT --sampleID 10 --maxThreads 8  --workingDir ./

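It then stopped running once swap plus RAM hit their combined limit of 70 GB.
After seeing the following issue on GitHub I felt a bit hopeless: my hardware is nowhere near this level!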

my paired-end fastq file:
R1.fastq (250 Million reads, 150bp, ~1.2 GB)
R2.fastq (250 Million reads, 150bp, ~1.2 GB)
run HLA-LA will used about 300~400 GB RAM and ~90GB swap

Optitype

Installation

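I first tried Docker, which sadly failed. Then I found that the software is available on bioconda, so I switched to conda, which feels more convenient than Docker anyway. Another advantage: Windows 10 Home does not support Docker, and enabling it means a round of registry edits, which is too much trouble.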

# Either one of the following two commands will do
conda install -c bioconda optitype 
conda install -c bioconda/label/cf201901 optitype 

Running and results

A single simple command is all it takes.

 OptiTypePipeline.py  -i read_1.fq read_2.fq --dna -v -o optutype

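With the AMD R7-4700U (AMD YES!) behind it, the typing finished while pushing the hardware close to its limits.
The first result is a PDF file showing the sequencing coverage for the typing result.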
Then there is a TSV file with the typing result. It covers only HLA-A, -B, and -C, at 4-digit resolution:

        A1      A2      B1      B2      C1      C2      Reads   Objective
0       A*03:01 A*31:01 B*15:11 B*48:01 C*03:03 C*08:01 15556.0 15135.987999999903
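
The OptiType calls can be checked against the HLAscan calls from earlier in this post by trimming the HLAscan alleles to their first two fields (4-digit resolution). A minimal sketch; the function name `to_two_field` is hypothetical, and the allele values are copied by hand from the two result listings rather than parsed from files.

```shell
#!/usr/bin/env bash
# to_two_field: hypothetical helper. Turns a gene name plus an HLAscan allele,
# e.g. "HLA-A" and "31:01:02:01", into OptiType-style notation "A*31:01".
to_two_field() {
    # ${1#HLA-} drops the "HLA-" prefix; cut keeps the first two ":" fields.
    printf '%s*%s\n' "${1#HLA-}" "$(printf '%s' "$2" | cut -d: -f1,2)"
}

# HLAscan class I calls, copied from the report earlier in this post.
hlascan_calls="$(
    {
        to_two_field HLA-A 31:01:02:01; to_two_field HLA-A 03:01:01:03
        to_two_field HLA-B 48:01:01;    to_two_field HLA-B 15:11:01
        to_two_field HLA-C 08:01:01;    to_two_field HLA-C 03:03:01
    } | sort
)"
# OptiType calls, copied from the TSV row above.
optitype_calls="$(printf '%s\n' 'A*03:01' 'A*31:01' 'B*15:11' 'B*48:01' 'C*03:03' 'C*08:01' | sort)"

if [ "$hlascan_calls" = "$optitype_calls" ]; then
    echo "HLAscan and OptiType agree at 4-digit resolution"
else
    echo "HLAscan and OptiType differ"
fi
```

For the sample in this post, the two tools give the same six class I alleles once HLAscan's calls are trimmed to 4 digits.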
