How to shorten run time #58

jingjing578 · 2023-10-13T00:48:43Z

When I used whole-transcriptome data for realignment, it took nearly 12 hours for one sample. Which parameters can I modify to shorten the running time? Here are the parameters I currently use:
java -Xmx16G -jar /home/software/abra2-2.24/target/abra2-2.24-jar-with-dependencies.jar --in ${outDir}/${sample}.bam --out ${outDir}/STAR/${sample}.Realign.bam --ws 300,150 --ref ${ref} --junctions bam --threads 10 --gtf ${gtf} --dist 500000 --sa --sua --tmpdir ${outDir}/STAR/tmp

kevinpryan · 2024-01-22T12:07:06Z

Hi, have you found a solution for this? I'm getting similar runtimes (12-13 hours) with the following command:
java -Xmx84g -jar /abra2.jar --in ${samplename}.Aligned.sortedByCoord.out.bam --junctions SJ.out.tab --in-vcf ${samplename}.concatd.vcf --out ${out_bam} --ref Homo_sapiens.assembly38.no_ebv.fa --targets Twist_Comprehensive_Exome_Covered_Targets_hg38.bed --gtf gencode.v37.annotation.with.hervs.gtf --threads 32 --tmpdir tmp_dir --dist 500000 --sua > abra.log

I'm thinking of trying to skip the local assembly step with the --sa flag. This is mentioned in their paper:
"We also assessed the performance of ABRA2 on this dataset both with and without assembly. Notably, high accuracy is achievable without utilizing localized assembly although assembly does offer a boost in recall for longer insertions. The version of ABRA2 run with localized assembly disabled detected 17 fewer insertions with a median length of 60 nucleotides. All other variants were detectable without the use of assembly (Supplementary Fig. S2). The runtime for ABRA2 with localized assembly disabled was 32 min. By contrast, when assembly is forced to execute across all target regions, the ABRA2 runtime was 610 min. This assemble all regions approach detected no additional variants detected compared to ABRA2 run with selective assembly. While localized assembly can be beneficial for variant detection, from a computational performance standpoint it can be helpful to perform this step selectively."

colindaven · 2024-01-23T13:27:49Z

You could use a tool like hyperfine (https://github.com/sharkdp/hyperfine) to measure the speedup as you increase the number of threads. I suspect it is overkill to give the tool more than 8-16 threads. Use a small subset of the BAM, say 1-2 million reads, so hyperfine runs quickly.
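A minimal sketch of that benchmark, assuming samtools, hyperfine and java are on your PATH. The function names, file names and the 42.02 seed/fraction value are placeholders for illustration, not anything ABRA2 prescribes; pick a fraction that leaves you with roughly 1-2 million reads.

```shell
#!/usr/bin/env bash
# Sketch: time ABRA2 at several thread counts on a small subsample.
# Assumes samtools, hyperfine and java are on PATH; file names are placeholders.

# Keep a random fraction of reads. For samtools view -s, the integer part is
# the seed and the fractional part is the fraction kept (42.02 = seed 42, ~2%).
subsample_bam() {
    local in_bam=$1 out_bam=$2 seed_and_fraction=$3
    samtools view -b -s "$seed_and_fraction" -o "$out_bam" "$in_bam"
    samtools index "$out_bam"
}

# Append a hyperfine placeholder so each benchmark gets its own --threads value.
with_threads_placeholder() {
    printf '%s --threads {threads}' "$1"
}

# Time the base command three times for every thread count in the list.
bench_threads() {
    local base_cmd=$1 thread_list=${2:-4,8,16,32}
    hyperfine --runs 3 --parameter-list threads "$thread_list" \
        "$(with_threads_placeholder "$base_cmd")"
}

# Example usage (uncomment and adapt, adding your own --junctions/--gtf flags):
# subsample_bam sample.bam sample.sub.bam 42.02
# bench_threads "java -Xmx16G -jar abra2.jar --in sample.sub.bam --out sample.sub.realign.bam --ref ref.fa"
```

Leave --threads out of the base command, since the helper appends it. If the timings stop improving beyond some thread count, that count is a reasonable per-sample setting.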

Then run multiple instances at once, e.g. on several BAMs concurrently, using a bash script, a cluster, or even just

your_command &
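As a sketch of the bash-script option, the helper below keeps a fixed number of background jobs running at once. It assumes bash 4.3 or newer (for `wait -n`); `run_parallel`, `realign_one`, `ABRA2_JAR` and `REF` are hypothetical names chosen for illustration, and the heap size and thread count are placeholders to tune for your machine.

```shell
#!/usr/bin/env bash
# Sketch: run one command per input, at most max_jobs at a time.

# Run "$cmd <item>" for every item, keeping at most $max_jobs running at once.
run_parallel() {
    local max_jobs=$1 cmd=$2
    shift 2
    local item
    for item in "$@"; do
        # Block while the job table is full (wait -n needs bash >= 4.3).
        while (( $(jobs -rp | wc -l) >= max_jobs )); do
            wait -n
        done
        "$cmd" "$item" &
    done
    wait  # wait for the remaining jobs to finish
}

# Hypothetical per-sample wrapper; ABRA2_JAR and REF must be set by the caller.
realign_one() {
    local bam=$1
    local sample=${bam%.bam}
    mkdir -p "tmp_${sample}"
    java -Xmx16G -jar "$ABRA2_JAR" --in "$bam" --out "${sample}.Realign.bam" \
        --ref "$REF" --threads 8 --tmpdir "tmp_${sample}" > "${sample}.abra.log" 2>&1
}

# Example usage (uncomment and adapt):
# run_parallel 2 realign_one sample1.bam sample2.bam sample3.bam
```

Each instance gets its own Java heap and temp directory, so the number of concurrent jobs times the per-job memory and threads should fit on the machine.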

jingjing578 · 2024-01-25T03:02:11Z

@kevinpryan I haven't found a solution yet. I'll try your method, thank you.

jingjing578 · 2024-01-25T03:07:10Z

@colindaven Thank you for your ideas, I'll consider them.
