How to shorten run time #58

Open
jingjing578 opened this issue Oct 13, 2023 · 4 comments

@jingjing578

When I ran re-alignment on whole-transcriptome data, it took nearly 12 hours for one sample. Which parameters can I modify to shorten the running time? Here are the parameters I currently use:
java -Xmx16G -jar /home/software/abra2-2.24/target/abra2-2.24-jar-with-dependencies.jar --in ${outDir}/${sample}.bam --out ${outDir}/STAR/${sample}.Realign.bam --ws 300,150 --ref ${ref} --junctions bam --threads 10 --gtf ${gtf} --dist 500000 --sa --sua --tmpdir ${outDir}/STAR/tmp

@kevinpryan

Hi, have you found a solution for this? I'm getting similar runtimes (12-13 hours) with the following command:
java -Xmx84g -jar /abra2.jar --in ${samplename}.Aligned.sortedByCoord.out.bam --junctions SJ.out.tab --in-vcf ${samplename}.concatd.vcf --out ${out_bam} --ref Homo_sapiens.assembly38.no_ebv.fa --targets Twist_Comprehensive_Exome_Covered_Targets_hg38.bed --gtf gencode.v37.annotation.with.hervs.gtf --threads 32 --tmpdir tmp_dir --dist 500000 --sua > abra.log

I'm thinking of trying to skip the local assembly step with the --sa flag. This is mentioned in their paper:
"We also assessed the performance of ABRA2 on this dataset both with and without assembly. Notably, high accuracy is achievable without utilizing localized assembly although assembly does offer a boost in recall for longer insertions. The version of ABRA2 run with localized assembly disabled detected 17 fewer insertions with a median length of 60 nucleotides. All other variants were detectable without the use of assembly (Supplementary Fig. S2). The runtime for ABRA2 with localized assembly disabled was 32 min. By contrast, when assembly is forced to execute across all target regions, the ABRA2 runtime was 610 min. This assemble all regions approach detected no additional variants detected compared to ABRA2 run with selective assembly. While localized assembly can be beneficial for variant detection, from a computational performance standpoint it can be helpful to perform this step selectively."
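To make the comparison concrete, here is a minimal sketch that only prints the two command lines (with and without --sa) so they can be timed side by side on a small test BAM. `build_abra2_cmd` is a hypothetical helper, the file names and the heap and thread values are placeholders, and `--sa` is used as it is named in this thread.

```shell
# Hypothetical helper: prints (does not run) an ABRA2 command line.
# Flags are the ones quoted in this thread; file names are placeholders.
build_abra2_cmd() {
    in_bam="$1"
    out_bam="$2"
    # An optional third argument (e.g. --sa) is appended as an extra flag
    printf '%s\n' "java -Xmx16G -jar abra2.jar --in ${in_bam} --out ${out_bam} --ref ref.fa --threads 8 --tmpdir tmp${3:+ $3}"
}

# Print both variants: default (selective assembly) and assembly skipped
build_abra2_cmd sample.bam sample.assembly.bam
build_abra2_cmd sample.bam sample.noassembly.bam --sa
```

The figures quoted from the paper (32 min without assembly versus 610 min with assembly forced everywhere) are for the paper's own dataset. Note that the command in the original post already passes --sa and --sua and still took close to 12 hours, so this flag alone may not account for the whole runtime on RNA data.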

@colindaven

You could use a tool like hyperfine (https://github.com/sharkdp/hyperfine) to measure the speedup as you increase the number of threads. I suspect it might be overkill to give the tool more than 8-16 threads. Just use a small subset of the BAM, like 1-2 million reads, so hyperfine runs quickly.
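A rough sketch of such a benchmark, written as a dry run that prints the commands for review before anything is executed. `print_benchmark_plan` is a hypothetical helper, the file names are placeholders, and the 1% sampling fraction is an assumption that should be adjusted until the subset holds roughly 1-2 million reads. For a realistic measurement, substitute your full ABRA2 command.

```shell
# Hypothetical helper: prints the three commands of a thread-scaling benchmark.
# Review the output, then pipe it to `sh` to execute.
print_benchmark_plan() {
    bam="$1"        # full coordinate-sorted BAM
    threads="$2"    # comma-separated thread counts, e.g. 4,8,16
    subset="subset.bam"
    # samtools view -s takes SEED.FRACTION: seed 42, keep about 1% of reads
    printf '%s\n' "samtools view -b -s 42.01 -o ${subset} ${bam}"
    printf '%s\n' "samtools index ${subset}"
    # hyperfine replaces {threads} with each value from --parameter-list
    printf '%s\n' "hyperfine --runs 3 --parameter-list threads ${threads} 'java -Xmx16G -jar abra2.jar --in ${subset} --out subset.t{threads}.bam --ref ref.fa --threads {threads}'"
}

print_benchmark_plan sample.bam 4,8,16
```

If the timings stop improving beyond some thread count, that count is a reasonable per-sample setting.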

Then run multiple instances at once, e.g. on several BAMs concurrently, using a bash script, a cluster, or even just

your_command &
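As a minimal sketch of the bash-script option, the following runs a per-sample command on several BAMs with a cap on how many run at once. `realign_one` and `run_batch` are hypothetical names, and `realign_one` is only a placeholder that echoes; replace its body with the real ABRA2 call.

```shell
# Placeholder for the real per-sample command (e.g. the java ... abra2 call).
realign_one() {
    echo "realigned $1"
}

# Run realign_one on every BAM, at most max_jobs at a time.
run_batch() {
    max_jobs="$1"
    shift
    running=0
    for bam in "$@"; do
        realign_one "$bam" &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait        # block until the whole current batch has finished
            running=0
        fi
    done
    wait                # collect any jobs left in the final partial batch
}

run_batch 2 a.bam b.bam c.bam
```

Each instance takes its own Java heap (-Xmx) and its own --threads, so the totals are the per-instance values multiplied by the number of concurrent jobs. Give each instance its own --tmpdir and output file.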

@jingjing578
Author

I haven't found a solution yet. I'll try your method, thank you.

@jingjing578
Author

Thank you for your ideas, I'll consider it.

3 participants