You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
the pipeline I am working on use in parrallele consensus (based on unique molecular index) or classical non consensus reads.
So for a sample, I have in the same time the bwa aligned reads, and the bwa conensus reads (duplicates reads coming from the same initial fragment are merge into a single consensus read, and the base quality is adjusted).
Using ABRA2, I was able to correctly adjust CIGAR for a 71 bp del event. However, ABRA2 doesn't find the same results depending of the input reads (classical or merge reads). More precisely, for a given read with the exact same alignment/CIGAR and sequence before ABRA2 realignment, only the deletion present in the consensus reads (with higher baseQ) are corrected.
Exemple for 2 identical reads, one coming from the basic process and the other from the consensus process :
1 - For basic reads (no difference before and after ABRA2)
before ABRA2 --------------------------------------
A00514:721:HHC55DRXY:1:2126:13856:21167 147 chr7 116411750 60 102M26S = 116411564 -288 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC FFFFFFFF:FFFF:FF:FFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:102 AS:i:102 XS:i:20 RG:Z:D721R130
The only difference I found was the base quality (baseQ is increase if the consensus is good between duplicate reads), but in the command I launched, I precise --mapq 0 --mbq 0 to be sure to not filter reads because of their poor quality.
Here is the complete command line I used :
Finally I wanted to compare the INFO of the given segment between the 2 realignments but I don't find how to interpret the differences I saw :
for basic reads------------------------
INFO Fri Nov 05 20:32:32 UTC 2021 PROCESS_REGION_MSECS: chr7_116411546_116412048 63 0 13 0
for consensus reads------------------------
INFO Fri Nov 05 22:17:42 UTC 2021 PROCESS_REGION_MSECS: chr7_116411546_116412048 148 0 24 0
What is the meaning of the 4 last numbers? I didn't find the information...
If you could please explain me what could cause such differences and how to interprete the INFO, that would be great!
I can provide more information if needed!
Camille
The text was updated successfully, but these errors were encountered:
Hello,
the pipeline I am working on use in parrallele consensus (based on unique molecular index) or classical non consensus reads.
So for a sample, I have in the same time the bwa aligned reads, and the bwa conensus reads (duplicates reads coming from the same initial fragment are merge into a single consensus read, and the base quality is adjusted).
Using ABRA2, I was able to correctly adjust CIGAR for a 71 bp del event. However, ABRA2 doesn't find the same results depending of the input reads (classical or merge reads). More precisely, for a given read with the exact same alignment/CIGAR and sequence before ABRA2 realignment, only the deletion present in the consensus reads (with higher baseQ) are corrected.
Exemple for 2 identical reads, one coming from the basic process and the other from the consensus process :
1 - For basic reads (no difference before and after ABRA2)
before ABRA2 --------------------------------------
A00514:721:HHC55DRXY:1:2126:13856:21167 147 chr7 116411750 60 102M26S = 116411564 -288 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC FFFFFFFF:FFFF:FF:FFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:102 AS:i:102 XS:i:20 RG:Z:D721R130
after ABRA2--------------------------------------
A00514:721:HHC55DRXY:1:2126:13856:21167 147 chr7 116411750 60 102M26S = 116411564 -288 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC FFFFFFFF:FFFF:FF:FFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:102 AS:i:102 XS:i:20 RG:Z:D721R130
2 - For consensus reads (71pb DEL correctly aligned after ABRA2)
before ABRA2--------------------------------------
Illumina:608133 147 chr7 116411750 60 102M26S = 116411564 -288 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC NNNNNNNNNNNNNNNNMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NM:i:0 MD:Z:102 AS:i:102 XS:i:20 RG:Z:D721R130
after ABRA2--------------------------------------
Illumina:608133 147 chr7 116411750 60 102M71D26M = 116411564 -385 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC NNNNNNNNNNNNNNNNMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN YA:Z:chr7:116411008:844M71D663M MD:Z:102 RG:Z:D721R130 NM:i:71YM:i:0 YO:Z:chr7:116411750:-:102M26S AS:i:102 XS:i:20 YX:i:20
The only difference I found was the base quality (baseQ is increase if the consensus is good between duplicate reads), but in the command I launched, I precise --mapq 0 --mbq 0 to be sure to not filter reads because of their poor quality.
Here is the complete command line I used :
"""
java -Xmx35g -Djava.io.tmpdir=$TMPDIR -jar ${ABRA2JAR}
--ref hg19_base.fa
--in ${INPUT_BAM}
--targets $BED
--out ${OUTPUT_BAM}
--mapq 0 --mbq 0
2> Abra2.log
"""
Finally I wanted to compare the INFO of the given segment between the 2 realignments but I don't find how to interpret the differences I saw :
for basic reads------------------------
INFO Fri Nov 05 20:32:32 UTC 2021 PROCESS_REGION_MSECS: chr7_116411546_116412048 63 0 13 0
for consensus reads------------------------
INFO Fri Nov 05 22:17:42 UTC 2021 PROCESS_REGION_MSECS: chr7_116411546_116412048 148 0 24 0
What is the meaning of the 4 last numbers? I didn't find the information...
If you could please explain me what could cause such differences and how to interprete the INFO, that would be great!
I can provide more information if needed!
Camille
The text was updated successfully, but these errors were encountered: