Hi again.
Could you please clarify how the HLA divergence score is calculated?
What is the math behind it?
I found this comment in the hla_divergence.R file: https://github.com/slowkow/hlabud/blob/main/R/hla_divergence.R
The divergence is the sum of the distances between each pair of amino acids at each position, divided by the total sequence length.
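If I read that comment correctly, the arithmetic would look roughly like the sketch below. It is written in Python only to make the calculation explicit and is not the hlabud code: `divergence` is a name I made up, and `toy_distance` is a hypothetical stand-in for whatever amino acid distance the package actually uses.

```python
def toy_distance(a: str, b: str) -> float:
    """Hypothetical distance: 0 for identical residues, 1 otherwise."""
    return 0.0 if a == b else 1.0


def divergence(seq1: str, seq2: str, distance=toy_distance) -> float:
    """Sum of per-position distances divided by the sequence length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    total = sum(distance(a, b) for a, b in zip(seq1, seq2))
    return total / len(seq1)


# Two made-up aligned sequences of length 10 that differ at 2 positions.
print(divergence("GSHSMRYFYT", "GSHSMRYFDA"))  # 2 / 10 = 0.2
```

With this reading, the score depends directly on which length is used as the denominator, which is what my questions below are about.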
Please also take into account the positions argument you added in response to my request about calculating HLA divergence for HLA allele protein domains (a substring of the alignment, that is, a subset of columns from the hla_alignment matrix).
Taking the allele pair B44:02+B44:03 as an example, I have run some tests, and it seems to me that something may not be working as expected.
To summarize:
The HLA protein sequence length is overestimated, because the calculation uses the number of columns in the hla_alignments object, which does not correspond to the number of amino acids in the full HLA protein.
For HLA-B, the matrix has 380 columns; however, the HLA-B protein has 362 residues.
When applying the protein segmentation strategy (the new feature implemented as discussed in the issue "HLA divergence restricted to the peptide binding groove #4"), the HLA divergence calculation does not use the length of the new protein segment given in the positions argument.
As a result, the sequence divergence is not normalized appropriately: we expect a higher divergence index (normalized by the size of the region) in the peptide binding groove, where the differences are more concentrated, than in the complete HLA protein.
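To make the normalization point concrete, here is a small Python sketch with made-up data. It is not the hlabud implementation: `segment_divergence` and `mismatch` are hypothetical names, and the 0/1 mismatch score is only a placeholder for the real amino acid distance.

```python
def mismatch(a: str, b: str) -> float:
    """Hypothetical distance: 1 if the residues differ, else 0."""
    return float(a != b)


def segment_divergence(seq1, seq2, positions, normalize_by_segment=True):
    """Sum distances over `positions` (0-based column indices).

    The denominator is either the segment length or the full alignment
    width, so the two normalizations can be compared side by side.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    if not positions:
        raise ValueError("positions must not be empty")
    total = sum(mismatch(seq1[i], seq2[i]) for i in positions)
    denom = len(positions) if normalize_by_segment else len(seq1)
    return total / denom


# Made-up 20-column alignment; all 3 differences fall in columns 5..9.
a = "A" * 5 + "KLMNP" + "A" * 10
b = "A" * 5 + "KRTDP" + "A" * 10
groove = list(range(5, 10))

print(segment_divergence(a, b, groove))                              # 3 / 5  = 0.6
print(segment_divergence(a, b, groove, normalize_by_segment=False))  # 3 / 20 = 0.15
print(segment_divergence(a, b, list(range(len(a)))))                 # 3 / 20 = 0.15
```

In this toy case, dividing by the full alignment width gives the segment the same score as the whole protein (0.15), whereas dividing by the segment length gives the higher value (0.6) that I would expect for a region where the differences are concentrated.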
Please check my tests in the gist linked below.
You will also find an IMGT protein alignment example there.
https://gist.github.com/steletvinicius/7789538387ac7d94c417b747db2be655