HLA divergence calculation might not be correct #5

steletvinicius · 2025-02-12T16:57:05Z

Hi again.

Could you please clarify how the HLA divergence score is calculated?
What is the math behind it?

I found this comment in the hla_divergence.R file: https://github.com/slowkow/hlabud/blob/main/R/hla_divergence.R

The divergence is the sum of the distances between each pair of amino acids at each position, divided by the total sequence length.
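Read literally, that comment corresponds to the formula below. The notation is assumed here for illustration and does not come from the package: $a$ and $b$ are the two aligned allele sequences, $d$ is the amino acid distance, and $L$ is the sequence length used for normalization. The second line is the same sum restricted to a set of selected positions $P$, normalized by the number of selected positions $|P|$.

```latex
D(a, b) = \frac{1}{L} \sum_{i=1}^{L} d(a_i, b_i)
\qquad
D_P(a, b) = \frac{1}{|P|} \sum_{i \in P} d(a_i, b_i)
```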

Please also take into account the position argument you added in response to my request to calculate HLA divergence for HLA allele protein domains (a substring of the alignment, i.e. a subset of columns from the hla_alignment matrix).

Using the allele pair B44:02+B44:03 as an example, I ran some tests, and it seems to me that something might not be working as expected.

To summarize:

The HLA protein sequence length is overestimated, because it is taken from the number of columns in the hla_alignments object, which does not match the number of amino acids in the full HLA protein.
For HLA-B, the matrix has 380 columns; however, the HLA-B protein has 362 residues.
When applying the protein segmentation strategy, the new feature implemented as discussed in the issue "HLA divergence restricted to the peptide binding groove #4", the HLA divergence calculation does not use the length of the protein segment given in the positions argument.
As a result, the sequence divergence is not normalized by the segment length. We expect a higher divergence index (normalized by length) for the peptide binding groove, where the differences are more concentrated, than for the complete HLA protein.
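
The effect of the denominator can be illustrated with a minimal sketch. This is not the hlabud code (which is written in R): the function `divergence`, its `denominator` parameter, the toy sequences, and the 0/1 mismatch distance are all assumptions made up for this example, standing in for whatever amino acid distance the package actually uses.

```python
def divergence(seq_a, seq_b, positions=None, denominator="selected"):
    """Toy pairwise divergence between two aligned sequences.

    Hypothetical helper for illustration only, not the hlabud implementation.
    positions: 0-based alignment columns to compare (default: all columns).
    denominator: "selected" divides by the number of selected positions,
                 "alignment" divides by the total number of alignment columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    n_columns = len(seq_a)
    if positions is None:
        positions = range(n_columns)
    positions = list(positions)

    # Assumed distance: 1 for a mismatch, 0 for a match.
    total = sum(1 for i in positions if seq_a[i] != seq_b[i])

    if denominator == "selected":
        length = len(positions)
    elif denominator == "alignment":
        length = n_columns
    else:
        raise ValueError("denominator must be 'selected' or 'alignment'")
    return total / length


# Made-up 20-column sequences (not real HLA alleles) differing at columns 4 and 7.
allele_1 = "ACDEFGHIKLMNPQRSTVWY"
allele_2 = "ACDEYGHWKLMNPQRSTVWY"

# Hypothetical 8-column segment that contains both differences.
segment = range(2, 10)

full = divergence(allele_1, allele_2)
segment_by_alignment = divergence(allele_1, allele_2, segment, "alignment")
segment_by_selected = divergence(allele_1, allele_2, segment, "selected")

print(full, segment_by_alignment, segment_by_selected)
```

In this toy case, dividing the segment's distance sum by the full alignment width gives the same value as the whole-sequence divergence, while dividing by the segment length gives a higher value, which is the behavior expected for a region where the differences are concentrated.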

Please check my tests in the gist linked below.
You will also find an IMGT protein alignment example there.

https://gist.github.com/steletvinicius/7789538387ac7d94c417b747db2be655

slowkow · 2025-02-21T15:40:01Z

@steletvinicius I've been unable to work this past week but I'll get back to you soon. Thanks for reporting the issue.
