HLA divergence calculation might not be correct #5

Open
steletvinicius opened this issue Feb 12, 2025 · 1 comment


steletvinicius commented Feb 12, 2025

Hi again.

Could you please clarify how the HLA divergence score is calculated?
What is the math behind it?

I found this comment in the hla_divergence.R file: https://github.com/slowkow/hlabud/blob/main/R/hla_divergence.R

The divergence is the sum of the distances between each pair of amino acids at each position, divided by the total sequence length.
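
Read literally, that comment seems to correspond to the formula below. The symbols are my own notation, not taken from the package: $a$ and $b$ are the two aligned allele sequences, $d$ is the amino acid distance being used, $L$ is the "total sequence length" in the denominator, and $P$ is the set of columns selected when a segment is requested.

$$
D(a, b) = \frac{1}{L} \sum_{i=1}^{L} d(a_i, b_i)
\qquad
D_P(a, b) = \frac{1}{|P|} \sum_{i \in P} d(a_i, b_i)
$$

The second form is what I would expect for a segment, with $|P|$ rather than the full alignment width in the denominator.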

Please also take into account the position argument you added in response to my request to calculate HLA divergence for HLA allele protein domains (a substring of the alignment, i.e. a subset of columns from the hla_alignment matrix).

Taking the allele pair B*44:02+B*44:03 as an example, I ran some tests, and it seems to me that something might not be working as expected.

To summarize:

  1. The HLA protein sequence length is overestimated, because the calculation uses the number of columns in the hla_alignments object, which does not correspond to the number of amino acids in the full HLA protein.
    For HLA-B, the matrix has 380 columns; however, the HLA-B protein has 362 residues.

  2. When applying the protein segmentation strategy (the new feature discussed in issue #4, "HLA divergence restricted to the peptide binding groove"),
    the HLA divergence calculation does not use the length of the protein segment given in the positions argument.
    As a result, the divergence is not normalized by the segment length. We would expect a higher divergence index (normalized by length) for the peptide binding groove, where the differences are concentrated, than for the complete HLA protein.
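
To make the two points concrete, here is a minimal Python sketch (not hlabud code). Everything in it is hypothetical: `toy_distance` is a 0/1 placeholder rather than a real amino acid distance matrix, the two sequences are made up, and `divergence` with its `denominator` switch exists only to contrast the normalizations. Skipping gap columns is also an assumption about how residues should be counted.

```python
def toy_distance(x, y):
    """Placeholder distance: 0 for identical residues, 1 otherwise."""
    return 0.0 if x == y else 1.0


def divergence(seq_a, seq_b, positions=None, denominator="positions"):
    """Sum per-position distances and divide by the chosen length.

    positions:   0-based alignment columns to compare (None = all columns).
    denominator: "positions" divides by the number of compared, non-gap
                 columns; "alignment" divides by the full alignment width.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same width")
    cols = range(len(seq_a)) if positions is None else positions
    # Ignore columns where either sequence has a gap character.
    cols = [i for i in cols if seq_a[i] != "-" and seq_b[i] != "-"]
    total = sum(toy_distance(seq_a[i], seq_b[i]) for i in cols)
    length = len(cols) if denominator == "positions" else len(seq_a)
    return total / length


# Made-up aligned pair: 20 columns, 4 of them gaps,
# with both differences inside columns 4..7 (the toy "groove").
allele_1 = "--MAVDTRLLAAAAAAAA--"
allele_2 = "--MAVEKRLLAAAAAAAA--"
groove = range(4, 8)

full_by_alignment = divergence(allele_1, allele_2, denominator="alignment")
full_by_residues = divergence(allele_1, allele_2)
groove_by_alignment = divergence(allele_1, allele_2, groove, "alignment")
groove_by_segment = divergence(allele_1, allele_2, groove)

print(full_by_alignment)    # 0.1   (2 differences / 20 columns)
print(full_by_residues)     # 0.125 (2 differences / 16 residues)
print(groove_by_alignment)  # 0.1   (segment requested, full width used)
print(groove_by_segment)    # 0.5   (2 differences / 4 segment positions)
```

In this toy case, dividing by the alignment width gives the same value for the segment as for the whole sequence, while dividing by the segment length shows the higher density of differences in the segment.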

Please, check my tests on the gist mentioned below.
You will also find there an IMGT protein alignment example.

https://gist.github.com/steletvinicius/7789538387ac7d94c417b747db2be655

slowkow (Owner) commented Feb 21, 2025

@steletvinicius I've been unable to work this past week but I'll get back to you soon. Thanks for reporting the issue.
