Projecting indels across gap causes length change and non-reversibility #758

Open
davmlaw opened this issue Feb 19, 2025 · 1 comment

davmlaw commented Feb 19, 2025

I understand a variant growing when the destination reference has an insertion, but shouldn't it shrink back to its original length when it is projected the other way?

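# Assumption: parse, normalize, c_to_g and g_to_c are the hgvs.easy convenience helpers
from hgvs.easy import c_to_g, g_to_c, normalize, parse

original_hgvs = "NM_015120.4(ALMS1):c.36_38dupGGA"

def print_hgvs(sv):
    # end - start, so an inclusive 3-base range such as 36_38 reports length=2
    length = sv.posedit.pos.end - sv.posedit.pos.start
    print(f"hgvs='{sv}' - {length=}")

var_c = parse(original_hgvs)
print_hgvs(var_c)
var_g = c_to_g(var_c)  # transcript -> genome
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)  # genome -> same transcript
print_hgvs(var_c2)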

Output:

hgvs='NM_015120.4(ALMS1):c.36_38dup' - length=2
hgvs='NC_000002.12:g.73385937_73385942dup' - length=5
hgvs='NM_015120.4:c.72_77dup' - length=5
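
A note on reading these lengths: HGVS ranges include both endpoints, so `end - start` under-counts by one. Here is a minimal sketch of the arithmetic, using a hypothetical regex-based helper `span_length` (not part of hgvs; it only understands plain `start_end` ranges and is not meaningful for `ins` variants, whose positions flank the insertion):

```python
import re

# Matches the coordinate part of a simple HGVS range such as "c.36_38dup"
# or "g.73385937_73385942dup". Offsets, UTR positions and single-position
# variants are deliberately not handled in this sketch.
_RANGE_RE = re.compile(r":[cgn]\.(\d+)_(\d+)")


def span_length(hgvs_string: str) -> int:
    """Return the number of bases covered by an inclusive HGVS range."""
    match = _RANGE_RE.search(hgvs_string)
    if match is None:
        raise ValueError(f"no simple start_end range in {hgvs_string!r}")
    start, end = (int(group) for group in match.groups())
    return end - start + 1  # both endpoints belong to the interval


for text in (
    "NM_015120.4(ALMS1):c.36_38dup",        # 3
    "NC_000002.12:g.73385937_73385942dup",  # 6
    "NM_015120.4:c.72_77dup",               # 6
):
    print(text, span_length(text))
```

Counted this way, the 3-base dup becomes a 6-base dup on the genome and stays 6 bases after projecting back.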

Normalization?

I noticed that if you normalize the variant first, the problem goes away.

I think this is because normalization shifts the variant away from the gap, but that shouldn't matter. If variants do need to be normalized before projection, perhaps we should normalize automatically, or raise a warning or error when the input is not normalized?
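
To make the "normalize automatically or warn" idea concrete, here is a rough sketch. `project_normalized` is a hypothetical wrapper, not an hgvs API; the normalizer and projector are passed in as plain callables so the sketch makes no claims about library internals.

```python
import warnings
from typing import Any, Callable, TypeVar

V = TypeVar("V")


def project_normalized(
    variant: V,
    normalize: Callable[[V], V],
    project: Callable[[V], Any],
    key: Callable[[Any], Any] = str,
) -> Any:
    """Normalize `variant`, warn if that changed it, then project the result.

    `key` decides what "changed" means; it defaults to comparing the
    string forms of the variant before and after normalization.
    """
    normalized = normalize(variant)
    if key(normalized) != key(variant):
        warnings.warn(
            f"{variant} was not normalized; projecting {normalized} instead",
            stacklevel=2,
        )
    return project(normalized)
```

With the helpers used in this issue, a call might look like `project_normalized(var_c, normalize, c_to_g, key=lambda v: str(v.posedit))`. Comparing only the `posedit` is a guess at how to avoid spurious warnings, since the printed variant can lose its `(ALMS1)` gene symbol during normalization even when the position is unchanged.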

var_c_orig = parse(original_hgvs)
var_c = normalize(var_c_orig)
print(f"Normalized: {var_c_orig} => {var_c}")
print_hgvs(var_c)
var_g = c_to_g(var_c)
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)
print_hgvs(var_c2)

Output:

Normalized: NM_015120.4(ALMS1):c.36_38dup => NM_015120.4:c.75_77dup
hgvs='NM_015120.4:c.75_77dup' - length=2
hgvs='NC_000002.12:g.73385940_73385942dup' - length=2
hgvs='NM_015120.4:c.75_77dup' - length=2

Note: while searching the issues I found a discussion about alignment gaps (on this same transcript!) in #514.

davmlaw commented Feb 19, 2025

To take normalization out of the picture, I made the variant so large that it can't be shifted, and was able to get it to change from a dup to an ins:

original_hgvs = "NM_015120.4(ALMS1):c.36_77dup"
var_c_orig = parse(original_hgvs)
var_c = normalize(var_c_orig)
print(f"Normalized: {var_c_orig} => {var_c}")
print_hgvs(var_c)
var_g = c_to_g(var_c)
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)
print_hgvs(var_c2)

Output:

Normalized: NM_015120.4(ALMS1):c.36_77dup => NM_015120.4:c.36_77dup
hgvs='NM_015120.4:c.36_77dup' - length=41
hgvs='NC_000002.12:g.73385942_73385943insGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGA'
hgvs='NM_015120.4:c.77_78insGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGA'

So I think normalization was just hiding the problem before.

I understand the change going one way, but I wonder whether the conversion back is wrong, or whether there should definitely be a warning here.
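
As a sketch of what such a check could look like from the caller's side, the helper below projects a variant forward and back and reports whether it came back unchanged. `check_round_trip` and `RoundTrip` are hypothetical names, not part of hgvs; the two projection steps are supplied as callables.

```python
from typing import Any, Callable, NamedTuple


class RoundTrip(NamedTuple):
    original: Any
    projected: Any
    restored: Any
    reversible: bool


def check_round_trip(
    variant: Any,
    forward: Callable[[Any], Any],
    backward: Callable[[Any], Any],
    key: Callable[[Any], Any] = str,
) -> RoundTrip:
    """Project `variant` forward, then back, and compare with the input.

    `key` selects what is compared; by default the string forms.
    """
    projected = forward(variant)
    restored = backward(projected)
    return RoundTrip(variant, projected, restored, key(restored) == key(variant))
```

Assuming the helpers from the snippets above, a call could be `check_round_trip(var_c, c_to_g, lambda g: g_to_c(g, var_c.ac), key=lambda v: str(v.posedit))`. For the variants in this issue, `reversible` should come back `False`, and a caller could then warn or raise.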
