Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Methods that take a position have unexpected behaviour for large positions #52

Open
abowden1989 opened this issue Nov 2, 2022 · 3 comments

Comments

@abowden1989
Copy link

Methods that take a position (pos) as an argument exhibit unexpected behaviour if the position is too large to be supported by the variantkey format (i.e. if the position does not fit in the available 28 bits). In this case, the position can end up being bitshifted into the bits reserved for the encoding of the chromosome.

The methods in question are encode_variantkey and variantkey_range.

For example, if encode_variantkey is called with chrom=2 and pos=2^28, then a variantkey on chromosome 3 will be returned as 1 bit from the position has been bitshifted into the least significant bit encoding the chromosome, which was previously a 0.

@abowden1989
Copy link
Author

There are no plans to address this at the current time, as the C code does not support error handling which would be necessary for a proper fix

@nicolaasuni
Copy link
Contributor

nicolaasuni commented Nov 2, 2022

@abowden1989 My understanding is that the Chromosome 1 is the largest human chromosome with 249 million nucleotide base pairs. 2^28 can hold a max position of 268,435,455 > 249 million.

The library is designed on purpose for performance and any value range check if required is a burden of the caller.

You can check my updated variantkey version at: https://github.com/tecnickcom/variantkey for more support.

@abowden1989
Copy link
Author

abowden1989 commented Nov 2, 2022

Yes, 28 bits is enough to encode any position on the human genome, so there is no issue in that regard. However, the signature of the methods in question is that they will happily accept any 32 bit integer. If someone asks for the variantkey_range on chromosome 1 where the minimum position is 0 and the maximum position is i32::MAX, they might expect to receive all valid variantkeys on chromosome 1, but instead the returned range will overflow to other chromosomes.

We (Genomics plc) have no intention of addressing this at this time in this repo, and the workaround is never to pass positions >= 2^28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants