bgen_reader.allele_expectation allocates memory based on unindexed genotype #40
Comments
JordenEro,
Thanks for your bug report and thanks for using bgen-reader.
This looks like a possible problem in my section of the code (_bgen2.py). I'm on vacation and may not be able to look at it fully for a week. In the meantime, you may be able to work around the problem by using the Dask-inspired API (see "Dask-Inspired API (original)" in the bgen-reader 4.0.8 documentation: https://bgen-reader.readthedocs.io/en/latest/daskapi.html).
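For reference, the workaround could look roughly like the sketch below. It assumes the `read_bgen` and `allele_expectation` functions from the Dask-inspired API documented at the link above; the helper name `expectation_for_variant` is hypothetical, and this has not been run against the UK Biobank files.

```python
def expectation_for_variant(bgen_path, sample_path, variant_idx):
    """Allele expectations for one variant via the Dask-inspired API (sketch)."""
    # Imported lazily so this sketch can be loaded without bgen-reader installed.
    from bgen_reader import allele_expectation, read_bgen

    # read_bgen builds lazy (Dask-backed) genotype handles instead of
    # reading every variant up front.
    bgen = read_bgen(bgen_path, samples_filepath=sample_path, verbose=False)

    # Computes a samples-by-alleles matrix for the single requested variant.
    return allele_expectation(bgen, variant_idx)
```

Calling `expectation_for_variant('ukb_imp_chr22_v3.bgen', 'ukb1404_imp_chr1_v2_s487406.sample', 1)` and then indexing the result by sample should only need memory proportional to one variant's samples.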
Yours,
Carl
From: jordanero
Sent: Tuesday, August 24, 2021 2:47 PM
bgen_reader.allele_expectation allocates memory based on the unindexed genotype. This causes problems when indexing a large bgen (for example UKBioBank).
The following code attempts to allocate a 4.45 TiB array when computing the expectation for a single variant and sample:
from bgen_reader import open_bgen
bgen = open_bgen('ukb_imp_chr22_v3.bgen', samples_filepath = 'ukb1404_imp_chr1_v2_s487406.sample', verbose = True)
bgen.allele_expectation(index=(1, 1))
Traceback (most recent call last):
  File "", line 1, in
  File "/n/home12/jrossen/.conda/envs/python3/lib/python3.8/site-packages/bgen_reader/_bgen2.py", line 1381, in allele_expectation
    ploidy0 = self.read(return_probabilities=False, return_ploidies=True)[
  File "/n/home12/jrossen/.conda/envs/python3/lib/python3.8/site-packages/bgen_reader/_bgen2.py", line 563, in read
    ploidy_val = np.full(
  File "/n/home12/jrossen/.conda/envs/python3/lib/python3.8/site-packages/numpy/core/numeric.py", line 343, in full
    a = empty(shape, dtype, order)
numpy.core._exceptions.MemoryError: Unable to allocate 4.45 TiB for an array with shape (487409, 1255683) and data type int64
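The size in the error message follows directly from the shape and dtype it reports. A quick check of the arithmetic, assuming the first dimension is samples and the second is variants (the variable names here are only illustrative):

```python
# Shape and dtype taken from the MemoryError message above.
n_samples = 487409
n_variants = 1255683
bytes_per_int64 = 8

# Memory needed if the ploidy array covers every sample and every variant.
full_bytes = n_samples * n_variants * bytes_per_int64
tib = full_bytes / 2**40
print(f"{tib:.2f} TiB")  # prints "4.45 TiB"

# Memory needed if the array covered only the requested sample and variant.
indexed_bytes = 1 * 1 * bytes_per_int64
print(indexed_bytes, "bytes")  # prints "8 bytes"
```

So the allocation matches the full, unindexed genotype matrix rather than the 1 x 1 selection that was requested.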
That's helpful. Thanks for making the package!
This is fixed in branch "fixissue40". @jordanero, you can install the fix early with
@horta, when you get a chance, could you publish the fix?
Merged