Chap job crashes due to memory issues #8
First of all, I should mention that analysing frames only every 100 ps is likely not a big issue when determining time-averaged quantities over a long trajectory. In fact, you may even want to exclude frames that are only a short time apart in order to decorrelate the data. Still, the high memory demand for long trajectories is a problem I would like to solve, but it turns out to be complicated: CHAP relies on … In terms of a workaround, you could run CHAP on individual trajectory chunks (first 10 ns, second 10 ns, etc.) and write a Python script to combine the output data. I would need to know which quantity you are after in order to judge how feasible this would be, but in principle, CHAP allows its users access to (nearly) all data produced internally.
I appreciate your point regarding the decorrelation of data; however, in my case I have a region that is rarely hydrated, so in order to get sufficient data about the hydration I wanted to try using more than one frame every 100 ps. I understand the problem and I am afraid I wouldn't be able to help… One thing that is not clear to me, however, is why the amount of memory used is far larger than the size of the trajectory itself: the memory consumption is over 60 GB after a few thousand frames analysed, for a trajectory totalling 7 GB. What could be so large as to use so much memory? Could there be a memory leak somewhere? Indeed, I thought about doing this, but I don't know how to combine the JSON-formatted output files. I am performing what I would describe as a standard analysis (radius profile, solvent number density profiles, minimum solvent density, etc.), so I am interested in getting the … Many thanks!
Combining the JSON files should be very straightforward if you have any prior experience in e.g. Python programming (similar scripting languages like R will work as well). A JSON file maps one-to-one onto a nested structure of Python lists and dictionaries. All you'd need to do is load all CHAP output files (each derived from its own trajectory chunk), extract the relevant data (see the documentation for details), and paste it together. Forming an average should then be straightforward with e.g. …
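To make that concrete, here is a minimal sketch of what such a combining script could look like. The chunk file names and the JSON keys used below ("pathwayProfile", "radiusMean") are only placeholders for the sake of the example, not the actual CHAP output layout; check your own output file and the documentation for the real structure.

```python
import json
import numpy as np

# Output files from the individual trajectory chunks (hypothetical names).
chunk_files = ["chunk1_output.json", "chunk2_output.json", "chunk3_output.json"]

profiles = []
for fname in chunk_files:
    with open(fname) as f:
        data = json.load(f)
    # Placeholder keys -- inspect your own output file to find where the
    # profile of interest is actually stored.
    profiles.append(np.asarray(data["pathwayProfile"]["radiusMean"]))

# Average the profiles across chunks. This assumes every chunk reports the
# profile on the same s-coordinates; see the interpolation note further down
# in this thread if that does not hold.
mean_profile = np.mean(profiles, axis=0)
```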
Thanks. I use Python.
returns … whereas … returns … How can this discrepancy in the pathway definition be solved? Is there a way to tell CHAP to use e.g. a PDB file to define the pathway? Or to specify a file containing the array of (unique) values of …? Many thanks
TLDR: You need to use interpolation to ensure that the s-coordinate is the same for all trajectory chunks. NumPy already provides a function for this: https://docs.scipy.org/doc/numpy/reference/generated/numpy.interp.html Here's why: CHAP always writes 1000 data points for each profile (reducing this number to something like 10 points with the … flag), but the s-values at which those points are placed generally differ from chunk to chunk, so the profiles cannot simply be averaged element by element. One more comment: there is no need to create trajectory chunks. CHAP can selectively analyse only a specific time range using the flags …
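For what it's worth, a minimal sketch of that interpolation step could look like the following. The chunk file names, the JSON keys, and the s-range are assumptions you would need to adapt to your own output files.

```python
import json
import numpy as np

# Hypothetical chunk output files and JSON keys -- adapt to your data.
chunk_files = ["chunk1_output.json", "chunk2_output.json"]

# Common s-grid onto which every chunk profile is interpolated
# (range and resolution chosen arbitrarily here).
s_common = np.linspace(-3.0, 3.0, 1000)

interpolated = []
for fname in chunk_files:
    with open(fname) as f:
        data = json.load(f)
    s = np.asarray(data["pathwayProfile"]["s"])
    radius = np.asarray(data["pathwayProfile"]["radiusMean"])
    # np.interp expects the x-coordinates to be increasing, so sort first.
    order = np.argsort(s)
    interpolated.append(np.interp(s_common, s[order], radius[order]))

# With all chunks on the same grid, averaging is element-wise.
mean_radius = np.mean(interpolated, axis=0)
```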
Thanks for the suggestion and for the tips on specifying a specific time range. I am afraid, however, that I fail to understand how to use numpy.interp in this case. Have you used this kind of script for this purpose before? Do you have any examples?
Also, based on the amount of RAM currently required to process my trajectory every 100 ps (this amount seems to increase linearly with the number of frames), I expect that if I wanted to process my full trajectory of 19 GB (corresponding to a 1.5 µs trajectory with frames saved every 10 ps), I would require more than 1 TB of RAM!
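The back-of-the-envelope arithmetic behind that estimate, with rough numbers plugged in (the exact frame counts below are assumptions, not measured values):

```python
# Rough linear extrapolation of the memory demand (illustrative numbers only).
frames_analysed = 3000          # "a few thousand" frames analysed at dt = 100 ps
ram_observed_gb = 60            # memory in use at that point
frames_full_traj = 150_000      # 1.5 us trajectory with frames every 10 ps

ram_per_frame_gb = ram_observed_gb / frames_analysed
estimate_gb = ram_per_frame_gb * frames_full_traj
print(f"Estimated RAM for the full trajectory: ~{estimate_gb:.0f} GB")
# -> roughly 3000 GB, i.e. well over 1 TB
```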
Hi,
I am trying to run CHAP on a 50,000-frame trajectory. However, the job dies after about 1000 frames, I believe because it fills up all the RAM available on my computer: I can see the amount of memory used increase until everything is consumed.
I have to run it with -dt 100 to have the calculation end successfully, which means that I lose a significant portion of my MD. Is this due to a memory leak or some other bug? Or does the program need to keep everything in memory?
Is there a workaround for this problem? For example, is it possible to force writing to disk instead of keeping everything in memory?
Many thanks in advance.