I've been having issues with my OS complaining that it was running out of memory while doing some seemingly straightforward processing of neuron meshes in Python.
Consider this example from a fresh Python session in which we have loaded 200 neuron meshes:
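The loading step itself isn't important for the problem, but for context it looked roughly like this (a minimal sketch: the mesh source, here a made-up local folder of `.obj` files, and the exact read call are assumptions; the `psutil` measurement is the same one used throughout this issue):

```python
>>> import os
>>> import navis
>>> import psutil
>>> # Hypothetical source: a local folder with 200 mesh files
>>> nl = navis.read_mesh('meshes/*.obj')
>>> # Baseline memory footprint of the fresh session
>>> rss = psutil.Process(os.getpid()).memory_full_info().rss
```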
The size of the neuron list is somewhere around 1.3Gb, which accounts for most of the RSS of the process. The OSX Activity Monitor reports a "Real Memory Size" of around 1.8Gb.
Now watch what happens if we simply try to subset the neurons:
```python
>>> nl_pr = navis.subset_neuron(nl, subset=lambda x: x.vertices[:, 2] > 224000)
>>> nl_pr
<class 'navis.core.neuronlist.NeuronList'> containing 200 neurons (1.3GiB)
                 type  name      id          units  n_vertices  n_faces
0    navis.MeshNeuron  None  906721  dimensionless       39859    82941
1    navis.MeshNeuron  None  736451  dimensionless       43707    88873
..                ...   ...     ...            ...         ...      ...
198  navis.MeshNeuron  None  477791  dimensionless       48063    97005
199  navis.MeshNeuron  None  801311  dimensionless       46913    95878
>>> # Force garbage collection before we measure the memory footprint again
>>> import gc
>>> gc.collect()
>>> mem_info = psutil.Process(os.getpid()).memory_full_info()
>>> print(f"Resident Set Size: {mem_info.rss / 1e9:.2f}Gb")
Resident Set Size: 9.02Gb
>>> print(f"Unique Set Size: {mem_info.uss / 1e9:.2f}Gb")
Unique Set Size: 6.31Gb
```
The size of the process has exploded to ~9Gb even though the new neuron list is considerably smaller (fewer faces/vertices after pruning). Naively, I would have expected at worst a doubling of the memory usage. So what's happening?
I did a bit of digging and not all operations cause this behavior. For example, a simple `NeuronList.copy()` only doubles the memory footprint, as expected. In this particular case, the issue seems to be with `trimesh`'s `submesh` function, which we use under the hood. My best guess at the moment is that `subset` generates a bunch of temporary data that is correctly garbage collected when the function finishes, but the memory is never de-allocated on the system side. The joys of automatic memory management...
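For what it's worth, that general effect is easy to demonstrate outside of navis. A minimal sketch (the sizes and the exact outcome depend heavily on the platform and allocator, so treat this as an illustration of the hypothesis rather than proof): make lots of small temporary allocations, drop them, and compare RSS before and after.

```python
>>> import os, gc
>>> import numpy as np
>>> import psutil
>>> proc = psutil.Process(os.getpid())
>>> rss_before = proc.memory_full_info().rss
>>> # Lots of small temporary arrays, roughly 1.6Gb in total
>>> tmp = [np.random.rand(1_000) for _ in range(200_000)]
>>> del tmp
>>> gc.collect()
>>> rss_after = proc.memory_full_info().rss
>>> # All arrays are gone as far as Python is concerned, but rss_after can
>>> # still sit well above rss_before if the allocator keeps the pages around
```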
The above becomes an annoying problem when processing hundreds or even thousands of meshes. I've had the same `subset_mesh` procedure crash with around 2k meshes on a machine with 32Gb of memory. One workaround is to run the function in a child process, which ensures that memory is correctly de-allocated when that process terminates:
```python
>>> from concurrent.futures import ProcessPoolExecutor
>>> with ProcessPoolExecutor(max_workers=1) as executor:
...     nl_pr = [executor.submit(navis.subset_neuron, n, subset=n.vertices[:, 2] > 22400).result() for n in nl]
>>> gc.collect()
>>> mem_info = psutil.Process(os.getpid()).memory_full_info()
>>> print(f"Resident Set Size: {mem_info.rss / 1e9:.2f}Gb")
Resident Set Size: 3.95Gb
>>> print(f"Unique Set Size: {mem_info.uss / 1e9:.2f}Gb")
Unique Set Size: 0.74Gb
```
This is obviously a pretty crude example, but you can already achieve the same result with `subset_neuron(..., parallel=True, n_cores=1)`.
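For completeness, I think that call would look roughly like this (using the same threshold as the first example; I'm assuming `parallel` and `n_cores` are passed as plain keyword arguments):

```python
>>> nl_pr = navis.subset_neuron(nl, subset=lambda x: x.vertices[:, 2] > 224000,
...                             parallel=True, n_cores=1)
```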
A few options to deal with this:
1. Add something (short tutorial?) on this to the docs
2. Issue a warning when running potentially expensive operations and suggest running them in a child process (see the rough sketch after this list)
3. Run all or just certain functions by default in a child process
(3) is the nuclear option, but (1) and (2) would be pretty straightforward.
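To make (2) a bit more concrete, here is a minimal sketch of what such a warning could look like. This is not navis API; the decorator name, the warning category and the wording are all made up:

```python
import functools
import warnings


def warn_if_memory_heavy(func):
    """Hypothetical decorator: warn that `func` may balloon the process's memory."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} can temporarily allocate a lot of memory that may "
            "not be returned to the OS. Consider running it in a child process, "
            "e.g. via `parallel=True, n_cores=1`.",
            ResourceWarning,
        )
        return func(*args, **kwargs)
    return wrapper
```

Something like this could then be applied to `subset_neuron` and other functions known to go through `trimesh.submesh`, though whether the extra noise is worth it is debatable.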