
Parallel processing pr #290

Merged
timonmerk merged 9 commits into main from parallel_processing_pr on Jan 27, 2024
Conversation

timonmerk (Contributor)

Adding joblib feature parallelization.

  • Added a check in nm_stream_offline for whether parallelization can be performed
  • Changed a class attribute to an instance attribute in nm_mne_connectivity
  • Added joblib as a dependency
  • Added tests verifying that sequential and parallel computation yield equal results
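A minimal sketch of what joblib-based batch parallelization looks like in general (`run_batches_parallel` and `compute_features` are illustrative names, not the actual nm_stream_offline API); this is only valid when batches are independent of each other:

```python
from joblib import Parallel, delayed


def run_batches_parallel(batches, compute_features, n_jobs=-1):
    # Each batch is processed independently; this is only correct when no
    # feature computation depends on the results of other batches.
    return Parallel(n_jobs=n_jobs)(
        delayed(compute_features)(batch) for batch in batches
    )
```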

@timonmerk (Contributor, Author)

@toni-neurosc I worked on a different branch where I merged some of your commits from #281.

Thanks for adapting the PR!
The model.predict method is a basic (not really tested) way to load an ML model and directly get predictions that could be used for real-time applications. It was, however, badly implemented and untested, so I removed it for now.

The real-time computation is also one of the reasons why I put the computation in this somewhat strange while True loop, so that a datastream could replace the nm_generator and constantly feed data until a "stop signal" is reached.
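Such a loop could be sketched roughly as follows (`run_stream` and `process_batch` are hypothetical names, and `None` stands in for the stop signal; the actual implementation lives in the stream classes):

```python
def run_stream(generator, process_batch):
    # Pull batches from a (potentially endless) datastream until a stop
    # signal is reached; here, the generator being exhausted / yielding None.
    results = []
    while True:
        batch = next(generator, None)
        if batch is None:  # hypothetical "stop signal"
            break
        results.append(process_batch(batch))
    return results
```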

Optimally, this would be replaced by an mne-lsl Stream at some point (#278, https://mne.tools/mne-lsl/stable/generated/tutorials/00_stream.html#sphx-glr-generated-tutorials-00-stream-py).

Regarding the parallelization implementation, I found that there are issues with parallelizing certain feature-estimation and post-processing methods. Normalization, for example, is currently implemented sequentially, which means that batches need to be computed iteratively. The same is true for bursts, since there a time window (of e.g. 30 s) also has to be defined and then processed iteratively.
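To illustrate why normalization blocks batch-level parallelism, here is a sketch (not the project's actual normalization code) of z-scoring each batch against the statistics of all data seen so far: batch k cannot be normalized before batches 0..k-1 have been processed, so the loop is inherently sequential:

```python
import numpy as np


def normalize_batches_sequential(batches, eps=1e-8):
    # The running mean/std depend on every previous batch, which is what
    # makes this step order-dependent and unsuitable for naive parallelism.
    history = []
    out = []
    for batch in batches:
        history.append(batch)
        past = np.concatenate(history)
        out.append((batch - past.mean()) / (past.std() + eps))
    return out
```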

I have now added a check for those methods in nm_stream_offline that raises an exception if they are selected while parallel is set to True.
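A rough sketch of what such a guard could look like (the setting names, `SEQUENTIAL_ONLY` list, and function signature are made up for illustration and do not reflect the actual nm_stream_offline code):

```python
# Hypothetical names of methods that require sequential batch processing.
SEQUENTIAL_ONLY = ("feature_normalization", "bursts")


def check_parallel_settings(settings: dict, parallel: bool) -> None:
    # Reject configurations that combine order-dependent methods with
    # parallel batch processing.
    if not parallel:
        return
    enabled = [m for m in SEQUENTIAL_ONLY if settings.get(m, False)]
    if enabled:
        raise ValueError(
            f"Methods {enabled} require sequential batch processing "
            "and cannot be used with parallel=True."
        )
```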

Personally, I am also somewhat more in favor of using joblib across operating systems, which simplifies the code a bit, so for now I removed the OS check.

Thanks again! Such a great addition to the tool.

@timonmerk timonmerk merged commit dcb981b into main Jan 27, 2024
4 checks passed
@toni-neurosc (Collaborator)

Hi Timon! Thanks for your comments, those are some good points that I did not consider, partly because the parallelization is one of the first things I tried to do, and I did not yet have a proper full-picture view of the software (I still don't).

I indeed noticed the issue with parallelization while working on nm_bursts, since the circular buffer persists across batches and is used to update the percentile. It is, however, possible to maintain the parallelization if the buffer update and the percentile calculation are done sequentially, and the processing of the batch is then submitted to the worker pool with a frozen copy of the percentile. This logic could extend to normalization, but I'd have to check.
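The frozen-copy idea could be sketched roughly like this (`detect_bursts` and the buffer handling are illustrative, not the nm_bursts implementation): the buffer and percentile are updated sequentially in the main process, and only the frozen scalar threshold is captured per job before the independent jobs are dispatched:

```python
from collections import deque

import numpy as np
from joblib import Parallel, delayed


def detect_bursts(batch, threshold):
    # Illustrative per-batch computation against a frozen scalar threshold.
    return batch > threshold


def process_with_frozen_threshold(batches, buffer_len=1000, q=75, n_jobs=2):
    buffer = deque(maxlen=buffer_len)
    jobs = []
    for batch in batches:
        buffer.extend(batch)                          # sequential buffer update
        thr = float(np.percentile(list(buffer), q))   # frozen copy of the percentile
        jobs.append(delayed(detect_bursts)(batch, thr))
    # Only now are the independent per-batch jobs handed to the worker pool.
    return Parallel(n_jobs=n_jobs)(jobs)
```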

Of course this requires a bit more low-level knowledge of how the sub-processes are spawned, which data is copied between processes, and which data merely references other blocks of memory that could change during execution. For example, if a class attribute self.burst_thr is calculated and stored, and then the job is submitted, will that burst_thr be copied into the new process, or will updating it when the next batch arrives affect already-running processes? I need to research this or write a test case for it.
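One way to probe this with the standard library (using `ProcessPoolExecutor` rather than joblib's internals, and an explicit `dict` copy as the "frozen" snapshot, so there is no race on when pickling happens): the child process only ever sees the copy made at submission time, so mutating the parent's state afterwards has no effect.

```python
import time
from concurrent.futures import ProcessPoolExecutor

state = {"burst_thr": 1.0}


def read_threshold_later(snapshot):
    # Runs in a child process; `snapshot` was copied (pickled) when the job
    # was submitted, so the parent's later mutation of `state` is invisible.
    time.sleep(0.3)
    return snapshot["burst_thr"]


def frozen_copy_experiment():
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(read_threshold_later, dict(state))  # frozen copy
        time.sleep(0.1)
        state["burst_thr"] = 99.0  # mutated after submission
        return future.result()
```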

@timonmerk timonmerk deleted the parallel_processing_pr branch February 3, 2024 12:12