Standalone auto-generated reconstruction GUI #487
Conversation
- pydantic model UI widget creation based on CLI model - does no processing, creates yaml files
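For context, a minimal sketch of what widget creation driven by a pydantic model can look like, assuming pydantic 1.x and qtpy; the ExampleSettings model and the type-to-widget map are illustrative stand-ins, not the plugin's actual CLI models:

```python
# Sketch: build one Qt widget per pydantic-model field, keyed on the field's
# python type. A QApplication must exist before widgets are instantiated.
from pydantic import BaseModel
from qtpy.QtWidgets import QCheckBox, QDoubleSpinBox, QLineEdit, QSpinBox

class ExampleSettings(BaseModel):  # illustrative, not the real CLI model
    wavelength_nm: float = 532.0
    z_slices: int = 11
    invert_phase: bool = False
    note: str = ""

_WIDGET_FOR_TYPE = {float: QDoubleSpinBox, int: QSpinBox, bool: QCheckBox, str: QLineEdit}

def widgets_for_model(model_cls):
    """One widget per model field; falls back to a plain line edit."""
    return {
        name: _WIDGET_FOR_TYPE.get(field.outer_type_, QLineEdit)()
        for name, field in model_cls.__fields__.items()  # pydantic 1.x API
    }
```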
Looking good @amitabhverma. I think the overall GUI, the "multiple reconstructions" logic, and the input/output paths all make sense.
I have three comments at this stage which might be useful before running complete CLI calls:

1. When you hit "Build & Run Models" you generate a JSON with an ID, I/O paths, and the complete .yaml. Do you intend to have a list of each reconstruction in this file? The current behavior seems to add only the first reconstruction to the JSON, and there is only a single output path.

2. Is there anything preventing your JSON file from being a pydantic model? A major reason we use pydantic is that it is typed and validated early...are there equivalent features in your .json model? I like the design of saving everything to a file, but my concern is that when you save to a file, it will be hand edited and then fail ungracefully, or worse...after a long reconstruction. (See the sketch after this list.)

3. It seems like you read most of the default parameters from the examples (which is great), but the regularization strengths seem to be set to zero (because of coarse resolution from the GUI elements?), which leads to non-validating defaults. I'm sure this is on your radar (and it's great that the validator flags this!), but it will be great to have working defaults to iterate quickly.
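To illustrate the early-validation point in item 2, a minimal sketch assuming pydantic 1.x; the JobRecord model and file name are hypothetical:

```python
# Sketch: loading a hand-edited file through a pydantic model fails
# immediately with a readable error, instead of deep into a reconstruction.
from pydantic import BaseModel, ValidationError

class JobRecord(BaseModel):  # hypothetical stand-in for the job file schema
    job_id: str
    input_path: str
    output_path: str

try:
    job = JobRecord.parse_file("job.json")  # pydantic 1.x API
except ValidationError as e:
    raise SystemExit(f"Refusing to start reconstruction:\n{e}")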
Thanks for the feedback @talonchandler.
An additional comment, which I also made in the code, concerns root-level validation versus per-value validation. For example, if a user tries to add two modes that cannot be combined, I could check that with the same validation and restrict it. Currently you can add a Bire+FL mode as a Model, only to be restricted later in the process. Point 3 will fix this part, though, and can be implemented.
- refactored model to GUI implementation
- added table to hold processing info, entries purge on completion
- using click testing for now in prototyping
- creates yaml to GUI
- using napari notifications for msgs
- testing on Windows
- Multi processing using pydantic models
- Implemented a client/server approach for managing jobs queue
- Updates each job status based on logs
- Implemented Unique ID in CLI to associate Jobs with Process
- Tested on Windows with locally modified submitit
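A minimal sketch of a client/server status channel like the one described above, assuming a fixed local port; the port number and the "<unique_id>:<status>" message format are illustrative, not the plugin's actual protocol:

```python
import socket

HOST, PORT = "127.0.0.1", 8089  # hypothetical designated port

def serve_one_update() -> str:
    """GUI side: accept a single '<unique_id>:<status>' message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _addr = srv.accept()
        with conn:
            return conn.recv(1024).decode()

def report_status(unique_id: str, status: str) -> None:
    """CLI side: push one job-status update to the GUI."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(f"{unique_id}:{status}".encode())
```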
Thanks for the demo earlier @amitabhverma! On my first trial running on my macOS machine, I ran into issues whenever I created or added a model. For example, when I added a model I hit the attached Traceback.
Trying on linux now.
- OS-specific pydantic imports for ModelMetaclass
- pydantic possibly needs to be pinned at 1.10.19 @talonchandler - implemented version-specific import for now
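One common pattern for a version-guarded ModelMetaclass import; whether this matches the commit exactly is an assumption:

```python
try:
    # pydantic 1.x keeps ModelMetaclass in pydantic.main
    from pydantic.main import ModelMetaclass
except ImportError:
    # pydantic 2.x moved it to a private module
    from pydantic._internal._model_construction import ModelMetaclass
```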
- check for updates on the job file; if none, notify the user in the process table and exit after a 2 min timeout
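A minimal sketch of that watcher: poll a job's output file and give up after a fixed window with no new writes. The file name, timeout, and poll interval are illustrative, not the exact values used in the plugin:

```python
import os
import time

def wait_for_job_update(log_path: str, timeout_s: float = 120.0, poll_s: float = 1.0) -> bool:
    """Return True as soon as log_path grows; False after timeout_s with no change."""
    last_size = os.path.getsize(log_path) if os.path.exists(log_path) else -1
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        size = os.path.getsize(log_path) if os.path.exists(log_path) else -1
        if size != last_size:
            return True  # the job wrote something; keep the table entry alive
        time.sleep(poll_s)
    return False  # no update within the window; notify the user and exit the thread
```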
Working on macOS now. Thanks for the fix @amitabhverma. My testing just now was very successful. I think we're moving into the refinement stage, and I'm noting several minor issues:
- fixes newly created yaml files to reside beside output zarrs
- fixes Model numbers not correctly representing on Build & Run button
- fixes "reconstruction algorithm" to show up as a dropdown box
- fixes "regularization strength" spinbox to accept 5 significant decimal place values
- background path now accepts Path and str in the model and shows up as a directory selector button
- each model container setting is now a collapsible item
- fixes issues when the widget is closed/hidden and then started again
- all model(s) validation errors are now collected and presented in a notification and highlighted on the GUI in one go
- each collapsible model entry also highlights if any validation errors are present
- added confirm dialog for certain actions
- added clear results button
- job update thread will exit after a timeout limit when no update is detected in the job output file
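A minimal sketch of the "collect all validation errors in one go" item above; show_error is napari's real notification API, while the (name, model class, values) input shape is an illustrative assumption:

```python
from napari.utils.notifications import show_error
from pydantic import BaseModel, ValidationError

def validate_all_models(named_models):
    """named_models: iterable of (name, model_cls, field_values) tuples."""
    errors = []
    for name, model_cls, field_values in named_models:
        try:
            model_cls(**field_values)
        except ValidationError as e:
            errors.append(f"{name}:\n{e}")
    if errors:
        show_error("\n\n".join(errors))  # one notification covering every model
    return not errors
```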
@talonchandler I've committed all the fixes and addressed the other points you made. I see that the test-and-deploy run uses pydantic==1.10.17. Should I go ahead and modify that in the setup.cfg file to use 1.10.19? I'll shortly be checking submitit compatibility.
Yes please! We try to use the automated tests to flag installation issues, so this is very helpful.
- pinning pydantic lib version to 1.10.19
use PyQt6 for pyqtSignal
- fixes standalone widget
- fixes BG selection and resetting if Path is invalid
- fixes removal of row/stability
- using Queued List to counter race conditions
- added .zarr data validation check for Input, also used for BG
- make uniqueID more unique
- break thread update operations when user has cleared the results table
- standalone GUI niceties
another check in case a result row is deleted by user action while processing
P.S. the tests fail because pytest stalls on a call to …
- timeout of 5 min per Job
- BF button error handling when not available
- GUI initiates clearing logs on first run to avoid file-exists errors
- a single processing run now supports the CLI spawning multiple jobs
Thank you @ieivanov, that is some very constructive and detailed feedback and will be very helpful moving forward. Some of the points you raised were outside the scope of this PR but are nonetheless very valid. I will try to answer some of the points in a first pass and will leave some for @talonchandler to fill in. The idea of this Reconstruction GUI is also to work as a stand-alone tool, be agnostic of the acquisition, and complement the CLI; it will also be able to come up when run via the command line.
Thanks again for your time and thoughts on this.
- stand-alone GUI cmd
- refactored UI based on suggestions
- fixes cyclic import for stand-alone GUI
- duplicates prev model settings when available
- clears Model when changing Input datasets
- load model looks for .yml files
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

@@            Coverage Diff            @@
##             main     #487     +/-  ##
=========================================
+ Coverage    9.54%   10.68%   +1.14%
=========================================
  Files          30       41      +11
  Lines        4591     7275    +2684
=========================================
+ Hits          438      777     +339
- Misses       4153     6498    +2345
=========================================

☔ View full report in Codecov by Sentry.
implemented a Stop method for On-The-Fly polling reconstructions if required
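A minimal sketch of such a cooperative Stop control, assuming the polling loop runs in its own thread; check_for_new_timepoints and process_timepoint are hypothetical callables standing in for the plugin's actual logic:

```python
import threading

stop_requested = threading.Event()

def polling_loop(check_for_new_timepoints, process_timepoint, interval_s=2.0):
    while not stop_requested.is_set():
        for t in check_for_new_timepoints():
            if stop_requested.is_set():
                break  # user hit Stop mid-batch
            process_timepoint(t)
        stop_requested.wait(interval_s)  # sleep, but wake promptly on Stop

# GUI side: call stop_requested.set() when the user clicks Stop.
```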
Looking very good @amitabhverma. I just completed a set of tests that I'll summarize here. All tests on Biohub HPC (linux):
I spotted a few other issues:
Thanks @talonchandler for your comprehensive testing and detailed summary.
This should be a single submitit job, since there were no positions involved and it was not an on-the-fly reconstruction. I am assuming it was going through the time-points and then choked at t=7 due to resources. I have set the …
This seems to be an issue related to the HPC, since I was unable to replicate it on my local system. A possible reason could be access to the submitit log files or communication over the port we have designated for use. I will check on the HPC and get back.
Seems like the same update issue with file/port access on the HPC.
I will remove the line widget, and that should make the scrollbar at the end of each model's container more apparent.
As discussed, I don't see any bottlenecks occurring on the acquisition side or in the reconstruction. I will however change the script and reduce the update calls to write after every z dimension instead of every channel.
- added info icon next to Input Store label when a dataset path is set, which displays channel names for convenience; more metadata that might be relevant for reconstruction could be displayed
- removed line widget at end of each model container to make the scrollbar more apparent
- we are only reading the location here, but it needs to exist; the CLI creates the logs based on this path
- script to test on-the-fly reconstruction POC
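A minimal sketch of a proof-of-concept "fake acquisition" script like the one described above: copy an existing dataset into a new store one timepoint at a time so the GUI's on-the-fly polling has something to watch. The paths, array layout, and delay are illustrative assumptions:

```python
import time
import zarr

SRC = "acquired.zarr/0/0/0"    # hypothetical source array (T, C, Z, Y, X)
DST = "fake_acquisition.zarr"  # store the on-the-fly poller watches

src = zarr.open(SRC, mode="r")
dst = zarr.open(DST, mode="w", shape=src.shape, chunks=src.chunks, dtype=src.dtype)

for t in range(src.shape[0]):
    dst[t] = src[t]            # "acquire" one timepoint
    print(f"copied timepoint {t}")
    time.sleep(5)              # pace the writes so polling can be observed
```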
@talonchandler following up on the pending items from your last post here.
Tested with both datasets; they ran and updated successfully. The issue was reading the logs folder, which did not exist. The logs folder is created in the cwd from which napari is started. I would suggest setting it based on the output directory we specify; the CLI could do something similar too. It would need a little re-work on the jobs-management side to make sure multiple jobs with different parent output directories are handled correctly for updating the results window, which reads in the logs.
This got fixed with the logs issue, but I do see a recurring error (pasted below). The error interrupted the on-the-fly reconstruction once out of my 5 runs. We might need input from IT about this; I see it mentioned frequently on the nomachine forum too.
A simplified version of that script has been added to the …
@talonchandler I tested the … As a follow-up, I ran the same script via ssh directly (without nomachine or noVNC) on the same node and did not see the error.
- logs folder will reside next to the output dataset
- fixed an issue with reconstructed data showing irrespective of selection
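A minimal sketch of the log-placement fix above: derive the logs folder from the output dataset path instead of the cwd napari was launched from. The "_logs" naming is an assumption based on the attached log archive's name:

```python
from pathlib import Path

def logs_dir_for(output_zarr: str) -> Path:
    """Place the logs folder next to the output dataset instead of the cwd."""
    out = Path(output_zarr)
    logs = out.parent / (out.stem + "_logs")  # e.g. x_Bir_01_logs next to x_Bir_01.zarr
    logs.mkdir(parents=True, exist_ok=True)
    return logs
```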
Looking good @amitabhverma! I love the (i) info that shows the available metadata: extremely useful. I went through the same tests as last time and here are my results:
I had an unknown failure after 11 timepoints...similar outward appearance as last time. I've attached the logs, but they didn't seem to suggest much...can you take a look? Are you able to reconstruct all 30 timepoints? I also checked the job efficiency on the HPC and found no evidence of OOM. a549_tomm20_lysotracker_w1_labelfree_1_Bir_01_logs.zip
This time I errored out with the attached traceback, which I'm having difficulty understanding. I tried restarting napari, reconstructing a different channel, and reconstructing to a different output directory; all three resulted in the same output. Edit: …
Running and reconstructing with the fake-acquisition copying script worked as expected.
fixes:
- multi-pos dataset would be displayed after single-pos processing

enhancements:
- CLI will print Job status when used as cmd line, not for GUI
- use single socket connection when multi-pos is spawned by a request
- added "rx" field to model-container
- minor GUI tweaks
All of my manual tests are passing. My issues this week were related to the (new) restriction on the /mydata/ folders...thanks for your help and patience.
I gave all of the code a quick pass, and I left relatively minor comments. We use black for formatting (details here), and that should help automatically address some of the nits. We've been meaning to start automatically checking formatting, but we haven't gotten to it.
After these are addressed, I will approve and merge.
Looks great @amitabhverma! Just ran another quick test, and everything is working well. Merging.
Quick guide to install the test version:
Create a new conda environment and activate it, for example (environment name and Python version are illustrative):
conda create -n recorder-test python=3.10
conda activate recorder-test
Install the current main branch with the [acq] optional flag
run napari to test
napari
If you get an error you might need to install napari with its Qt bindings:
python -m pip install "napari[pyqt6]"
Make sure napari runs now. If you want to test the current main branch version, you can test that from the plugins menu. Close napari.
We will now install the version for testing from the branch using:
pip install git+https://github.com/mehta-lab/recOrder.git@pydantic-model-prototype
If you are on a Windows machine, you will need a different submitit package (ignore this on other OSes)
Quick guide on Reconstruction:
Once each process successfully completes, that table entry will be purged; only entries that errored should remain.
Note
I would suggest that for the first test you run a single model and make sure no errors are encountered and the data processes as expected. If you are running on a local machine, I also suggest keeping the task manager open to make sure the OS is not running out of resources, which might lead to a crash when running multiple reconstructions concurrently.