Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparse_mean_variance_axis now uses all cores #3015

Merged
merged 12 commits into from
Apr 23, 2024
Merged

Conversation

Intron7
Copy link
Member

@Intron7 Intron7 commented Apr 19, 2024

This functions now uses all cores for mean & var calculations.

@Intron7 Intron7 added this to the 1.10.2 milestone Apr 19, 2024
@Intron7 Intron7 linked an issue Apr 19, 2024 that may be closed by this pull request
@Intron7 Intron7 requested a review from ivirshup April 19, 2024 08:25
Copy link

codecov bot commented Apr 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.27%. Comparing base (ee8505b) to head (d982936).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3015      +/-   ##
==========================================
+ Coverage   75.53%   76.27%   +0.74%     
==========================================
  Files         117      117              
  Lines       12950    12795     -155     
==========================================
- Hits         9782     9760      -22     
+ Misses       3168     3035     -133     
Files Coverage Δ
scanpy/preprocessing/_utils.py 97.36% <100.00%> (+42.70%) ⬆️

... and 4 files with indirect coverage changes

Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool!

Please use longer variable names. nthrnum_threads, ssums_minor or so, s0→???, m0→???

Of course not for things like loop counters, but you know.

Also would be great to have comments, like “calculate sums and sum of squares along the minor axis” (if “minor” is correct here) then “go over minor axis again to calculate means and variances from the sums”.

PS: Benchmarks should be ready in a little bit, so you could add one to check how much this speeds things up! I’ll keep you posted

@Intron7 Intron7 requested a review from flying-sheep April 19, 2024 08:51
@Intron7
Copy link
Member Author

Intron7 commented Apr 19, 2024

Some small benchmarks for 32 cores with CSR.shape=(196943, 20867):

axis old new
minor 804 ms 96 ms
major 520 ms 40 ms

Copy link

scverse-benchmark bot commented Apr 19, 2024

Benchmark changes

Change Before [ee8505b] After [d982936] Ratio Benchmark (Parameter)
- 508±2ms 31.9±1ms 0.06 preprocessing.SparseDenseSuite.time_mean_var('lung93k')
+ 1.09±0.04ms 1.22±0.04ms 1.12 preprocessing.SparseDenseSuite.time_mean_var('pbmc68k_reduced')
+ 241M 330M 1.37 preprocessing.peakmem_pca
+ 5.86±0.01ms 6.86±0.03ms 1.17 preprocessing.time_calculate_qc_metrics

Comparison: https://github.com/scverse/scanpy/compare/ee8505b1c1578af0c50defdb3cf64ec18713669e..d9829365d7e23aea5680990ea8570d0a384291d3
Last changed: Tue, 23 Apr 2024 09:27:27 +0000

More details: https://github.com/scverse/scanpy/pull/3015/checks?check_run_id=24144155267

@flying-sheep
Copy link
Member

flying-sheep commented Apr 19, 2024

I tentatively added a benchmark that runs just on _get_mean_var.

Locally I don’t see any difference though, what’s wrong? Too small data? Numba not set up with correct number of threads?

/edit: also I think the machine is not sufficiently tuned. The original run (before I added the mean_var benchmarks) said “No changes in benchmarks.”

Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Would be nice to add a sufficiently large dataset to the benchmarks to demonstrate the benefits.

@flying-sheep flying-sheep merged commit a70582e into main Apr 23, 2024
15 checks passed
@flying-sheep flying-sheep deleted the _mean_var-sparse-mc branch April 23, 2024 15:00
meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request Apr 23, 2024
flying-sheep pushed a commit that referenced this pull request Apr 23, 2024
…ses all cores) (#3024)

Co-authored-by: Severin Dicks <37635888+Intron7@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Preprocessing functions with numba
2 participants