Add benchmarks (#2977)
Co-authored-by: Rahul Shrestha <rahulshrestha0101@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
4 people authored Apr 19, 2024
1 parent 4f6e690 commit ea66317
Showing 13 changed files with 436 additions and 16 deletions.
59 changes: 59 additions & 0 deletions .github/workflows/benchmark.yml
@@ -0,0 +1,59 @@
name: Benchmark

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ${{ matrix.os }}
    defaults:
      run:
        shell: bash -e {0} # -e to fail on error

    strategy:
      fail-fast: false
      matrix:
        python: ["3.12"]
        os: [ubuntu-latest]

    env:
      OS: ${{ matrix.os }}
      PYTHON: ${{ matrix.python }}
      ASV_DIR: "./benchmarks"

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          filter: blob:none

      - name: Fetch main branch for `asv run`’s hash
        run: git fetch origin main:main
        if: ${{ github.ref_name != 'main' }}

      - name: Set up Python ${{ matrix.python }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
          cache: 'pip'

      - name: Cache datasets
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache
          key: benchmark-state-${{ hashFiles('benchmarks/**') }}

      - name: Install dependencies
        run: pip install asv

      - name: Configure ASV
        working-directory: ${{ env.ASV_DIR }}
        run: asv machine --yes

      - name: Quick benchmark run
        working-directory: ${{ env.ASV_DIR }}
        run: asv run --dry-run --quick --show-stderr --verbose HEAD^!
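The `--quick` flag in the final step is what keeps this CI job fast: asv takes a single timing sample per benchmark instead of its usual repeated sampling, so the run validates that the benchmarks execute rather than producing stable numbers. A rough stdlib sketch of the trade-off (illustrative only; asv's internal sampling logic differs):

```python
# Illustrative sketch of asv's --quick mode vs. its default sampling,
# using stdlib timeit. Not asv internals; the benchmark name is a toy.
import timeit


def time_sum():
    sum(range(10_000))


# One sample, analogous to `asv run --quick`: fast but noisy.
quick_sample = timeit.timeit(time_sum, number=1)

# A fuller run: repeat and take the best, analogous to asv's default mode.
stable = min(timeit.repeat(time_sum, number=100, repeat=5)) / 100
```

Both values are wall-clock seconds; in CI only the quick variant runs, which is why the workflow also passes `--dry-run` to skip committing results.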
5 changes: 4 additions & 1 deletion .github/workflows/check-pr.yml
@@ -45,7 +45,10 @@ jobs:
     needs: check-milestone
     if: ${{ needs.check-milestone.outputs.no-relnotes-reason == '' && !contains(github.event.pull_request.labels.*.name, 'Development Process 🚀') }}
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          filter: blob:none
       - name: Find out if relevant release notes are modified
         uses: dorny/paths-filter@v2
         id: changes
7 changes: 5 additions & 2 deletions .github/workflows/publish.yml
@@ -11,8 +11,11 @@ jobs:
     permissions:
       id-token: write # to authenticate as Trusted Publisher to pypi.org
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          filter: blob:none
+      - uses: actions/setup-python@v5
         with:
           python-version: "3.x"
           cache: "pip"
4 changes: 4 additions & 0 deletions .gitignore
@@ -43,3 +43,7 @@ Thumbs.db
# IDEs and editors
/.idea/
/.vscode/

# asv benchmark files
/benchmarks/.asv
/benchmarks/data/
10 changes: 10 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,10 @@
# Scanpy Benchmarks

This directory contains code for benchmarking Scanpy using [asv][].

The functionality is checked using the [`benchmark.yml`][] workflow.
Benchmarks are run using the [benchmark bot][].

[asv]: https://asv.readthedocs.io/
[`benchmark.yml`]: ../.github/workflows/benchmark.yml
[benchmark bot]: https://github.com/apps/scverse-benchmark
168 changes: 168 additions & 0 deletions benchmarks/asv.conf.json
@@ -0,0 +1,168 @@
{
    // The version of the config file format. Do not change, unless
    // you know what you are doing.
    "version": 1,

    // The name of the project being benchmarked
    "project": "scanpy",

    // The project's homepage
    "project_url": "https://scanpy.readthedocs.io/",

    // The URL or local path of the source code repository for the
    // project being benchmarked
    "repo": "..",

    // The Python project's subdirectory in your repo. If missing or
    // the empty string, the project is assumed to be located at the root
    // of the repository.
    // "repo_subdir": "",

    // Customizable commands for building, installing, and
    // uninstalling the project. See asv.conf.json documentation.
    //
    // "install_command": ["python -mpip install {wheel_file}"],
    // "uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],
    "build_command": [
        "python -m pip install build",
        "python -m build --wheel -o {build_cache_dir} {build_dir}",
    ],

    // List of branches to benchmark. If not provided, defaults to "master"
    // (for git) or "default" (for mercurial).
    "branches": ["main"], // for git

    // The DVCS being used. If not set, it will be automatically
    // determined from "repo" by looking at the protocol in the URL
    // (if remote), or by looking for special directories, such as
    // ".git" (if local).
    "dvcs": "git",

    // The tool to use to create environments. May be "conda",
    // "virtualenv" or other value depending on the plugins in use.
    // If missing or the empty string, the tool will be automatically
    // determined by looking for tools on the PATH environment
    // variable.
    "environment_type": "conda",

    // timeout in seconds for installing any dependencies in environment
    // defaults to 10 min
    //"install_timeout": 600,

    // the base URL to show a commit for the project.
    "show_commit_url": "https://github.com/scverse/scanpy/commit/",

    // The Pythons you'd like to test against. If not provided, defaults
    // to the current version of Python used to run `asv`.
    // "pythons": ["3.9", "3.12"],

    // The list of conda channel names to be searched for benchmark
    // dependency packages in the specified order
    "conda_channels": ["conda-forge", "defaults"],

    // The matrix of dependencies to test. Each key is the name of a
    // package (in PyPI) and the values are version numbers. An empty
    // list or empty string indicates to just test against the default
    // (latest) version. null indicates that the package is to not be
    // installed. If the package to be tested is only available from
    // PyPI, and the 'environment_type' is conda, then you can preface
    // the package name by 'pip+', and the package will be installed via
    // pip (with all the conda available packages installed first,
    // followed by the pip installed packages).
    //
    "matrix": {
        "numpy": [""],
        // "scipy": ["1.2", ""],
        "scipy": [""],
        "h5py": [""],
        "natsort": [""],
        "pandas": [""],
        "memory_profiler": [""],
        "zarr": [""],
        "pytest": [""],
        "scanpy": [""],
        "python-igraph": [""],
        // "psutil": [""]
    },

    // Combinations of libraries/python versions can be excluded/included
    // from the set to test. Each entry is a dictionary containing additional
    // key-value pairs to include/exclude.
    //
    // An exclude entry excludes entries where all values match. The
    // values are regexps that should match the whole string.
    //
    // An include entry adds an environment. Only the packages listed
    // are installed. The 'python' key is required. The exclude rules
    // do not apply to includes.
    //
    // In addition to package names, the following keys are available:
    //
    // - python
    //     Python version, as in the *pythons* variable above.
    // - environment_type
    //     Environment type, as above.
    // - sys_platform
    //     Platform, as in sys.platform. Possible values for the common
    //     cases: 'linux2', 'win32', 'cygwin', 'darwin'.
    //
    // "exclude": [
    //     {"python": "3.2", "sys_platform": "win32"}, // skip py3.2 on windows
    //     {"environment_type": "conda", "six": null}, // don't run without six on conda
    // ],
    //
    // "include": [
    //     // additional env for python2.7
    //     {"python": "2.7", "numpy": "1.8"},
    //     // additional env if run on windows+conda
    //     {"platform": "win32", "environment_type": "conda", "python": "2.7", "libpython": ""},
    // ],

    // The directory (relative to the current directory) that benchmarks are
    // stored in. If not provided, defaults to "benchmarks"
    // "benchmark_dir": "benchmarks",

    // The directory (relative to the current directory) to cache the Python
    // environments in. If not provided, defaults to "env"
    "env_dir": ".asv/env",

    // The directory (relative to the current directory) that raw benchmark
    // results are stored in. If not provided, defaults to "results".
    "results_dir": ".asv/results",

    // The directory (relative to the current directory) that the html tree
    // should be written to. If not provided, defaults to "html".
    "html_dir": ".asv/html",

    // The number of characters to retain in the commit hashes.
    // "hash_length": 8,

    // `asv` will cache results of the recent builds in each
    // environment, making them faster to install next time. This is
    // the number of builds to keep, per environment.
    // "build_cache_size": 2,

    // The commits after which the regression search in `asv publish`
    // should start looking for regressions. Dictionary whose keys are
    // regexps matching to benchmark names, and values corresponding to
    // the commit (exclusive) after which to start looking for
    // regressions. The default is to start from the first commit
    // with results. If the commit is `null`, regression detection is
    // skipped for the matching benchmark.
    //
    // "regressions_first_commits": {
    //     "some_benchmark": "352cdf", // Consider regressions only after this commit
    //     "another_benchmark": null,  // Skip regression detection altogether
    // },

    // The thresholds for relative change in results, after which `asv
    // publish` starts reporting regressions. Dictionary of the same
    // form as in ``regressions_first_commits``, with values
    // indicating the thresholds. If multiple entries match, the
    // maximum is taken. If no entry matches, the default is 5%.
    //
    // "regressions_thresholds": {
    //     "some_benchmark": 0.01,    // Threshold of 1%
    //     "another_benchmark": 0.5,  // Threshold of 50%
    // },
}
Empty file.
101 changes: 101 additions & 0 deletions benchmarks/benchmarks/preprocessing.py
@@ -0,0 +1,101 @@
"""
This module will benchmark preprocessing operations in Scanpy
API documentation: https://scanpy.readthedocs.io/en/stable/api/preprocessing.html
"""

from __future__ import annotations

from typing import TYPE_CHECKING

import scanpy as sc

from .utils import pbmc68k_reduced

if TYPE_CHECKING:
from anndata import AnnData


adata: AnnData


def setup():
global adata
adata = pbmc68k_reduced()


def time_calculate_qc_metrics():
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(
adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)


def peakmem_calculate_qc_metrics():
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(
adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)


def time_filter_cells():
sc.pp.filter_cells(adata, min_genes=200)


def peakmem_filter_cells():
sc.pp.filter_cells(adata, min_genes=200)


def time_filter_genes():
sc.pp.filter_genes(adata, min_cells=3)


def peakmem_filter_genes():
sc.pp.filter_genes(adata, min_cells=3)


def time_normalize_total():
sc.pp.normalize_total(adata, target_sum=1e4)


def peakmem_normalize_total():
sc.pp.normalize_total(adata, target_sum=1e4)


def time_log1p():
sc.pp.log1p(adata)


def peakmem_time_log1p():
sc.pp.log1p(adata)


def time_pca():
sc.pp.pca(adata, svd_solver="arpack")


def peakmem_pca():
sc.pp.pca(adata, svd_solver="arpack")


def time_highly_variable_genes():
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)


def peakmem_highly_variable_genes():
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)


def time_regress_out():
sc.pp.regress_out(adata, ["n_counts", "percent_mito"])


def peakmem_regress_out():
sc.pp.regress_out(adata, ["n_counts", "percent_mito"])


def time_scale():
sc.pp.scale(adata, max_value=10)


def peakmem_scale():
sc.pp.scale(adata, max_value=10)
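The `peakmem_*` functions above rely on asv's memory instrumentation (note `memory_profiler` in the dependency matrix of `asv.conf.json`). Conceptually, asv reports the peak allocation observed while the function body runs; a stdlib-only illustration of that idea using `tracemalloc` (asv's actual mechanism differs, and `measure_peak` is a hypothetical helper, not part of asv):

```python
# Conceptual sketch of peak-memory measurement, analogous to what asv's
# peakmem_* benchmarks report. Uses stdlib tracemalloc for illustration;
# asv itself measures process-level peak memory differently.
import tracemalloc


def measure_peak(func):
    # Return peak bytes allocated (as seen by tracemalloc) while func runs.
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Allocating a million-element list should register a peak in the megabytes.
peak = measure_peak(lambda: [0] * 1_000_000)
```

Because the measured quantity is a side effect of running the body, each `peakmem_*` function deliberately mirrors its `time_*` twin: the same call is benchmarked twice, once for wall time and once for memory.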