release: v2.6.0 (#266)

eonu · Dec 30, 2024 · 37ce9f4
2 parents a54dcdb + a769620

Showing 33 changed files with 273 additions and 413 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -11,13 +11,13 @@ repos:
         pass_filenames: false
   # ruff check (w/autofix)
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.1.3 # should match version in pyproject.toml
+    rev: v0.8.4 # should match version in pyproject.toml
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
   # ruff format
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.1.3 # should match version in pyproject.toml
+    rev: v0.8.4 # should match version in pyproject.toml
     hooks:
       - id: ruff-format
   # # pydoclint - docstring formatting
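
Both hooks carry the comment `should match version in pyproject.toml`, and this commit bumps the pins by hand in each place. A minimal sketch of how that match could be checked automatically, assuming only the pin formats visible in this diff (`rev: vX.Y.Z` under a `ruff-pre-commit` repo entry, and `ruff = "X.Y.Z"` in `pyproject.toml`). The function names and sample strings are hypothetical and are not part of the repository:

```python
from __future__ import annotations

import re


def precommit_ruff_revs(text: str) -> set[str]:
    """Collect the `rev` pinned for every ruff-pre-commit repo entry."""
    pattern = re.compile(
        r"repo:\s*https://github\.com/astral-sh/ruff-pre-commit\s*\n"
        r"\s*rev:\s*v(?P<rev>[\d.]+)"
    )
    return {m.group("rev") for m in pattern.finditer(text)}


def pyproject_ruff_version(text: str) -> str | None:
    """Return the version from a line such as `ruff = "0.8.4"`."""
    m = re.search(r'^ruff\s*=\s*"(?P<v>[\d.]+)"', text, flags=re.MULTILINE)
    return m.group("v") if m else None


def pins_match(precommit_text: str, pyproject_text: str) -> bool:
    """True when every pre-commit rev equals the pyproject pin."""
    revs = precommit_ruff_revs(precommit_text)
    return revs == {pyproject_ruff_version(pyproject_text)}


# Hypothetical excerpts that mirror the pins introduced by this commit.
PRECOMMIT = """\
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.4 # should match version in pyproject.toml
    hooks:
      - id: ruff
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.4 # should match version in pyproject.toml
    hooks:
      - id: ruff-format
"""
PYPROJECT = """\
[tool.poetry.group.lint.dependencies]
ruff = "0.8.4"
pydoclint = "0.3.8"
"""

print(pins_match(PRECOMMIT, PYPROJECT))
```

A check like this would fail if only one of the two hook entries were bumped, which is the drift the inline comment warns about.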

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -388,11 +388,17 @@ Nothing, initial release!
 
 </details>
 
-## [v2.5.0](https://github.com/eonu/sequentia/releases/tag/v2.5.0) - 2024-12-27
+## [v2.6.0](https://github.com/eonu/sequentia/releases/tag/v2.6.0) - 2024-12-30
+
+### Bug Fixes
+
+- enable `joblib.Parallel` memory mapping ([#262](https://github.com/eonu/sequentia/issues/262))
 
 ### Documentation
 
 - update copyright notice ([#255](https://github.com/eonu/sequentia/issues/255))
+- fix `KNNRegressor.window` docstring typo ([#261](https://github.com/eonu/sequentia/issues/261))
+- update `README.md` features ([#265](https://github.com/eonu/sequentia/issues/265))
 
 ### Features
 
@@ -402,6 +408,11 @@ Nothing, initial release!
 - add `model_selection` sub-package for hyper-parameters ([#257](https://github.com/eonu/sequentia/issues/257))
 - add model spec support to `HMMClassifier.__init__` ([#258](https://github.com/eonu/sequentia/issues/258))
 - add `HMMClassifier.fit` multiprocessing ([#259](https://github.com/eonu/sequentia/issues/259))
+- set `use_c=True` by default for `KNNClassifier`/`KNNRegressor` ([#263](https://github.com/eonu/sequentia/issues/263))
+
+### Styling
+
+- upgrade to `ruff` v0.8.4 and fix type hints ([#264](https://github.com/eonu/sequentia/issues/264))
 
 ## [v2.0.2](https://github.com/eonu/sequentia/releases/tag/v2.0.2) - 2024-04-13
 

diff --git a/README.md b/README.md
@@ -58,6 +58,8 @@ Some examples of how Sequentia can be used on sequence data include:
 
 - **Simplicity and interpretability**: Sequentia offers a limited set of machine learning algorithms, chosen specifically to be more interpretable and easier to configure than more complex alternatives such as recurrent neural networks and transformers, while maintaining a high level of effectiveness.
 - **Familiar and user-friendly**: To fit more seamlessly into the workflow of data science practitioners, Sequentia follows the ubiquitous Scikit-Learn API, providing a familiar model development process for many, as well as enabling wider access to the rapidly growing Scikit-Learn ecosystem.
+- **Speed**: Some algorithms offered by Sequentia naturally have restrictive runtime scaling, such as k-nearest neighbors. However, our implementation is 
+optimized to the point of being multiple orders of magnitude faster than similar packages — see the [Benchmarks](#benchmarks) section for more information.
 
 ## Build Status
 
@@ -82,7 +84,7 @@ effective inference algorithm.
 - [x] Sakoe–Chiba band global warping constraint
 - [x] Dependent and independent feature warping (DTWD/DTWI)
 - [x] Custom distance-weighted predictions
-- [x] Multi-processed predictions
+- [x] Multi-processed prediction
 
 #### [Hidden Markov Models](https://sequentia.readthedocs.io/en/latest/sections/models/hmm/index.html) (via [`hmmlearn`](https://github.com/hmmlearn/hmmlearn))
 
@@ -99,7 +101,7 @@ based on the provided training sequence data.
 - [x] Multivariate real-valued observations (modeled with Gaussian mixture emissions)
 - [x] Univariate categorical observations (modeled with discrete emissions)
 - [x] Linear, left-right and ergodic topologies
-- [x] Multi-processed predictions
+- [x] Multi-processed training and prediction
 
 ### Scikit-Learn compatibility
 
@@ -157,7 +159,7 @@ All of the above libraries support multiprocessing, and prediction was performed
 <img src="benchmarks/benchmark.svg" width="100%"/>
 
 > **Device information**:
-> - Product: ThinkPad T14s (Gen 6)
+> - Product: Lenovo ThinkPad T14s (Gen 6)
 > - Processor: AMD Ryzen™ AI 7 PRO 360 (8 cores, 16 threads, 2-5GHz)
 > - Memory: 64 GB LPDDR5X-7500MHz
 > - Solid State Drive: 1 TB SSD M.2 2280 PCIe Gen4 Performance TLC Opal 
@@ -175,7 +177,7 @@ pip install sequentia
 
 For optimal performance when using any of the k-NN based models, it is important that the correct `dtaidistance` C libraries are accessible.
 
-Please see the [`dtaidistance` installation guide](https://dtaidistance.readthedocs.io/en/latest/usage/installation.html) for troubleshooting if you run into C compilation issues, or if setting `use_c=True` on k-NN based models results in a warning.
+Please see the [`dtaidistance` installation guide](https://dtaidistance.readthedocs.io/en/latest/usage/installation.html) for troubleshooting if you run into C compilation issues, or if using k-NN based models with `use_c=True` results in a warning.
 
 You can use the following to check if the appropriate C libraries are available.
 
@@ -184,6 +186,8 @@ from dtaidistance import dtw
 dtw.try_import_c()
 ```
 
+If these libraries are unavailable, Sequentia will fall back to using a Python alternative.
+
 ### Development
 
 Please see the [contribution guidelines](/CONTRIBUTING.md) to see installation instructions for contributing to Sequentia.
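
With `use_c=True` now the default for the k-NN models, the README hunk above adds a note that Sequentia falls back to Python when the C libraries are missing. A minimal sketch of making that availability check explicit before building a model, assuming `dtw.try_import_c()` returns a truthy value on success (the README shows the call but not its return value, so the result is coerced to `bool` here). The helper name `c_backend_available` is hypothetical:

```python
import importlib.util


def c_backend_available() -> bool:
    """Report whether the dtaidistance C libraries can be imported."""
    # dtaidistance itself may not be installed; treat that as "no C backend".
    if importlib.util.find_spec("dtaidistance") is None:
        return False
    from dtaidistance import dtw

    # try_import_c() is the check shown in the README; coercing its result
    # to bool is a defensive assumption about the return type.
    return bool(dtw.try_import_c())


use_c = c_backend_available()
print(f"use_c={use_c}")
```

The resulting flag could then be passed as `use_c` to a k-NN model to avoid the warning mentioned in the README.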

diff --git a/benchmarks/plot.ipynb b/benchmarks/plot.ipynb
@@ -8,24 +8,13 @@
    "outputs": [],
    "source": [
     "import matplotlib.pyplot as plt\n",
-    "import numpy as np\n",
     "\n",
     "plt.style.use(\"ggplot\")"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c92bf960-ddb5-409f-bd3c-5bce0a03ccd0",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from sequentia import"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 79,
    "id": "6649bf2d-7430-401d-8113-f3c1e1cf4779",
    "metadata": {},
    "outputs": [
@@ -48,23 +37,36 @@
     "\n",
     "bars = ax.bar(labels, runtimes, width=0.5, color=\"C1\")\n",
     "ax.set(xlabel=\"Package\", ylabel=\"Runtime (s)\")\n",
-    "ax.set_title(\"Univariate DTW-kNN performance (1,500 FSDD train/test sequences, 16 workers)\", fontsize=11)\n",
+    "ax.set_title(\n",
+    "    (\n",
+    "        \"Univariate DTW-kNN performance \"\n",
+    "        \"(1,500 FSDD train/test sequences, 16 workers)\"\n",
+    "    ),\n",
+    "    fontsize=11,\n",
+    ")\n",
+    "\n",
     "\n",
     "def fmt(s: float) -> str:\n",
+    "    \"\"\"Formats the runtime.\"\"\"\n",
     "    if s < 60:\n",
     "        return f\"{round(s)}s\"\n",
     "    m, s = divmod(s, 60)\n",
     "    return f\"{round(m)}m {round(s)}s\"\n",
     "\n",
+    "\n",
     "for bar in bars:\n",
     "    plt.text(\n",
-    "        bar.get_x() + bar.get_width() / 2, bar.get_height(),\n",
-    "        fmt(bar.get_height()), ha='center', va='bottom', fontsize=9,\n",
+    "        bar.get_x() + bar.get_width() / 2,\n",
+    "        bar.get_height(),\n",
+    "        fmt(bar.get_height()),\n",
+    "        ha=\"center\",\n",
+    "        va=\"bottom\",\n",
+    "        fontsize=9,\n",
     "    )\n",
     "\n",
     "for lab in ax.get_xticklabels():\n",
-    "   if lab.get_text() == \"sequentia\":\n",
-    "      lab.set_fontweight('bold')\n",
+    "    if lab.get_text() == \"sequentia\":\n",
+    "        lab.set_fontweight(\"bold\")\n",
     "\n",
     "plt.tight_layout()\n",
     "plt.savefig(\"benchmark.svg\")\n",

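The notebook's `fmt` helper labels each bar with a rounded runtime. A short sketch of what it returns for a few inputs, using the function body exactly as it appears in the hunk above. The sample runtimes are made up for illustration and are not benchmark results, and `fmt_whole` is a hypothetical variant that is not in the notebook:

```python
def fmt(s: float) -> str:
    """Formats the runtime."""
    if s < 60:
        return f"{round(s)}s"
    m, s = divmod(s, 60)
    return f"{round(m)}m {round(s)}s"


def fmt_whole(s: float) -> str:
    """Hypothetical variant: round to whole seconds before splitting."""
    total = round(s)
    if total < 60:
        return f"{total}s"
    m, sec = divmod(total, 60)
    return f"{m}m {sec}s"


for seconds in (12.4, 59.6, 125.0):
    print(seconds, "->", fmt(seconds), "|", fmt_whole(seconds))
```

Because `fmt` rounds after comparing against 60, a value such as 59.6 is shown as `60s`. Rounding first, as in the variant, shows it as `1m 0s` instead. Either is fine for a plot label; the difference only matters near a minute boundary.
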
diff --git a/benchmarks/test_pyts.py b/benchmarks/test_pyts.py
@@ -34,9 +34,7 @@ def prepare(data: SequentialDataset, length: int) -> DataSplit:
     return X_pad[:, 0], data.y
 
 
-def multivariate(
-    *, train_data: DataSplit, test_data: DataSplit, n_jobs: int
-) -> None:
+def run(*, train_data: DataSplit, test_data: DataSplit, n_jobs: int) -> None:
     """Fit and predict the classifier."""
     # initialize model
     clf = KNeighborsClassifier(
@@ -70,7 +68,7 @@ def multivariate(
     )
 
     benchmark = timeit.timeit(
-        "func(train_data=train_data, test_data=test_data, n_jobs=args.n_jobs)",
+        "run(train_data=train_data, test_data=test_data, n_jobs=args.n_jobs)",
         globals=locals(),
         number=args.number,
     )

diff --git a/benchmarks/test_sequentia.py b/benchmarks/test_sequentia.py
@@ -21,7 +21,7 @@
 random_state: np.random.RandomState = np.random.RandomState(0)
 
 
-def multivariate(
+def run(
     *, train_data: SequentialDataset, test_data: SequentialDataset, n_jobs: int
 ) -> None:
     """Fit and predict the classifier."""
@@ -52,7 +52,7 @@ def multivariate(
     train_data, test_data = load_dataset(multivariate=False)
 
     benchmark = timeit.timeit(
-        "func(train_data=train_data, test_data=test_data, n_jobs=args.n_jobs)",
+        "run(train_data=train_data, test_data=test_data, n_jobs=args.n_jobs)",
         globals=locals(),
         number=args.number,
     )

diff --git a/benchmarks/test_sktime.py b/benchmarks/test_sktime.py
@@ -56,9 +56,7 @@ def prepare(data: SequentialDataset) -> DataSplit:
     return X_pd, data.y
 
 
-def multivariate(
-    *, train_data: DataSplit, test_data: DataSplit, n_jobs: int
-) -> None:
+def run(*, train_data: DataSplit, test_data: DataSplit, n_jobs: int) -> None:
     """Fit and predict the classifier."""
     # initialize model
     clf = KNeighborsTimeSeriesClassifier(
@@ -89,7 +87,7 @@ def multivariate(
     train_data, test_data = prepare(train_data), prepare(test_data)
 
     benchmark = timeit.timeit(
-        "func(train_data=train_data, test_data=test_data, n_jobs=args.n_jobs)",
+        "run(train_data=train_data, test_data=test_data, n_jobs=args.n_jobs)",
         globals=locals(),
         number=args.number,
     )
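
All three benchmark scripts rename the function to `run` and update the `timeit` statement string to call `run(...)`. A minimal sketch of why the two names have to agree: the statement is a string, so every name in it is resolved in the mapping passed as `globals`. The workload below is a stand-in, not the classifier code, and whether `func` was bound elsewhere in the original scripts is not visible in these hunks:

```python
import timeit


def run(*, n: int) -> int:
    """Stand-in workload; the real scripts fit and predict a classifier."""
    return sum(i * i for i in range(n))


n = 1_000

# Names inside the statement string are looked up in the `globals` mapping.
elapsed = timeit.timeit("run(n=n)", globals=locals(), number=10)
print(f"10 calls took {elapsed:.6f}s")

# A name that is absent from the mapping fails when the statement executes.
missing_name_raised = False
try:
    timeit.timeit("func(n=n)", globals={"n": n}, number=1)
except NameError as exc:
    missing_name_raised = True
    print(f"NameError: {exc}")
```

Keeping the function name and the statement string identical means the benchmark does not depend on any alias being defined elsewhere in the script.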

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -23,7 +23,7 @@
 project = "sequentia"
 copyright = "2019, Sequentia Developers"  # noqa: A001
 author = "Edwin Onuonga (eonu)"
-release = "2.5.0"
+release = "2.6.0"
 
 # -- General configuration ---------------------------------------------------
 

diff --git a/make/lint.py b/make/lint.py
@@ -33,7 +33,7 @@ def check(c: Config) -> None:
 def format_(c: Config) -> None:
     """Format Python files."""
     commands: list[str] = [
-        "poetry run ruff --fix .",
+        "poetry run ruff check --fix .",
         "poetry run ruff format .",
     ]
     for command in commands:

diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "sequentia"
-version = "2.5.0"
+version = "2.6.0"
 license = "MIT"
 authors = ["Edwin Onuonga <ed@eonu.net>"]
 maintainers = ["Edwin Onuonga <ed@eonu.net>"]
@@ -86,7 +86,7 @@ tox = "4.11.3"
 pre-commit = ">=3"
 
 [tool.poetry.group.lint.dependencies]
-ruff = "0.1.3"
+ruff = "0.8.4"
 pydoclint = "0.3.8"
 
 [tool.poetry.group.docs.dependencies]
@@ -100,8 +100,8 @@ pytest = { version = "^7.4.0" }
 pytest-cov = { version = "^4.1.0" }
 
 [tool.ruff]
-required-version = "0.1.3"
-select = [
+required-version = "0.8.4"
+lint.select = [
     "F",    # pyflakes: https://pypi.org/project/pyflakes/
     "E",    # pycodestyle (error): https://pypi.org/project/pycodestyle/
     "W",    # pycodestyle (warning): https://pypi.org/project/pycodestyle/
@@ -144,7 +144,7 @@ select = [
     "PERF", # perflint: https://pypi.org/project/perflint/
     "RUF",  # ruff
 ]
-ignore = [
+lint.ignore = [
     "ANN401",  # https://beta.ruff.rs/docs/rules/any-type/
     "B905",    # https://beta.ruff.rs/docs/rules/zip-without-explicit-strict/
     "TD003",   # https://beta.ruff.rs/docs/rules/missing-todo-link/
@@ -162,16 +162,15 @@ ignore = [
     "C408",    # Unnecessary `dict` call (rewrite as a literal)
     "D401",    # First line of docstring should be in imperative mood
 ]
-ignore-init-module-imports = true # allow unused imports in __init__.py
 line-length = 79
 
-[tool.ruff.pydocstyle]
+[tool.ruff.lint.pydocstyle]
 convention = "numpy"
 
-[tool.ruff.flake8-annotations]
+[tool.ruff.lint.flake8-annotations]
 allow-star-arg-any = true
 
-[tool.ruff.extend-per-file-ignores]
+[tool.ruff.lint.extend-per-file-ignores]
 "__init__.py" = ["PLC0414", "F403", "F401", "F405"]
 "sequentia/datasets/*.py" = ["B006"]
 "sequentia/enums.py" = ["E501"]
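
The `pyproject.toml` hunks move the linter settings under a `lint` namespace (`select` becomes `lint.select`, `[tool.ruff.pydocstyle]` becomes `[tool.ruff.lint.pydocstyle]`, and so on) and delete `ignore-init-module-imports`. A minimal sketch of the same rename applied to an already parsed `[tool.ruff]` table held as a plain dict. It covers only the keys touched in this diff, and `migrate` is a hypothetical helper, not a Ruff or Sequentia API:

```python
from __future__ import annotations

# Keys that this diff moves from [tool.ruff] to [tool.ruff.lint].
MOVED_TO_LINT = (
    "select",
    "ignore",
    "pydocstyle",
    "flake8-annotations",
    "extend-per-file-ignores",
)
# Key that this diff deletes outright.
DROPPED = ("ignore-init-module-imports",)


def migrate(ruff_table: dict) -> dict:
    """Return a copy of a parsed [tool.ruff] table in the new layout."""
    skipped = MOVED_TO_LINT + DROPPED
    migrated = {k: v for k, v in ruff_table.items() if k not in skipped}
    lint = dict(migrated.get("lint", {}))
    for key in MOVED_TO_LINT:
        if key in ruff_table:
            lint[key] = ruff_table[key]
    if lint:
        migrated["lint"] = lint
    # Note: `required-version` is left as-is and still needs a manual bump.
    return migrated


old = {
    "required-version": "0.1.3",
    "select": ["F", "E"],
    "ignore": ["ANN401"],
    "ignore-init-module-imports": True,
    "line-length": 79,
    "pydocstyle": {"convention": "numpy"},
}
new = migrate(old)
print(new)
```

Settings that are not linter-specific, such as `line-length`, stay at the top level, which matches the hunk above.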