Use Pydantic models for component configurations #596

Merged
merged 50 commits into from
Jan 11, 2025
0d623e3
working baseline for BaseModel pipleine config
mdekstrand Jan 8, 2025
3536683
fix remaining pipeline tests
mdekstrand Jan 8, 2025
aed90bf
Clean up Component typing
mdekstrand Jan 8, 2025
eea4622
simplify declaration of component
mdekstrand Jan 8, 2025
0fd2236
improve docs on simplified configuration
mdekstrand Jan 8, 2025
c42268c
remove configurable doc
mdekstrand Jan 8, 2025
c134dbd
auto-generate lenskit type docs
mdekstrand Jan 8, 2025
328c76c
correct example code syntax in component
mdekstrand Jan 8, 2025
bd3b0a2
more pipeline writing-
mdekstrand Jan 8, 2025
67a0be4
working example inclusion and tests
mdekstrand Jan 8, 2025
bb149b1
use Sphinx to run documentation examples
mdekstrand Jan 8, 2025
0cde2be
add note on motivation of pipeilne
mdekstrand Jan 9, 2025
d051761
add warning for component instantiation
mdekstrand Jan 9, 2025
e7cc086
write bias config class
mdekstrand Jan 9, 2025
b7dd0f0
fix component __init_superclass__
mdekstrand Jan 9, 2025
f95c2e5
improve component error checking
mdekstrand Jan 9, 2025
36a29c9
upgrade component tests for new component config
mdekstrand Jan 9, 2025
33a7705
upgrade bias to new component config
mdekstrand Jan 9, 2025
100a36b
improve warning setup
mdekstrand Jan 9, 2025
83f9d74
add explicit no-config annotations to basic components
mdekstrand Jan 9, 2025
7f2add4
type-check configuration objecets
mdekstrand Jan 9, 2025
3a1970c
upgrade popularity and top-N components
mdekstrand Jan 9, 2025
ddaa59d
migrate random to new config
mdekstrand Jan 9, 2025
5b1e30d
push serializers into derivable seed type
mdekstrand Jan 9, 2025
daf1cdc
string motivation
mdekstrand Jan 9, 2025
6940ca3
upgrade user/item knn
mdekstrand Jan 9, 2025
c042ca4
make validation stricter
mdekstrand Jan 9, 2025
e8fe2ac
move random seeds
mdekstrand Jan 10, 2025
c14e5d3
add initial val support to lenskit get_logger
mdekstrand Jan 10, 2025
c40d26b
export damping from lenskit.basic
mdekstrand Jan 10, 2025
0202081
add UIPair type
mdekstrand Jan 10, 2025
13fe46d
most of ALS upgrade to config
mdekstrand Jan 10, 2025
257d72f
move lenskit.util.random to lenskit.random
mdekstrand Jan 10, 2025
6b95136
simplify configurable and derived seeds
mdekstrand Jan 10, 2025
ef689d4
Revert "simplify configurable and derived seeds"
mdekstrand Jan 10, 2025
3dd3979
Reapply "simplify configurable and derived seeds"
mdekstrand Jan 10, 2025
4562ba5
finish fixing ALS configuration support
mdekstrand Jan 10, 2025
9aa4429
fix test use of components
mdekstrand Jan 10, 2025
77d8137
add config to migration
mdekstrand Jan 10, 2025
14f4754
Merge branch 'main' into feature/pydantic-config
mdekstrand Jan 10, 2025
a332303
add config test
mdekstrand Jan 11, 2025
9384c36
upgrade FunkSVD
mdekstrand Jan 11, 2025
462c463
upgrade SVD to config
mdekstrand Jan 11, 2025
5567998
handle component base classes
mdekstrand Jan 11, 2025
20ce979
update implicit to configurable component
mdekstrand Jan 11, 2025
5874660
rerun getting-started guide
mdekstrand Jan 11, 2025
68e6516
use user/item reg things
mdekstrand Jan 11, 2025
d10686b
remove UIPair
mdekstrand Jan 11, 2025
1945978
remove unused normalize function
mdekstrand Jan 11, 2025
c6186d9
component nocover
mdekstrand Jan 11, 2025
7 changes: 5 additions & 2 deletions .github/workflows/test.yml
@@ -611,9 +611,12 @@ jobs:
- name: Download ML data
run: |
python -m lenskit.data.fetch ml-100k ml-1m ml-10m ml-20m
- name: "📕 Validate documentation examples"
- name: "📕 Validate code examples"
run: |
pytest --cov=lenskit/lenskit --cov=lenskit-funksvd/lenskit --cov=lenskit-implicit/lenskit --cov=lenskit-hpf/lenskit --nbval-lax --doctest-glob='*.rst' --ignore='docs/_ext' --log-file test-docs.log docs */lenskit
sphinx-build -b doctest docs build/doc
- name: "📕 Validate example notebooks"
run: |
pytest --cov=lenskit/lenskit --cov=lenskit-funksvd/lenskit --cov=lenskit-implicit/lenskit --cov=lenskit-hpf/lenskit --nbval-lax --log-file test-notebooks.log docs
- name: "📐 Coverage results"
if: '${{ !cancelled() }}'
run: |
3 changes: 3 additions & 0 deletions .vscode/ltex.dictionary.en-US.txt
@@ -22,3 +22,6 @@ RecSys
PyArrow
Numba
DuckDB
ItemList
Pydantic
dataclass
2 changes: 1 addition & 1 deletion conftest.py
@@ -16,10 +16,10 @@
from pytest import fixture, skip

from lenskit.parallel import ensure_parallel_init
from lenskit.random import set_global_rng

# bring common fixtures into scope
from lenskit.testing import ml_100k, ml_ds, ml_ds_unchecked, ml_ratings # noqa: F401
from lenskit.util.random import set_global_rng

logging.getLogger("numba").setLevel(logging.INFO)

2 changes: 1 addition & 1 deletion docs/api/data-types.rst
@@ -17,4 +17,4 @@ Entity Identifiers
Containers
~~~~~~~~~~

.. autoclass:: UITuple
.. autoclass:: UIPair
2 changes: 1 addition & 1 deletion docs/api/index.rst
@@ -12,7 +12,6 @@ Core Abstractions
lenskit.pipeline
lenskit.diagnostics
lenskit.operations
lenskit.types

.. toctree::
:caption: Core
@@ -81,3 +80,4 @@ and may be useful in building new models and components for LensKit.
lenskit.parallel
lenskit.testing
lenskit.util
lenskit.random
1 change: 0 additions & 1 deletion docs/api/pipeline.rst
@@ -31,7 +31,6 @@ LensKit components.

~lenskit.pipeline.Component
~lenskit.pipeline.Trainable
~lenskit.pipeline.Configurable

Standard Pipelines
------------------
13 changes: 10 additions & 3 deletions docs/conf.py
@@ -4,8 +4,10 @@
# Licensed under the MIT license, see LICENSE.md for details.
# SPDX-License-Identifier: MIT

import doctest
import sys
from importlib.metadata import version
from os import fspath
from pathlib import Path

from packaging.version import Version
@@ -25,6 +27,7 @@
"sphinx.ext.napoleon",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.doctest",
"sphinx.ext.intersphinx",
"sphinx.ext.mathjax",
"sphinx.ext.extlinks",
@@ -102,9 +105,9 @@
autodoc_typehints = "description"
autodoc_type_aliases = {
"ArrayLike": "numpy.typing.ArrayLike",
"SeedLike": "lenskit.types.SeedLike",
"RNGLike": "lenskit.types.RNGLike",
"RNGInput": "lenskit.types.RNGInput",
"SeedLike": "lenskit.random.SeedLike",
"RNGLike": "lenskit.random.RNGLike",
"RNGInput": "lenskit.random.RNGInput",
"IDSequence": "lenskit.data.types.IDSequence",
}
# autosummary_generate_overwrite = False
@@ -133,6 +136,10 @@

bibtex_bibfiles = ["lenskit.bib"]
nb_execution_mode = "off"
doctest_path = [fspath((Path(__file__).parent / "guide" / "examples").resolve())]
doctest_default_flags = (
    doctest.ELLIPSIS | doctest.IGNORE_EXCEPTION_DETAIL | doctest.NORMALIZE_WHITESPACE
)
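
Reviewer sketch (not part of the diff): the three flags set in `doctest_default_flags` are standard-library `doctest` options, and their effect on a summary table like the one in `docs/guide/batch.rst` can be checked without Sphinx. The snippet text and its numbers below are made up for illustration, and `run_snippet` is a hypothetical helper, not something in the LensKit docs build.

```python
import doctest

# Made-up doctest snippet: the printed line has wider column spacing and more
# digits than the expected line, which uses "..." placeholders.
SNIPPET = '''
>>> print("RBP   0.0912  0.0  0.1734")
RBP 0.09... 0.0... 0.1...
'''


def run_snippet(flags):
    """Run SNIPPET under the given option flags; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(SNIPPET, {}, "summary", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    # Discard the failure report; only the counts are of interest here.
    return runner.run(test, out=lambda text: None)


# Without flags the literal "..." and the spacing differences cause a failure.
strict = run_snippet(0)

# With the flags from conf.py the same snippet passes.
lenient = run_snippet(
    doctest.ELLIPSIS | doctest.IGNORE_EXCEPTION_DETAIL | doctest.NORMALIZE_WHITESPACE
)
```

`ELLIPSIS` lets `...` stand for any substring, and `NORMALIZE_WHITESPACE` treats any run of whitespace as equivalent, so column realignment in a printed data frame does not break the example.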

mermaid_d3_zoom = True

28 changes: 14 additions & 14 deletions docs/guide/GettingStarted.ipynb

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions docs/guide/batch.rst
@@ -39,7 +39,7 @@ For an example, let's start with importing things to run a quick batch:
Load and split some data:

>>> data = load_movielens('data/ml-100k.zip')
>>> split = sample_users(data, 150, SampleN(5))
>>> split = sample_users(data, 150, SampleN(5, rng=1024), rng=42)

Configure and train the model:

@@ -62,9 +62,9 @@ And measure their results:
>>> measure.add_metric(RBP())
>>> scores = measure.compute(recs, split.test)
>>> scores.list_summary() # doctest: +ELLIPSIS
mean median std
mean median std
metric
RBP 0.07... 0.0... 0.1...
RBP 0.09... 0.0... 0.1...


The :py:func:`predict` function works similarly, but for rating predictions;
6 changes: 3 additions & 3 deletions docs/guide/conventions.rst
@@ -42,11 +42,11 @@ splitting support <./splitting>`_.

Now that `SPEC 7`_ has standardized RNG seeding across the scientific Python
ecosystem, we use that with some lightweight helpers in the
:mod:`lenskit.util.random` module instead of using SeedBank.
:mod:`lenskit.random` module instead of using SeedBank.

LensKit extends SPEC 7 with a global RNG that components can use as a fallback,
to make it easier to configure system-wide generation for things like tests.
This is configured with :func:`~lenskit.util.random.set_global_rng`.
This is configured with :func:`~lenskit.random.set_global_rng`.

When implementing a component that uses randomness in its training, we recommend
deferring conversion of the provided RNG into an actual generator until
@@ -56,7 +56,7 @@ When using the RNG to create initial state for e.g. training a model with
PyTorch, it can be useful to create that state in NumPy and then convert to a
tensor, so that components are consistent in their random number generation
behavior instead of having variation between NumPy and other backends.
Components can use the :func:`~lenskit.util.random_generator` function to
Components can use the :func:`~lenskit.random_generator` function to
convert seed material or a generator into a NumPy generator, falling back to the
global RNG if one is specified.
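
Reviewer sketch (not part of the diff): the deferred-conversion convention described above can be illustrated with NumPy alone. `RandomInit` is a hypothetical component, not a LensKit class, and the sketch calls `numpy.random.default_rng` directly rather than LensKit's own helper, so it has no global-RNG fallback.

```python
import numpy as np


class RandomInit:
    """Hypothetical component that keeps seed material until training time."""

    def __init__(self, n_features, rng=None):
        self.n_features = n_features
        # Store whatever the caller provided (int seed, Generator, or None);
        # do not build a generator yet.
        self.rng = rng

    def train(self, n_items):
        # Convert to an actual generator only when training starts.
        gen = np.random.default_rng(self.rng)
        # Create the initial state in NumPy; it could be converted to a
        # tensor for another backend afterwards.
        self.weights_ = gen.standard_normal((n_items, self.n_features))
        return self


a = RandomInit(4, rng=42).train(10)
b = RandomInit(4, rng=42).train(10)
```

Because the seed is stored rather than consumed in the constructor, two components configured with the same seed start from the same initial state.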

43 changes: 43 additions & 0 deletions docs/guide/examples/blendcomp.py
@@ -0,0 +1,43 @@
from pydantic import BaseModel

from lenskit.data import ItemList
from lenskit.pipeline import Component


class LinearBlendConfig(BaseModel):
    "Configuration for :class:`LinearBlendScorer`."

    # define the parameter with a type, default value, and docstring.
    mix_weight: float = 0.5
    r"""
    Linear blending mixture weight :math:`\alpha`.
    """


class LinearBlendScorer(Component):
    r"""
    Score items with a linear blend of two other scores.

    Given a mixture weight :math:`\alpha` and two scores
    :math:`s_i^{\mathrm{left}}` and :math:`s_i^{\mathrm{right}}`, this
    computes :math:`s_i = \alpha s_i^{\mathrm{left}} + (1 - \alpha)
    s_i^{\mathrm{right}}`. Missing values propagate, so only items
    scored in both inputs have scores in the output.
    """

    # define the configuration attribute, with a docstring to make sure
    # it shows up in component docs.
    config: LinearBlendConfig
    "Configuration parameters for the linear blend."

    # the __call__ method defines the component's operation
    def __call__(self, left: ItemList, right: ItemList) -> ItemList:
        """
        Blend the scores of two item lists.
        """
        ls = left.scores("pandas", index="ids")
        rs = right.scores("pandas", index="ids")
        ls, rs = ls.align(rs)
        alpha = self.config.mix_weight
        combined = ls * alpha + rs * (1 - alpha)
        return ItemList(item_ids=combined.index, scores=combined.values)
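
Reviewer sketch (not part of the diff): the claim that missing values propagate follows from how `Series.align` and arithmetic on `NaN` behave in pandas. The example below reproduces only the arithmetic from `__call__` on plain Series; the item IDs and scores are invented, and no LensKit classes are involved.

```python
import pandas as pd

# Invented scores: item "b" is the only one scored by both inputs.
ls = pd.Series([1.0, 2.0], index=["a", "b"])
rs = pd.Series([3.0, 5.0], index=["b", "c"])

# align() defaults to an outer join, filling unmatched positions with NaN.
ls, rs = ls.align(rs)

alpha = 0.5
combined = ls * alpha + rs * (1 - alpha)

# Items missing from either side come out as NaN in the blended scores.
scored = combined.dropna().to_dict()
```

Here `combined` covers all three items, but only `"b"` has a usable score (0.5 * 2.0 + 0.5 * 3.0 = 2.5).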
15 changes: 14 additions & 1 deletion docs/guide/migrating.rst
@@ -71,6 +71,11 @@ New code should use :py:func:`lenskit.data.from_interactions_df` to convert a Pa
data frame into a :py:func:`~lenskit.data.Dataset`, or one of the standard loaders
such as :py:func:`lenskit.data.load_movielens`.

While most LensKit data frame code still recognizes the legacy ``user`` and
``item`` columns from LensKit 0.14 and earlier, data frames of LensKit data
should use the column names ``user_id`` and ``item_id`` instead, to
unambiguously distinguish them from user and item numbers.

Additional dataset construction support and possible implementations (e.g.
database-backed datasets) are coming, but this is the migration path for the
typical code patterns used in LensKit 0.14 and earlier.
@@ -180,10 +185,18 @@ them for very different ways of turning scoring models into full recommenders.
.. note::

Since 2025, we no longer use the term “algorithm” in LensKit, as it is
ambiguous and promotes confusion about very different things. Instead we
ambiguous and promotes confusion about very different things. Instead, we
have “pipelines” consisting of “components”, some of which may be “models”
(for scoring, ranking, etc.).

Configuration Components
........................

Individual components now use Pydantic_ models to represent their configuration
(e.g. hyperparameters). This is to reduce redundancy, improve documentation,
enable consistent serialization, and validate parameter values in a consistent
and automated fashion. See :ref:`component-config` for details.
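
Reviewer sketch (not part of the diff): the validation and serialization benefits listed above can be seen with a bare Pydantic model, independent of LensKit. `BlendConfig` is a hypothetical stand-in that mirrors `LinearBlendConfig` from the example file, and the calls assume Pydantic 2 (`model_validate`, `model_dump`, `model_dump_json`).

```python
from pydantic import BaseModel, ValidationError


class BlendConfig(BaseModel):
    # Hypothetical stand-in for LinearBlendConfig: one typed field with a default.
    mix_weight: float = 0.5


# Validation: build a configuration from plain data, such as parsed JSON or TOML.
cfg = BlendConfig.model_validate({"mix_weight": 0.25})

# Serialization: round-trip through a dict and through JSON.
dumped = cfg.model_dump()
restored = BlendConfig.model_validate_json(cfg.model_dump_json())

# Invalid values are rejected instead of being stored silently.
try:
    BlendConfig.model_validate({"mix_weight": "heavy"})
    rejected = False
except ValidationError:
    rejected = True
```

The field declaration is the single place where the parameter's name, type, and default are stated, which is the redundancy reduction the paragraph refers to.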

Obtaining Recommendations
-------------------------
