Merge pull request #644 from mdekstrand/tweak/convention-docs
Improve pipeline modification and documentation
mdekstrand authored Feb 22, 2025
2 parents 6294f48 + edec6b7 commit 7cb56f1
Showing 9 changed files with 318 additions and 110 deletions.
28 changes: 28 additions & 0 deletions docs/guide/conventions.rst
Expand Up @@ -7,6 +7,8 @@ The components shipped with LensKit follow certain conventions to make their
configuration and operation consistent and predictable. We encourage you to
follow these conventions in your own code as well.

.. _list-length:

List Length
~~~~~~~~~~~

Expand All @@ -17,6 +19,32 @@ allows list length to be baked into a pipeline configuration, and also allows
that length to be specified or overridden at runtime. If both lengths are
specified, the runtime length takes precedence.

See :class:`lenskit.basic.TopNRanker` or :class:`lenskit.basic.SoftmaxRanker`
for examples.
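
The precedence rule can be sketched in a few lines. This is an illustrative stand-in using only the standard library, not the implementation of either ranker named above; ``TopNConfig`` and ``SimpleTopN`` are hypothetical names invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TopNConfig:
    """Hypothetical configuration with a baked-in list length."""

    n: int | None = None  # None means "no configured limit"


class SimpleTopN:
    """Illustrative stand-in for a ranker (not LensKit's TopNRanker)."""

    def __init__(self, config: TopNConfig | None = None):
        self.config = config or TopNConfig()

    def __call__(self, scored: dict[str, float], n: int | None = None) -> list[str]:
        # The runtime length takes precedence when both lengths are given.
        limit = n if n is not None else self.config.n
        ranked = sorted(scored, key=scored.get, reverse=True)
        return ranked if limit is None else ranked[:limit]


ranker = SimpleTopN(TopNConfig(n=2))
scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}
configured = ranker(scores)       # falls back to the configured length
overridden = ranker(scores, n=3)  # the runtime length overrides it
```

Checking ``n is not None`` rather than truthiness keeps an explicit runtime length of ``0`` from silently falling back to the configured value.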


.. _config-conventions:

Configuration Conventions
-------------------------

We strive for consistency in configuration across LensKit components. To that end,
we use a few common names for configuration options and hyperparameters, and we
encourage you to use them in your own components unless you have a compelling reason
not to.

``embedding_size``
The dimensionality of embeddings or a latent feature space (e.g., the dimension
in matrix factorization or dimensionality reduction).
``epochs``
The number of training epochs for an iterative method (this option name is
required by :ref:`iterative-training`).
``learning_rate``
The learning rate for an iterative method.
``reg``
The regularization weight for regularized models.
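
As a minimal sketch, a configuration class that follows these naming conventions might look like the following. ``ToyMFConfig`` and its default values are hypothetical and chosen only for illustration; they are not taken from any LensKit component.

```python
from dataclasses import dataclass


@dataclass
class ToyMFConfig:
    """Hypothetical configuration for an illustrative matrix factorization scorer."""

    embedding_size: int = 50     # dimensionality of the latent feature space
    epochs: int = 10             # number of training epochs
    learning_rate: float = 0.01  # learning rate for the iterative optimizer
    reg: float = 0.1             # regularization weight


default_cfg = ToyMFConfig()
small_cfg = ToyMFConfig(embedding_size=16, epochs=5)
```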


.. _rng:

Random Seeds
160 changes: 160 additions & 0 deletions docs/guide/implementing.rst
@@ -0,0 +1,160 @@
.. _component-impl:

Implementing Components
=======================

LensKit is particularly designed to excel in research and educational
applications, and for that you will often need to write your own components
implementing new scoring models, rankers, or other components. The
:ref:`pipeline design <pipeline>` and :ref:`standard pipelines
<standard-pipelines>` are intended to make this as easy as possible and allow
you to focus just on your logic without needing to implement a lot of
boilerplate like looking up user histories or ranking by score: you can
implement your training and scoring logic, and let LensKit do the rest.

Basics
~~~~~~

Implementing a component consists of a few steps:

1. Defining the configuration class.
2. Defining the component class, with its ``config`` attribute declaration.
3. Defining a ``__call__`` method for the component class that performs the
component's actual computation.
4. If the component supports training, implementing the
:class:`~lenskit.training.Trainable` protocol by defining a
:meth:`~lenskit.training.Trainable.train` method, or by implementing
:ref:`iterative-training`.

A simple example component that computes a linear weighted blend of the scores
from two other components could look like this:

.. literalinclude:: examples/blendcomp.py

This component can be instantiated with its defaults:

.. testsetup::

from blendcomp import LinearBlendScorer, LinearBlendConfig


.. doctest::

>>> LinearBlendScorer()
<LinearBlendScorer {
"mix_weight": 0.5
}>

You can instantiate it with its configuration class:

.. doctest::

>>> LinearBlendScorer(LinearBlendConfig(mix_weight=0.2))
<LinearBlendScorer {
"mix_weight": 0.2
}>

Finally, you can directly pass configuration parameters to the component constructor:

.. doctest::

>>> LinearBlendScorer(mix_weight=0.7)
<LinearBlendScorer {
"mix_weight": 0.7
}>


Component Configuration
~~~~~~~~~~~~~~~~~~~~~~~

As noted in the :ref:`pipeline documentation <component-config>`, components are
configured with *configuration objects*. These are JSON-serializable objects
defined as Python dataclasses or Pydantic models, and define the different
settings or hyperparameters that control the model's behavior.

The choice of parameters is up to the component author, and each component will
have different configuration needs. Some needs are common across many
components, though; see :ref:`config-conventions` for common LensKit
configuration conventions.
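
To illustrate what JSON-serializable means in practice, the sketch below round-trips a plain dataclass through JSON using only the standard library. ``BlendConfig`` is a hypothetical class whose single field mirrors the ``mix_weight`` setting shown earlier; this is not a description of how LensKit itself serializes configurations.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BlendConfig:
    """Hypothetical configuration with a single hyperparameter."""

    mix_weight: float = 0.5


cfg = BlendConfig(mix_weight=0.2)

# Serialize the configuration to a JSON string.
text = json.dumps(asdict(cfg))

# Rebuild an equivalent configuration object from that string.
restored = BlendConfig(**json.loads(text))
```

Keeping configuration fields to simple, JSON-friendly types (numbers, strings, booleans, lists, and nested configuration objects) is what makes this kind of round trip possible.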

Component Operation
~~~~~~~~~~~~~~~~~~~

The heart of the component interface is the ``__call__`` method (components are
just callable objects). This method takes the component inputs as parameters,
and returns the component's result.

Most components return an :class:`~lenskit.data.ItemList`. Scoring components usually
have the following signature:

.. code:: python

    def __call__(self, query: QueryInput, items: ItemList) -> ItemList:
        ...

The ``query`` input receives the user ID, history, context, or other query
input; the ``items`` input receives the list of items to be scored (e.g., the
candidate items for recommendation). The scorer then returns a list of scored
items.

Most components begin by converting the query to a
:class:`~lenskit.data.RecQuery`::

def __call__(self, query: QueryInput, items: ItemList) -> ItemList:
query = RecQuery.create(query)
...

It is conventional for scorers to return a copy of the input item list with the scores
attached, filling in ``NaN`` for items that cannot be scored. After assembling a NumPy
array of scores, you can do this with::

return ItemList(items, scores=scores)

Scalars can also be supplied, so if the scorer cannot score any of the items, it
can simply return a list in which every score is ``NaN``::

return ItemList(items, scores=np.nan)

Components do need to be able to handle items in ``items`` that were not seen
at training time. If the component has saved the training item vocabulary, the
easiest way to do this is to use :meth:`~lenskit.data.ItemList.numbers` with
``missing="negative"``::

i_nums = items.numbers(vocabulary=self.items, missing="negative")
scorable_mask = i_nums >= 0
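
The mask can then be used to fill in scores for the known items while leaving ``NaN`` for the rest. The following is a self-contained sketch using plain NumPy; the dictionary lookup is a stand-in for the ``numbers`` call above, and ``train_items``, ``item_scores``, and ``candidates`` are hypothetical names for illustration.

```python
import numpy as np

# Hypothetical learned state: one score per item seen in training.
train_items = ["a", "b", "c"]
item_scores = np.array([0.5, 1.5, 2.5])
vocab = {item: num for num, item in enumerate(train_items)}

# Candidate items, including one ("z") that was not seen in training.
candidates = ["c", "z", "a"]

# Stand-in for the item-number lookup: unknown items map to -1.
i_nums = np.array([vocab.get(item, -1) for item in candidates])
scorable_mask = i_nums >= 0

# Start from all-NaN scores and fill in only the scorable positions.
scores = np.full(len(candidates), np.nan)
scores[scorable_mask] = item_scores[i_nums[scorable_mask]]
```

Starting from an all-``NaN`` array means unscorable items need no special handling after the lookup.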

Component Training
~~~~~~~~~~~~~~~~~~

Components that need to train models on training data must implement the
:class:`~lenskit.training.Trainable` protocol, either directly or through a
helper implementation like :class:`~lenskit.training.IterativeTraining`. The
core of the ``Trainable`` protocol is the
:meth:`~lenskit.training.Trainable.train` method, which takes a
:class:`~lenskit.data.Dataset` and :class:`~lenskit.training.TrainingOptions`
and trains the model.

The details of training will vary significantly from model to model. Typically,
though, it proceeds through these steps:

1. Extract, prepare, and preprocess training data as needed for the model.
2. Compute the model's parameters, either directly (e.g.
:class:`~lenskit.basic.BiasScorer`) or through an optimization method (e.g.
:class:`~lenskit.basic.ImplicitMFScorer`).
3. Finalize the model parameters and clean up any temporary data.

Learned model parameters are then stored as attributes on the component class,
either directly or in a container object (such as a PyTorch
:class:`~torch.nn.Module`).

.. note::

If the model is already trained and the
:attr:`~lenskit.training.TrainingOptions.retrain` option is ``False``, then the
``train`` method should return without doing any training.
:class:`~lenskit.training.IterativeTraining` handles this automatically.
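
The retrain check might be sketched as follows. This is a minimal illustration with hypothetical stand-ins (``ToyTrainingOptions`` and ``MeanScorer``), not the real LensKit classes, and it trains on a plain list of ratings rather than a dataset object.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyTrainingOptions:
    """Hypothetical stand-in for training options with a ``retrain`` flag."""

    retrain: bool = True


class MeanScorer:
    """Illustrative component that learns a single global mean."""

    mean_: float | None = None  # learned parameter, stored as an attribute

    def train(self, ratings: list[float], options: ToyTrainingOptions) -> None:
        # Skip training if a model is already present and retraining is off.
        if self.mean_ is not None and not options.retrain:
            return
        self.mean_ = sum(ratings) / len(ratings)


scorer = MeanScorer()
scorer.train([3.0, 5.0], ToyTrainingOptions())
after_first = scorer.mean_

scorer.train([1.0, 1.0], ToyTrainingOptions(retrain=False))  # skipped
after_skip = scorer.mean_

scorer.train([1.0, 1.0], ToyTrainingOptions(retrain=True))  # retrains
after_retrain = scorer.mean_
```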

Further Reading
~~~~~~~~~~~~~~~

See :ref:`conventions` for more conventions for component design and configuration.
1 change: 1 addition & 0 deletions docs/guide/index.rst
Expand Up @@ -37,6 +37,7 @@ guide to how to use LensKit for research, education, and other purposes.
scorers
rankers
other-components
implementing

.. toctree::
:caption: Experiments
93 changes: 29 additions & 64 deletions docs/guide/pipeline.rst
Expand Up @@ -37,7 +37,6 @@ like user history and candidate set lookup.
as well as by Haystack_.

.. _Haystack: https://docs.haystack.deepset.ai/docs/pipelines
.. _POPROX: https://ccri-poprox.github.io/poprox-researcher-manual/reference/recommender/poprox_recommender.pipeline.html

.. _pipeline-construct:

Expand Down Expand Up @@ -138,7 +137,7 @@ These input connections are specified via keyword arguments to the
should be wired.


You can also use :meth:`PipelineBuilder.default_conection` to specify default
You can also use :meth:`PipelineBuilder.default_connection` to specify default
connections. For example, you can specify a default for inputs named ``user``::

pipe.default_connection('user', user_history)
Expand Down Expand Up @@ -192,7 +191,7 @@ The :meth:`~Pipeline.run` method takes two types of inputs:
altered scores).

* Keyword arguments specifying the values for the pipeline's inputs, as defined by
calls to :meth:`Pipeline.create_input`.
calls to :meth:`PipelineBuilder.create_input`.

Pipeline execution logically proceeds in the following steps:

Expand Down Expand Up @@ -222,7 +221,7 @@ itself, e.g.:
* ``item-embedder``

Component nodes can also have *aliases*, allowing them to be accessed by more
than one name. Use :meth:`Pipeline.alias` to define these aliases.
than one name. Use :meth:`PipelineBuilder.alias` to define these aliases.

Various LensKit facilities recognize several standard component names used by
the standard pipeline builders, and we recommend you use them in your own
Expand Down Expand Up @@ -255,7 +254,7 @@ Pipelines are defined by the following:
* The components and inputs (nodes)
* The component input connections (edges)
* The component configurations (see :class:`Component`)
* The components' learned parameters (see :class:`Trainable`)
* The components' learned parameters (see :class:`~lenskit.training.Trainable`)

LensKit supports serializing both pipeline descriptions (components,
connections, and configurations) and pipeline parameters. There are
Expand All @@ -265,10 +264,10 @@ two ways to save a pipeline or part thereof:
pipeline; it has the usual downsides of pickling (arbitrary code execution,
etc.). LensKit uses pickling to share pipelines with worker processes for
parallel batch operations.
2. Save the pipeline configuration with :meth:`Pipeline.get_config`. This saves
the components, their configurations, and their connections, but **not** any
learned parameter data. A new pipeline can be constructed from such a
configuration can be reloaded with :meth:`Pipeline.from_config`.
2. Save the pipeline configuration (:attr:`Pipeline.config`, serialized with
:meth:`~pydantic.BaseModel.model_dump_json`). This saves the components,
their configurations, and their connections, but **not** any learned
parameter data. A new pipeline can be constructed from such a
configuration with :meth:`Pipeline.from_config`.

..
3. Save the pipeline parameters with :meth:`Pipeline.save_params`. This saves
Expand Down Expand Up @@ -307,8 +306,8 @@ two ways to save a pipeline or part thereof:

.. _standard-pipelines:

Standard Layouts
~~~~~~~~~~~~~~~~
Standard Pipelines
~~~~~~~~~~~~~~~~~~

The standard recommendation pipeline, produced by either of the approaches
described above in :ref:`pipeline-construct`, looks like this:
Expand Down Expand Up @@ -370,6 +369,9 @@ to be trained.
Components also must be pickleable, as LensKit uses pickling for shared memory
parallelism in its batch-inference code.

See :ref:`component-impl` for more information on implementing your own
components.

.. _component-config:

Configuring Components
Expand All @@ -389,6 +391,8 @@ configuration object if one is provided, or instantiating the configuration
class with defaults or from keyword arguments. In most cases, you don't need
to define a constructor for a component.

See :ref:`config-conventions` for standard configuration option names.

.. admonition:: Motivation
:class: note

Expand All @@ -411,59 +415,6 @@ to define a constructor for a component.
- The base class can provide well-defined and complete string
representations for free to all component implementations.

.. _component-impl:

Implementing Components
-----------------------

Implementing a component therefore consists of a few steps:

1. Defining the configuration class.
2. Defining the component class, with its `config` attribute declaration.
3. Defining a `__call__` method for the component class that performs the
component's actual computation.
4. If the component supports training, implementing the :class:`Trainable`
protocol by defining a :meth:`Trainable.train` method.

A simple example component that computes a linear weighted blend of the scores
from two other components could look like this:

.. literalinclude:: examples/blendcomp.py

This component can be instantiated with its defaults:

.. testsetup::

from blendcomp import LinearBlendScorer, LinearBlendConfig


.. doctest::

>>> LinearBlendScorer()
<LinearBlendScorer {
"mix_weight": 0.5
}>

You an instantiate it with its configuration class:

.. doctest::

>>> LinearBlendScorer(LinearBlendConfig(mix_weight=0.2))
<LinearBlendScorer {
"mix_weight": 0.2
}>

Finally, you can directly pass configuration parameters to the component constructor:

.. doctest::

>>> LinearBlendScorer(mix_weight=0.7)
<LinearBlendScorer {
"mix_weight": 0.7
}>

See :ref:`conventions` for more conventions for component design.

Adding Components to the Pipeline
---------------------------------

Expand All @@ -490,6 +441,20 @@ You can add components to the pipeline in two ways:
When you use the second approach, :meth:`PipelineBuilder.build` instantiates the
component from the provided configuration.

Modifying Pipelines
~~~~~~~~~~~~~~~~~~~

Pipelines, once constructed, are immutable (and modifying the pipeline, its
configuration, or its internal data structures is undefined behavior). However,
you can create a new pipeline from an existing one with added or changed
components. To do this:

1. Create a builder from the pipeline with :meth:`Pipeline.modify`, which
returns a :class:`PipelineBuilder`.
2. Add new components, or replace existing ones with
:meth:`PipelineBuilder.replace_component`.
3. Build the modified pipeline with :meth:`PipelineBuilder.build`.

POPROX and Other Integrators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
