Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype using new LensKit pipeline abstraction #81

Merged
merged 13 commits into from
Aug 9, 2024

Conversation

mdekstrand
Copy link
Contributor

@mdekstrand mdekstrand commented Aug 2, 2024

This is a prototype of how the new pipeline I built for LensKit (lenskit/lkpy#462) would fit into the POPROX recommenders. It directly vendors a copy of the LensKit code, with dataset and trainability removed, since the LensKit version is not yet in a release.

The LensKit pipeline is heavily inspired by the POPROX one. My design process was basically “how do I take the core idea from the POPROX pipeline — which I really like — and add the capabilities I need for LensKit + build it on a DAG instead of linear and iterative state updates?”.

The pipeline docs are rendered here: https://lkpy.lenskit.org/en/latest/pipeline.html

@mdekstrand mdekstrand marked this pull request as ready for review August 5, 2024 18:39
@mdekstrand mdekstrand requested a review from karlhigley August 5, 2024 18:39
src/poprox_recommender/default.py Outdated Show resolved Hide resolved
src/poprox_recommender/default.py Outdated Show resolved Hide resolved
candidates = pipeline.create_input("candidate", ArticleSet)
clicked = pipeline.create_input("clicked", ArticleSet)
profile = pipeline.create_input("profile", InterestProfile)
e_cand = pipeline.add_component("embed-candidates", article_embedder, article_set=candidates)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verb style naming here feels a little weird to me, because a component is conceptually a thing rather than an action. If you really want the verb style, I don't think add_component is the right method name to go with that (but I do like add_component because I think it aligns more naturally with the mental model that e.g. a diversifier is a component of a larger pipeline/system.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair! I was trying to do something consistent (and make a consistent recommendation in the docs), and wasn't 100% on this but favored “pick one, be consistent, and document the rec”.

We can use nouns / noun phrases, and it would be slightly longer in some cases but I think would read reasonably well:

e_cand = pipeline.add_component("candidate-embedder", ...)

Or if we want to name it after the thing returned:

e_cand = pipeline.add_component("candidate-embeddings", ...)

Working through that example, we have 3 possible naming conventions:

  1. The name of the component itself (embedder).
  2. The name of the component's return data (embeddings).
  3. The action performed by the component (embed).

Based on your rationale, it sounds like you would lean towards (1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed an update with the component name naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to have something like add_process or add_stage and use the verb-based naming? Not sure it would, just pondering while I update the docs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the noun-style naming better because I find it easier to think about composing objects than actions, but if we went the verb-style route I was considering add_step

@mdekstrand mdekstrand requested a review from karlhigley August 6, 2024 17:30
@mdekstrand
Copy link
Contributor Author

Cross-added the pipeline updates to LensKit in lenskit/lkpy#466.

@mdekstrand
Copy link
Contributor Author

Now merged and tested w/ @karlhigley's new pipeline components change.

@@ -82,23 +85,31 @@ def personalized_pipeline(num_slots: int, algo_params: dict[str, Any] | None = N
logger.info("Recommendations will be ranked with plain top-k.")
ranker = topk_ranker

pipeline = RecommendationPipeline(name=diversify)
# TODO put pipeline name back in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth resolving this now or want to wait for a follow-up PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#85 adds it :)

@@ -25,7 +25,7 @@ def test_direct_basic_request():
req.num_recs,
)
# do we get recommendations?
assert len(outputs.recs) > 0
assert len(outputs.default.articles) > 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker: Wonder if this might be worth applying some syntactic sugar to avoid having to sprinkle default in so many places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite possibly.

@mdekstrand mdekstrand merged commit ac5911f into main Aug 9, 2024
4 checks passed
@mdekstrand mdekstrand deleted the mdekstrand/lenskit-pipeline branch August 9, 2024 18:14
sophiasun0515 pushed a commit that referenced this pull request Aug 12, 2024
This updates the POPROX pipeline to use a vendored copy of the LensKit pipeline abstraction.

The LensKit pipeline is heavily inspired by the POPROX one. My design
process was basically “how do I take the core idea from the POPROX
pipeline — which I really like — and add the capabilities I need for
LensKit + build it on a DAG instead of linear and iterative state
updates?”.

The pipeline docs are rendered here:
https://lkpy.lenskit.org/en/latest/pipeline.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants