-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype using new LensKit pipeline abstraction #81
Conversation
src/poprox_recommender/default.py
Outdated
candidates = pipeline.create_input("candidate", ArticleSet) | ||
clicked = pipeline.create_input("clicked", ArticleSet) | ||
profile = pipeline.create_input("profile", InterestProfile) | ||
e_cand = pipeline.add_component("embed-candidates", article_embedder, article_set=candidates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verb style naming here feels a little weird to me, because a component is conceptually a thing rather than an action. If you really want the verb style, I don't think add_component
is the right method name to go with that (but I do like add_component
because I think it aligns more naturally with the mental model that e.g. a diversifier is a component of a larger pipeline/system.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair! I was trying to do something consistent (and make a consistent recommendation in the docs), and wasn't 100% on this but favored “pick one, be consistent, and document the rec”.
We can use nouns / noun phrases, and it would be slightly longer in some cases but I think would read reasonably well:
e_cand = pipeline.add_component("candidate-embedder", ...)
Or if we want to name it after the thing returned:
e_cand = pipeline.add_component("candidate-embeddings", ...)
Working through that example, we have 3 possible naming conventions:
- The name of the component itself (embedder).
- The name of the component's return data (embeddings).
- The action performed by the component (embed).
Based on your rationale, it sounds like you would lean towards (1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed an update with the component name naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to have something like add_process
or add_stage
and use the verb-based naming? Not sure it would, just pondering while I update the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the noun-style naming better because I find it easier to think about composing objects than actions, but if we went the verb-style route I was considering add_step
Cross-added the pipeline updates to LensKit in lenskit/lkpy#466. |
Now merged and tested w/ @karlhigley's new pipeline components change. |
@@ -82,23 +85,31 @@ def personalized_pipeline(num_slots: int, algo_params: dict[str, Any] | None = N | |||
logger.info("Recommendations will be ranked with plain top-k.") | |||
ranker = topk_ranker | |||
|
|||
pipeline = RecommendationPipeline(name=diversify) | |||
# TODO put pipeline name back in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth resolving this now or want to wait for a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#85 adds it :)
@@ -25,7 +25,7 @@ def test_direct_basic_request(): | |||
req.num_recs, | |||
) | |||
# do we get recommendations? | |||
assert len(outputs.recs) > 0 | |||
assert len(outputs.default.articles) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker: Wonder if this might be worth applying some syntactic sugar to avoid having to sprinkle default
in so many places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite possibly.
This updates the POPROX pipeline to use a vendored copy of the LensKit pipeline abstraction. The LensKit pipeline is heavily inspired by the POPROX one. My design process was basically “how do I take the core idea from the POPROX pipeline — which I really like — and add the capabilities I need for LensKit + build it on a DAG instead of linear and iterative state updates?”. The pipeline docs are rendered here: https://lkpy.lenskit.org/en/latest/pipeline.html
This is a prototype of how the new pipeline I built for LensKit (lenskit/lkpy#462) would fit into the POPROX recommenders. It directly vendors a copy of the LensKit code, with dataset and trainability removed, since the LensKit version is not yet in a release.
The LensKit pipeline is heavily inspired by the POPROX one. My design process was basically “how do I take the core idea from the POPROX pipeline — which I really like — and add the capabilities I need for LensKit + build it on a DAG instead of linear and iterative state updates?”.
The pipeline docs are rendered here: https://lkpy.lenskit.org/en/latest/pipeline.html