Problem Description
The current implementation of Joiner and MultiAggJoiner has some limitations, partly caused by the fact that they need to follow the scikit-learn estimator template:
It implements only the left join. This makes sense for an estimator, because the number of samples must remain constant. However, users may expect to be able to perform other kinds of joins (inner, outer, anti, ...), since pandas' merge and polars' join let them choose the join type.
It is hard to put into production, because the tables to join are passed in __init__ and may change between initialization and the moment the join is executed.
( @Vincent-Maladiere )
More generally, the fit/transform structure makes the Joiners clunky to use when the user only needs to perform several joins and does not care about putting them into a pipeline.
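To make the first limitation concrete, here is a small pandas example on invented toy data. It uses pandas' exact-match merge rather than the Joiner's fuzzy matching, and only shows how the number of output rows depends on the join type. In this example only the left join keeps one row per sample of the main table, which also relies on the keys of the auxiliary table being unique.

```python
import pandas as pd

# Toy tables invented for illustration: "Nice" has no match, "Lille" is never used.
main = pd.DataFrame({"city": ["Paris", "Lyon", "Nice"], "sales": [10, 20, 30]})
aux = pd.DataFrame({"city": ["Paris", "Lyon", "Lille"], "population": [2.1, 0.5, 0.2]})

# Number of rows produced by each exact-match join type.
row_counts = {
    how: len(main.merge(aux, on="city", how=how))
    for how in ["left", "inner", "outer"]
}

# A left anti join (main rows without any match), emulated with indicator=True.
flagged = main.merge(aux, on="city", how="left", indicator=True)
anti = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")
row_counts["anti"] = len(anti)

print(row_counts)  # {'left': 3, 'inner': 2, 'outer': 4, 'anti': 1}
```

Inner and anti joins drop samples and outer joins add some, which a transformer inside a scikit-learn pipeline cannot do, since its output must stay aligned with the target.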
I think it would be useful to have a more lightweight "join operator" that implements the join without the constraints of the estimator.
Feature Description
Rather than the current implementation, a join_tables operator would look similar to this:
I am calling this an "operator" because it operates directly on the given tables and is stateless.
It should be possible to reuse most of the machinery that has already been implemented in the Joiners, so it should not be too complicated to implement.
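As an illustration only, here is a minimal sketch of what a stateless join_tables function could look like, assuming pandas inputs and exact key matching. The parameter names (main_key, aux_key, how) and the "anti" option are hypothetical and not taken from skrub's API; a real implementation would reuse the Joiners' fuzzy-matching machinery instead of pandas' exact merge. Because both tables are arguments of the call, nothing is stored between definition and execution.

```python
import pandas as pd

# Hypothetical set of join types accepted by this sketch.
SUPPORTED_JOINS = {"left", "right", "inner", "outer", "anti"}


def join_tables(main_table, aux_table, main_key, aux_key, how="left"):
    """Hypothetical stateless join operator (sketch, exact matching only).

    Nothing is fitted or stored: both tables are passed at call time,
    so they cannot change between definition and execution.
    """
    if how not in SUPPORTED_JOINS:
        raise ValueError(f"how must be one of {sorted(SUPPORTED_JOINS)}, got {how!r}")
    if how == "anti":
        # Keep only the rows of main_table whose key has no match in aux_table.
        unmatched = ~main_table[main_key].isin(aux_table[aux_key])
        return main_table[unmatched].reset_index(drop=True)
    # The other join types map directly onto pandas' merge.
    return main_table.merge(
        aux_table,
        how=how,
        left_on=main_key,
        right_on=aux_key,
        suffixes=("", "_aux"),
    )


# Usage on toy tables (invented for illustration).
main = pd.DataFrame({"city": ["Paris", "Lyon", "Nice"], "sales": [10, 20, 30]})
aux = pd.DataFrame({"City": ["Paris", "Lyon", "Lille"], "population": [2.1, 0.5, 0.2]})

inner = join_tables(main, aux, main_key="city", aux_key="City", how="inner")
anti = join_tables(main, aux, main_key="city", aux_key="City", how="anti")
print(inner.shape, anti["city"].tolist())  # (2, 4) ['Nice']
```

The anti join is written with isin rather than merge so that the output keeps exactly the columns of the main table, without an extra key or indicator column to drop.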
Alternative Solutions
No response
Additional Context
No response