Skip to content

[QST] can we first filter then join? #5389

Answered by revans2
JustPlay asked this question in General
Discussion options

You must be logged in to vote

Spark already does a filter push down as a part of join optimization. If a conditional only involves one side of a join then it is pushed down before the join. In other cases it cannot be pushed down so the filter happens after the join. In fact filtering after the join only works for Inner joins because for all other joins you may need to evaluate the conditional for all combinations of matching keys to see if the condition will allow or exclude the row from being created.

We are working with cudf to be able to push the conditional into the join so we can reduce the amount of data that is materialized and increase conditional joins to include non-inner joins. But that is not likely to sh…

Replies: 1 comment

Comment options

You must be logged in to vote
0 replies
Answer selected by sameerz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
2 participants
Converted from issue

This discussion was converted from issue #735 on April 28, 2022 23:27.