-
Notifications
You must be signed in to change notification settings - Fork 28.5k
[SPARK-51752][SQL] Enable rCTE referencing from within a CTE #50546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-51752][SQL] Enable rCTE referencing from within a CTE #50546
Conversation
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Show resolved
Hide resolved
@@ -83,8 +83,21 @@ object ResolveWithCTE extends Rule[LogicalPlan] { | |||
cteDef.copy(child = alias.copy(child = loop)) | |||
} | |||
|
|||
// Simple case of duplicating (UNION ALL) clause, with CTEs inside. | |||
case alias @ SubqueryAlias(_, withCTE @ WithCTE( | |||
Union(Seq(anchor, recursion), false, false), _)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't care about the query pattern in CTE relations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we do, yes!
Since these CTEs refer to the whole inside of the initial CTE an can be called from the recursion we need to add logic to replace CteRelationRef with UnionLoopRef inside of these CTEs (and not just inside of the recursion). I've changed this now and added some tests to test it.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveWithCTE.scala
Outdated
Show resolved
Hide resolved
…ysis/ResolveWithCTE.scala Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
…ysis/CTESubstitution.scala Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
@@ -255,6 +273,21 @@ object CTESubstitution extends Rule[LogicalPlan] { | |||
outerCTEDefs | |||
} | |||
for ((name, relation) <- relations) { | |||
// If recursion is allowed (RECURSIVE keyword specified) | |||
// then it has higher priority than outer or previous relations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about semantics here. Let's see an example
WITH RECURSIVE t AS ...
WITH t AS ...
SELECT ... FROM t
The t
references to the non-recursive CTE def t
or the recursive one? It looks more reasonable to reference the non-recursive one as it's closer in the scope. It's weird to make recursive CTE def to have higher priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It references the inner non recursive one (just checked). The comment just says that the recursive reference has a higher priority than the outer ones or the ones within in the same WITH statement.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
Outdated
Show resolved
Hide resolved
…ysis/CTESubstitution.scala Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
thanks, merging to master! |
What changes were proposed in this pull request?
Enabling the possibility of a CTE referencing the recursive CTE it is inside of. This is done by modifying the CTESubstitution file, consisting of two main parts:
Why are the changes needed?
To make queries that self reference work. An example of such a query is:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing golden file tests for this that didn't work before.
Was this patch authored or co-authored using generative AI tooling?
No.