-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add ROLLUP, CUBE grouping expansion #783
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -353,6 +353,30 @@ message AggregateRel { | |
// A list of zero or more references to grouping expressions, i.e., indices | ||
// into the `grouping_expression` list. | ||
repeated uint32 expression_references = 2; | ||
|
||
// An optional list of zero or more grouping id of `expression_references` | ||
// at corresponding index. The expressions sharing the same non-zero values | ||
// in this field formulates a composite grouping expression (i.e., those | ||
// columns considered as one grouping expression). | ||
// For GROUNPING_EXPANSION_UNSPECIFIED, this field is ignored. | ||
// | ||
// Example: | ||
// { | ||
// expression_references: [ 3, 1, 4, 5, 2 ], | ||
// composite_grouping_expressions: [ 1, 1, 0, 2, 2 ], | ||
// expansion: GROUPING_EXPANSION_CUBE | ||
// }, | ||
// | ||
// means CUBE((3, 1), 4, (5, 2)), thus expands to following grouping sets: | ||
// | ||
// (), | ||
// (3, 1), (4), (5, 2), | ||
// (3, 1, 4), (3, 1, 5, 2), (4, 5, 2), | ||
// (3, 1, 4, 5, 2) | ||
repeated uint32 composite_grouping_expressions = 3; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it might be better to create an alternative representation rather than the double array representation. This feels a bit complicated. As part of that, it might be best if it also includes the indiciator value explicitly rather than the more complex pattern you describe below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not that this is proposal, not a demand. What are you thoughts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was never happy with this representation but just put out here to get some feedback. :-) The reason why I chose to this was because of the abundant usage of index/offset in other places. If this feels complex to you, we should make it more explicit, perhaps, use I don't know whether we will like it but I will draft that. Let's see how we like it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jacques-n BTW, could you elaborate |
||
|
||
// Optional grouping expansion method. Default: no expansion. | ||
GroupingExpansion expansion = 4; | ||
} | ||
|
||
message Measure { | ||
|
@@ -364,6 +388,17 @@ message AggregateRel { | |
// Helps to support SUM(<c>) FILTER(WHERE...) syntax without masking opportunities for optimization | ||
Expression filter = 2; | ||
} | ||
|
||
enum GroupingExpansion { | ||
// No grouping expansion. Default. | ||
GROUPING_EXPANSION_UNSPECIFIED = 0; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think in other locations NONE is specified separately from UNSPECIFICED. Can you confirm? If so, we should match that pattern here. |
||
|
||
// N grouping expression groups are expanded into N+1 groupings. | ||
GROUPING_EXPANSION_ROLLUP = 1; | ||
|
||
// N grouping expression groups are expanded into 2^N grounpings. | ||
GROUPING_EXPANSION_CUBE = 2; | ||
} | ||
} | ||
|
||
// ConsistentPartitionWindowRel provides the ability to perform calculations across sets of rows | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -381,10 +381,14 @@ Grouping sets can be used for finer-grained control over which records are folde | |
|
||
It is possible to specify multiple grouping sets in a single aggregate operation. The grouping sets behave more or less independently, with each returned record belonging to one of the grouping sets. The values for the grouping expression columns that are not part of the grouping set for a particular record will be set to null. The columns for grouping expressions that do *not* appear in *all* grouping sets will be nullable (regardless of the nullability of the type returned by the grouping expression) to accomodate the null insertion. | ||
|
||
To further disambiguate which record belongs to which grouping set, an aggregate relation with more than one grouping set receives an extra `i32` column on the right-hand side. The value of this field will be the zero-based index of the grouping set that yielded the record. | ||
To further disambiguate which record belongs to which grouping set, an aggregate relation with more than one grouping set receives an extra `i32` indicator column on the right-hand side. The value of this field will be the zero-based index of the grouping set that yielded the record. | ||
|
||
If at least one grouping expression is present, the aggregation is allowed to not have any aggregate expressions. An aggregate relation is invalid if it would yield zero columns. | ||
|
||
Optionally, a grouping set can be specified with rollup or cube expansions rather than fully spelling out the grouping sets. When either rollup or cube is specified for a grouping set, permutations of grouping expressions will be generated accordingly. Given `n` grouping expressions for a grouping set, ROLLUP expands to n + 1 grouping sets. CUBE expands to 2^`n` grouping sets (i.e., all subsets of the `n` grouping expressions). Given `i`th grouping set with ROLLUP expansion, the indicator values are assigned from `i` (i.e., an empty grouping) to `i + n + 1` (i.e., the grouping as if no expansion was done). For CUBE, the indicator values are assigned from `i` (i.e., an empty grouping) to `i + 2^n` (i.e., the grouping as if no expanson was done) where the values are assigned as `n`-bit integer where the bit is set when a grouping expression at corresponding position is being grouped. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add some concrete examples in a table or similar here? |
||
|
||
With ROLLUP and CUBE, the grouping expressions can form a group and treated as a unit of expansion. To represent such grouping of expressions within grouping set, an optional list of grouping information can be specified. For example, given grouping expressions `[3, 1, 2, 4, 5]` with the list `[1, 1, 0, 1, 1]` represents that there are actually 3 grouping expressions: `(3, 1), 2, (4, 5)`. That is, grouping expressions `3, 1` together formulate one composite grouping expression, and so does `4, 5`. `2`, specified with value `0` do not forumate such composite grouping. | ||
|
||
### Aggregate Properties | ||
|
||
| Property | Description | Required | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. Typo: GROUNPING_EXPANSION_UNSPECIFIED