Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ROLLUP, CUBE grouping expansion #783

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

yongchul
Copy link

@yongchul yongchul commented Feb 1, 2025

feat: add ROLLUP, CUBE grouping expansion

Adds ROLLUP and CUBE grouping expansions to AggregateRel. See #782 for rationale.

  • Adds AggregateRel.GroupingExpansion enum to denote expansion type.
  • Adds expansion field to AggregateRel.Grouping to specify expansion type in a grouping set.
  • Adds composite_grounping_expressions field to AggregateRel.Grouping to support composite grouping expression.

Refs: #782

@CLAassistant
Copy link

CLAassistant commented Feb 1, 2025

CLA assistant check
All committers have signed the CLA.

// at corresponding index. The expressions sharing the same non-zero values
// in this field formulates a composite grouping expression (i.e., those
// columns considered as one grouping expression).
// For GROUNPING_EXPANSION_UNSPECIFIED, this field is ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. Typo: GROUNPING_EXPANSION_UNSPECIFIED


If at least one grouping expression is present, the aggregation is allowed to not have any aggregate expressions. An aggregate relation is invalid if it would yield zero columns.

Optionally, a grouping set can be specified with rollup or cube expansions rather than fully spelling out the grouping sets. When either rollup or cube is specified for a grouping set, permutations of grouping expressions will be generated accordingly. Given `n` grouping expressions for a grouping set, ROLLUP expands to n + 1 grouping sets. CUBE expands to 2^`n` grouping sets (i.e., all subsets of the `n` grouping expressions). Given `i`th grouping set with ROLLUP expansion, the indicator values are assigned from `i` (i.e., an empty grouping) to `i + n + 1` (i.e., the grouping as if no expansion was done). For CUBE, the indicator values are assigned from `i` (i.e., an empty grouping) to `i + 2^n` (i.e., the grouping as if no expanson was done) where the values are assigned as `n`-bit integer where the bit is set when a grouping expression at corresponding position is being grouped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some concrete examples in a table or similar here?


enum GroupingExpansion {
// No grouping expansion. Default.
GROUPING_EXPANSION_UNSPECIFIED = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in other locations NONE is specified separately from UNSPECIFICED. Can you confirm? If so, we should match that pattern here.

// (3, 1), (4), (5, 2),
// (3, 1, 4), (3, 1, 5, 2), (4, 5, 2),
// (3, 1, 4, 5, 2)
repeated uint32 composite_grouping_expressions = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to create an alternative representation rather than the double array representation. This feels a bit complicated.

As part of that, it might be best if it also includes the indiciator value explicitly rather than the more complex pattern you describe below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this is proposal, not a demand. What are you thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was never happy with this representation but just put out here to get some feedback. :-) The reason why I chose to this was because of the abundant usage of index/offset in other places. If this feels complex to you, we should make it more explicit, perhaps, use oneof and more descriptive message to encode these albeit it will introduce another level of nesting.

I don't know whether we will like it but I will draft that. Let's see how we like it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacques-n BTW, could you elaborate the indicator value in your comment? You mean the indicator value used to bundle (i'll use different word as "group" is very overloaded here) the grouping columns (i.e., the values in the composite_grouping_expressions)? Or something else?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants