Timestamp in MSE #14690

gortiz · 2024-12-20T11:03:55Z

This PR adds the ability to use timestamp indexes in MSE. As we know, timestamp indexes are not actual indexes but syntactic sugar. When a timestamp index is created, a set of granularities has to be defined. Then, when segments are created, a new column is created for each granularity. The content of each column is equal to datetrunc(GRANULARITY, originalColumn).

Then in SSE, the broker enriches queries by adding possible overrides to the query. This means that for each usage of datetrunc(GRANULARITY, originalColumn), the broker suggests transforming the call into the generated column $originalColumn$GRANULARITY. Whether that suggestion is applied or not is decided by the server.

In MSE these suggested rewrites were not populated, which means that servers never applied the rewrite.
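The mapping from datetrunc calls to generated columns can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical and expressions are modelled as plain strings, whereas Pinot works on parsed expressions. Only the $originalColumn$GRANULARITY naming pattern comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not Pinot's API: expressions are modelled as plain strings.
final class TimestampHintSketch {
  private TimestampHintSketch() {
  }

  // Name of the column generated for one granularity, following $originalColumn$GRANULARITY.
  static String generatedColumn(String column, String granularity) {
    return "$" + column + "$" + granularity.toUpperCase(Locale.ROOT);
  }

  // One suggested override per granularity: datetrunc call -> generated column.
  static Map<String, String> overrideHints(String column, List<String> granularities) {
    Map<String, String> hints = new LinkedHashMap<>();
    for (String granularity : granularities) {
      String call = "datetrunc('" + granularity.toUpperCase(Locale.ROOT) + "'," + column + ")";
      hints.put(call, generatedColumn(column, granularity));
    }
    return hints;
  }

  public static void main(String[] args) {
    // Prints the suggested overrides for a column named ts with two granularities.
    System.out.println(overrideHints("ts", List.of("day", "hour")));
  }
}
```

The hints are only suggestions; in this sketch nothing checks whether the generated column actually exists, which mirrors the idea that the server has the final say.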

Specifically, queries like:

select AirTime from airlineStats where DATETRUNC('day', ts) > '2014-01-02 01:00:00.0' limit 10

Can now be optimized.

While creating some tests to verify this PR, I found that cases like

SELECT sum(case when datetrunc('SECOND',ArrTime) > 1 then 2 else 0 end) FROM mytable

were not optimized. Therefore this PR modifies InstancePlanMakerImplV2 to apply the rewrite in that case as well.

… into timestamp-v2

gortiz · 2024-12-20T11:04:04Z

cc @bziobrowski

codecov-commenter · 2024-12-20T11:48:04Z

Codecov Report

Attention: Patch coverage is 85.41667% with 7 lines in your changes missing coverage. Please review.

Project coverage is 63.88%. Comparing base (59551e4) to head (86aa3e8).
Report is 1539 commits behind head on master.

Files with missing lines	Patch %	Lines
...pache/pinot/common/utils/request/RequestUtils.java	82.35%	0 Missing and 3 partials ⚠️
...pinot/core/plan/maker/InstancePlanMakerImplV2.java	66.66%	2 Missing and 1 partial ⚠️
...sthandler/BaseSingleStageBrokerRequestHandler.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #14690      +/-   ##
============================================
+ Coverage     61.75%   63.88%   +2.12%     
- Complexity      207     1607    +1400     
============================================
  Files          2436     2703     +267     
  Lines        133233   150771   +17538     
  Branches      20636    23296    +2660     
============================================
+ Hits          82274    96314   +14040     
- Misses        44911    47259    +2348     
- Partials       6048     7198    +1150

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (+99.99%)`	⬆️
integration	`100.00% <ø> (+99.99%)`	⬆️
integration1	`100.00% <ø> (+99.99%)`	⬆️
integration2	`0.00% <ø> (ø)`
java-11	`63.84% <85.41%> (+2.13%)`	⬆️
java-21	`63.76% <85.41%> (+2.13%)`	⬆️
skip-bytebuffers-false	`63.88% <85.41%> (+2.13%)`	⬆️
skip-bytebuffers-true	`63.71% <85.41%> (+35.98%)`	⬆️
temurin	`63.88% <85.41%> (+2.12%)`	⬆️
unittests	`63.87% <85.41%> (+2.12%)`	⬆️
unittests1	`56.32% <87.23%> (+9.43%)`	⬆️
unittests2	`34.15% <0.00%> (+6.42%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

bziobrowski · 2024-12-20T13:32:56Z

...ration-tests/src/test/java/org/apache/pinot/integration/tests/custom/TimestampIndexTest.java

+            + "            AggregateFiltered(aggregations=[[sum('2'), count(*)]])\n"
+            + "              Transform(expressions=[['2']])\n"
+            + "                Project(columns=[[]])\n"
+            + "                  DocIdSet(maxDocs=[120000])\n"


Would it be possible to go over 120 chars limit here with //@Formatter:off ?

I don't think so, but I didn't try

bziobrowski · 2024-12-20T13:43:42Z

...ration-tests/src/test/java/org/apache/pinot/integration/tests/custom/TimestampIndexTest.java

+            + "                FilterRangeIndex(predicate=[$ArrTime$SECOND > '1'], "
+            + "indexLookUp=[range_index], operator=[RANGE])\n");
+  }
+


Would it be worth checking queries with :

GROUP BY datetrunc('SECOND',ArrTime)

HAVING datetrunc('SECOND',ArrTime) > 1

HAVING sum(case when datetrunc('SECOND',ArrTime) > 1 then 2 else 0 end) > 1

UNION-ed second query ?

WHERE EXISTS (select * from tab t2 where datetrunc('SECOND',ArrTime) > 1 )

WHERE EXISTS (select * from tab t2 where t1.x = datetrunc('SECOND',t2.ArrTime) > 1 ) )
?

GROUP BY datetrunc('SECOND',ArrTime)

Added

HAVING datetrunc('SECOND',ArrTime) > 1
HAVING sum(case when datetrunc('SECOND',ArrTime) > 1 then 2 else 0 end) > 1

In these cases the optimization is not supported in SSE and it isn't easy to support it. The main problem here is that InstancePlanMakerImplV2.rewriteQueryContextWithHints rewrites the query using a dirty trick: it modifies the lists returned by queryContext.getSelectExpressions(), queryContext.getGroupByExpressions(), etc. In the queryContext.getAggregationFunctions() case, it returns a list of AggregationFunction instances, each of which has its own AggregationFunction.getInputExpressions(). But it is not guaranteed that modifying the list returned by that method actually modifies the AggregationFunction. In fact, in most cases it doesn't.
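To illustrate why rewriting a returned list is fragile, here is a minimal sketch. SumLike is a hypothetical stand-in, not a Pinot class, and it assumes the function keeps its input in a field and builds the list on demand, which is one way the behaviour described above can arise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class InputExpressionsSketch {
  private InputExpressionsSketch() {
  }

  // Hypothetical stand-in for an aggregation function that builds its input list on each call.
  static final class SumLike {
    private final String _input;

    SumLike(String input) {
      _input = input;
    }

    List<String> getInputExpressions() {
      List<String> result = new ArrayList<>();
      result.add(_input);
      return result;
    }
  }

  // The "dirty trick": replace hinted expressions inside whatever list the getter handed out.
  static void rewrite(List<String> expressions, Map<String, String> hints) {
    expressions.replaceAll(expression -> hints.getOrDefault(expression, expression));
  }

  public static void main(String[] args) {
    Map<String, String> hints = Map.of("datetrunc('SECOND',ArrTime)", "$ArrTime$SECOND");

    // A list owned by the caller is rewritten as intended.
    List<String> selectExpressions = new ArrayList<>(List.of("datetrunc('SECOND',ArrTime)"));
    rewrite(selectExpressions, hints);
    System.out.println(selectExpressions.get(0));

    // The rewrite only touches a temporary copy, so the function still sees the original input.
    SumLike sum = new SumLike("datetrunc('SECOND',ArrTime)");
    rewrite(sum.getInputExpressions(), hints);
    System.out.println(sum.getInputExpressions().get(0));
  }
}
```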

UNION-ed second query ?

Unions don't have expressions, right? If one of the unioned queries has expressions, it should be handled by the other cases through recursion.

This test class is testing more than this feature. It tests that the substitution is being done, which means that the overrideHint is added (what this PR changes) and that the override is actually applied (what is failing in some cases). We could create a test that just verifies that the overrideHint is added, but that may be a complex test to add, so I think the tests we have right now are good enough.

WHERE EXISTS (select * from tab t2 where datetrunc('SECOND',ArrTime) > 1 )
WHERE EXISTS (select * from tab t2 where t1.x = datetrunc('SECOND',t2.ArrTime) > 1 ) )

This is tricky. Would we want to test all possible SQL constructs? In theory... maybe... But I don't think it is worth it. We know we are applying the hint with a visitor that walks through the relational nodes. Therefore we know we will visit the sub-query filters, which means that it is the same case as a select in the main query.

Obviously that is a grey area. A future refactor may not use a visitor and therefore may end up not visiting the inner query. But I don't think it is worth testing that deeply. Probably, as said before, a lower-level test (i.e. a unit test on the visitor) would make it more explicit why that case isn't worth testing, but here I'm using an integration test to test the overrideHint logic, which isn't great but is easier than testing ServerPlanRequestVisitor.
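The argument that a recursive walk reaches sub-query filters can be sketched like this. The Node type and method names are hypothetical and far simpler than the real relational nodes; the point is only that a depth-first walk treats nested inputs the same way as the root.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PlanWalkSketch {
  private PlanWalkSketch() {
  }

  // Hypothetical plan node: the expressions it evaluates plus its input nodes.
  static final class Node {
    final List<String> _expressions;
    final List<Node> _inputs;

    Node(List<String> expressions, List<Node> inputs) {
      _expressions = expressions;
      _inputs = inputs;
    }
  }

  // Depth-first walk collecting every expression that mentions datetrunc, at any depth.
  static List<String> collectDateTruncCalls(Node root) {
    List<String> found = new ArrayList<>();
    collect(root, found);
    return found;
  }

  private static void collect(Node node, List<String> found) {
    for (String expression : node._expressions) {
      if (expression.toLowerCase(Locale.ROOT).contains("datetrunc(")) {
        found.add(expression);
      }
    }
    for (Node input : node._inputs) {
      collect(input, found);
    }
  }

  public static void main(String[] args) {
    // The datetrunc call lives only in the nested (sub-query) node, yet the walk still finds it.
    Node subQuery = new Node(List.of("datetrunc('SECOND',ArrTime) > 1"), List.of());
    Node root = new Node(List.of("x = 1"), List.of(subQuery));
    System.out.println(collectDateTruncCalls(root));
  }
}
```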

I checked HAVING on a few queries locally and I think it's fine, because the HAVING expression is applied after aggregation by matching the expression to the group-by output (so no additional aggregates are computed if it matches a group-by element).

Regarding UNION and WHERE EXISTS: I was thinking of checking whether the rule applies to all subqueries. As you mentioned, it's likely better to test at a lower level, for all rules/types.

bziobrowski · 2024-12-20T14:04:06Z

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/RexExpression.java

@@ -31,6 +31,8 @@
 */
 public interface RexExpression {



Are these methods used anywhere ?

In fact they are not. Having the visitor may be useful in the future, but it is also very simple to add again, so I'm removing it.

bziobrowski

👍

yashmayya

I think it'd be useful to also have an actual integration test to verify the query execution path (and not just the explain plan output).

...rc/main/java/org/apache/pinot/broker/requesthandler/BaseSingleStageBrokerRequestHandler.java

pinot-common/src/main/java/org/apache/pinot/common/utils/request/RequestUtils.java

...n-test-base/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTest.java

yashmayya · 2024-12-26T06:30:33Z

pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java

+    List<Pair<AggregationFunction, FilterContext>> filtAggrFuns = queryContext.getFilteredAggregationFunctions();
+    if (filtAggrFuns != null) {
+      for (Pair<AggregationFunction, FilterContext> filteredAggregationFunction : filtAggrFuns) {
+        FilterContext right = filteredAggregationFunction.getRight();
+        if (right != null) {
+          Predicate predicate = right.getPredicate();
+          predicate.setLhs(overrideWithExpressionHints(predicate.getLhs(), indexSegment, expressionOverrideHints));
+        }
+      }
+    }


Could we add an integration test that verifies this filtered aggregation on timestamp column with timestamp index execution path on both the query engines?

The test timestampIndexSubstitutedInAggregateFilter already tested this code in MSE. I've just added the same test class for SSE.

yashmayya · 2024-12-26T06:53:32Z

...ntime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestVisitor.java

+  private void applyTimestampIndex(Expression expression, PinotQuery pinotQuery) {
+    RequestUtils.applyTimestampIndex(expression, pinotQuery);
+    Function functionCall = expression.getFunctionCall();
+    if (expression.isSetFunctionCall()) {
+      for (Expression operand : functionCall.getOperands()) {
+        applyTimestampIndex(operand, pinotQuery);
+      }
+    }
+  }


So here we're adding the expression override hints regardless of the actual timestamp index configurations and relying on the server to only apply the correct ones?

Yes, it is a bit strange. That is the way V1 does it. I don't completely understand why we do that on brokers instead of doing that on servers depending on whether we have the column + index or not, the same way we do that with other indexes.

Yes, it is a bit strange. That is the way V1 does it

In v1, the broker only adds the expression override hints if there is an appropriately configured timestamp index right?

pinot/pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseSingleStageBrokerRequestHandler.java

Lines 950 to 953 in 397fd98

if (timestampIndexColumns.contains(timeColumnWithGranularity)) {
  pinotQuery.putToExpressionOverrideHints(expression,
      RequestUtils.getIdentifierExpression(timeColumnWithGranularity));
}

Answering my own question, in v2 the expression override won't be applied on the server side when the generated column is missing, due to the check here -

pinot/pinot-core/src/main/java/org/apache/pinot/core/plan/maker/InstancePlanMakerImplV2.java

Lines 395 to 396 in 0e915ed

if (overrideExpression != null && overrideExpression.getIdentifier() != null && indexSegment.getColumnNames()
    .contains(overrideExpression.getIdentifier())) {

I suppose this isn't ideal since in v1 we're only adding the expression override hint to the query when there is an appropriate timestamp index configured, but I've verified that things work as expected regardless due to the server side checks.
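The effect of that server-side check can be sketched in isolation. The class and method below are hypothetical and use plain strings instead of Pinot's expression types; the sketch only shows why an unconditional hint is harmless when the segment lacks the generated column.

```java
import java.util.Map;
import java.util.Set;

final class ServerOverrideSketch {
  private ServerOverrideSketch() {
  }

  // Applies a hint only when the segment really contains the suggested column.
  static String resolve(String expression, Map<String, String> hints, Set<String> segmentColumns) {
    String override = hints.get(expression);
    if (override != null && segmentColumns.contains(override)) {
      return override;
    }
    return expression;
  }

  public static void main(String[] args) {
    Map<String, String> hints = Map.of("datetrunc('DAY',ts)", "$ts$DAY");

    // Segment built with the timestamp index: the generated column is used.
    System.out.println(resolve("datetrunc('DAY',ts)", hints, Set.of("ts", "$ts$DAY")));

    // Segment without the generated column: the original expression is kept.
    System.out.println(resolve("datetrunc('DAY',ts)", hints, Set.of("ts")));
  }
}
```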

...tion-tests/src/test/java/org/apache/pinot/integration/tests/ExplainIntegrationTestTrait.java

gortiz · 2024-12-31T13:33:22Z

I think all discussions have been resolved or are waiting for further comments. Yash, feel free to take a second look

gortiz · 2025-01-02T08:09:44Z

It looks like the new SSE test is failing during setup. I'm having trouble understanding why, given that the setup process is identical in MSE.

yashmayya · 2025-01-02T11:12:49Z

It looks like the new SSE test is failing during setup. I'm having trouble understanding why, given that the setup process is identical in MSE.

Yeah that's quite odd - the new tests pass locally for me.

gortiz · 2025-01-03T11:34:23Z

I was able to reproduce the error in the test. It looks like some other test (probably the MSE one) is modifying the config and leaving it in an incorrect state for the SSE test.

Change the tests to return a mutable list instead.

gortiz · 2025-01-03T12:13:21Z

I was correct. MSE and SSE tests were interfering with each other. The cause was the change I made to make default columns mutable, which was incorrect. First MSE was adding the new range indexes and then SSE wanted to add them again.

Instead of changing a static reference, which is never a good idea, I'm changing the code so all tests return a new ArrayList for these columns, so each test can modify them independently. Changing the columns is not great, but that is how these features have been implemented and it isn't easy to change that.
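A minimal sketch of that fix, assuming a base class that exposes default columns to its subclasses. The class, constant, and column names are hypothetical, not the ones in BaseClusterIntegrationTest.

```java
import java.util.ArrayList;
import java.util.List;

final class DefaultColumnsSketch {
  // Shared defaults stay immutable, so no test can leak changes into another one.
  private static final List<String> DEFAULT_RANGE_INDEX_COLUMNS = List.of("colA", "colB");

  private DefaultColumnsSketch() {
  }

  // Each caller receives its own mutable copy and may extend it freely.
  static List<String> getRangeIndexColumns() {
    return new ArrayList<>(DEFAULT_RANGE_INDEX_COLUMNS);
  }

  public static void main(String[] args) {
    // The first "test" adds a generated column to its own copy.
    List<String> firstTestColumns = getRangeIndexColumns();
    firstTestColumns.add("$ts$DAY");

    // The second "test" still starts from the untouched defaults.
    List<String> secondTestColumns = getRangeIndexColumns();
    System.out.println(firstTestColumns.size() + " vs " + secondTestColumns.size());
  }
}
```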

yashmayya

LGTM. Checkstyle is currently failing due to an unused import in BaseClusterIntegrationTest though.

...tion-tests/src/test/java/org/apache/pinot/integration/tests/ExplainIntegrationTestTrait.java

gortiz added 4 commits February 14, 2024 15:29

Attempt to apply timestamp expr override in V2

ac48f4b

Merge remote-tracking branch 'personal/timestamp-expr-override-in-v2' into timestamp-v2

98d424c

Apply TimestampIndex trick in filtered aggregation functions

7c1d5ed

Change the way ServerPlanRequestVisitor finds dateTrunc function calls

1338c15

gortiz requested review from Jackie-Jiang and yashmayya December 20, 2024 11:03

gortiz force-pushed the timestamp-v2 branch from 71c958e to bc09186 Compare December 20, 2024 12:42

bziobrowski reviewed Dec 20, 2024

Add tests that verify timestamp indexes can be used in MSE

4689d87

gortiz force-pushed the timestamp-v2 branch from bc09186 to 4689d87 Compare December 20, 2024 13:47

bziobrowski reviewed Dec 20, 2024

gortiz added 2 commits December 20, 2024 15:39

Add a group by test

a04a9f5

Remove unused visitor

6879cc4

bziobrowski approved these changes Dec 20, 2024

yashmayya reviewed Dec 26, 2024

gortiz added 4 commits December 30, 2024 11:00

Use TransformFunctionType.DATE_TRUNC.getName() instead of the literal

0277460

Rename RequestUtils.applyTimestampIndex as RequestUtils.applyTimestampIndexOverrideHints

6dc210f

Merge remote-tracking branch 'origin/master' into timestamp-v2

1cd3dfd

Add timestamp tests for SSE

005ba05

gortiz added 2 commits January 3, 2025 13:09

Make default columns immutable again

08aba97

Change the tests to return a mutable list instead.

Add the ability to write sse explain tests as regexps

3e32e61

yashmayya approved these changes Jan 6, 2025

gortiz added 2 commits January 7, 2025 08:39

Improve assertion messages

790a9ae

Resolve checkstyle issue

86aa3e8

gortiz merged commit 6e9a70f into apache:master Jan 7, 2025
20 of 21 checks passed

gortiz deleted the timestamp-v2 branch January 7, 2025 09:26

Jackie-Jiang added multi-stage Related to the multi-stage query engine enhancement labels Jan 8, 2025
