[timeseries] Remove Time Series Specific Code from V1 Engine #14841

ankitsultana · 2025-01-19T04:35:57Z

Description

Removes Time Series specific code from Pinot's V1 Engine, and instead leverages a Transform function and an Aggregation function to implement the leaf stage for Time Series queries.

This lets us take advantage of the speed of GroupByOperator and AggregationOperator, while still allowing Time Series languages to define how they want their series to be built.

This also greatly simplifies the Time Series code, as can be seen by the fact that this PR removes a ton of code.
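To illustrate the split described above, here is a minimal sketch of the aggregation half: rows arrive with a bucket index already computed by the transform, and values are accumulated into one double[] per series. The class and method names (LeafAggregationSketch, sumIntoBuckets) are hypothetical, a single string stands in for the full set of tags, and sum is just one example aggregation. This is not the code in this PR.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of the leaf-stage aggregation; not Pinot's actual classes. */
class LeafAggregationSketch {

  /**
   * Groups rows by tag and sums each value into the bucket chosen by the transform step.
   * Assumption: rows whose bucket index is out of range are skipped.
   */
  static Map<String, double[]> sumIntoBuckets(String[] tags, int[] bucketIndexes, double[] values,
      int numBuckets) {
    Map<String, double[]> seriesByTag = new LinkedHashMap<>();
    for (int row = 0; row < tags.length; row++) {
      int bucket = bucketIndexes[row];
      if (bucket < 0 || bucket >= numBuckets) {
        continue;
      }
      // One double[] per series, allocated the first time the tag is seen.
      seriesByTag.computeIfAbsent(tags[row], tag -> new double[numBuckets])[bucket] += values[row];
    }
    return seriesByTag;
  }

  public static void main(String[] args) {
    Map<String, double[]> result = sumIntoBuckets(
        new String[]{"a", "b", "a"}, new int[]{0, 1, 0}, new double[]{1.0, 2.0, 3.0}, 2);
    result.forEach((tag, series) -> System.out.println(tag + " -> " + Arrays.toString(series)));
  }
}
```

In the real engine the grouping is done by GroupByOperator and the per-series accumulation by the language's series builder; the sketch only shows how the two pieces fit together.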

Series Limit Handling

Series limits are now handled via the standard Group By limits as defined here.

API Changes

BaseTimeSeriesBuilder has a new method called buildWithTagOverrides. It is needed because the GroupByOperator materializes group values lazily to avoid excessive allocations, so the leaf stage only knows the tag values at the time the series is built.

Testing

Updated existing unit tests, tested on Quickstart, and also tested on our cluster. Long range dashboards are finally loading with V1 Engine speeds.

Future Work

In the next PR I'll add support for accepting series limit per-request, since callers would want to set them based on the number of time buckets that users have in their query.
I'll also add support for returning warnings in a TimeSeriesBlock, and propagate the group limit reached warning to the caller.
In the next few months we'll also add support for Exemplars.

codecov-commenter · 2025-01-19T05:13:27Z

Codecov Report

Attention: Patch coverage is 64.86486% with 78 lines in your changes missing coverage. Please review.

Project coverage is 63.76%. Comparing base (59551e4) to head (ae24178).
Report is 1594 commits behind head on master.

Files with missing lines	Patch %	Lines
...gation/function/TimeSeriesAggregationFunction.java	66.66%	30 Missing and 3 partials ⚠️
...ery/runtime/timeseries/LeafTimeSeriesOperator.java	0.00%	14 Missing ⚠️
...c/main/java/org/apache/pinot/tsdb/spi/AggInfo.java	23.52%	11 Missing and 2 partials ⚠️
...e/operator/timeseries/TimeSeriesOperatorUtils.java	77.77%	4 Missing and 4 partials ⚠️
...rm/function/TimeSeriesBucketTransformFunction.java	87.50%	4 Missing ⚠️
...untime/timeseries/TimeSeriesPhysicalTableScan.java	0.00%	3 Missing ⚠️
...imeseries/PhysicalTimeSeriesServerPlanVisitor.java	71.42%	1 Missing and 1 partial ⚠️
...e/pinot/tsdb/spi/series/BaseTimeSeriesBuilder.java	66.66%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #14841      +/-   ##
============================================
+ Coverage     61.75%   63.76%   +2.01%     
- Complexity      207     1611    +1404     
============================================
  Files          2436     2703     +267     
  Lines        133233   151188   +17955     
  Branches      20636    23341    +2705     
============================================
+ Hits          82274    96402   +14128     
- Misses        44911    47551    +2640     
- Partials       6048     7235    +1187

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (+99.99%)`	⬆️
integration	`100.00% <ø> (+99.99%)`	⬆️
integration1	`100.00% <ø> (+99.99%)`	⬆️
integration2	`0.00% <ø> (ø)`
java-11	`63.71% <64.86%> (+2.00%)`	⬆️
java-21	`63.65% <64.86%> (+2.02%)`	⬆️
skip-bytebuffers-false	`63.73% <64.86%> (+1.98%)`	⬆️
skip-bytebuffers-true	`63.62% <64.86%> (+35.89%)`	⬆️
temurin	`63.76% <64.86%> (+2.01%)`	⬆️
unittests	`63.75% <64.86%> (+2.01%)`	⬆️
unittests1	`56.31% <64.86%> (+9.42%)`	⬆️
unittests2	`34.06% <1.35%> (+6.33%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


ankitsultana · 2025-01-19T18:15:57Z

...not-timeseries-spi/src/main/java/org/apache/pinot/tsdb/spi/series/BaseTimeSeriesBuilder.java

+  /**
+   * Used by the leaf stage, because the leaf stage materializes tag values very late.
+   */
+  public abstract TimeSeries buildWithTagOverrides(List<String> tagNames, Object[] tagValues);


@raghavyadav01 : this one change will be required in series builders. This should do the build, AND use the provided tag names and values for the new series. See other builders in this PR for reference.
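For builder authors, a rough sketch of what an implementation might look like. Everything here other than the buildWithTagOverrides name and its (List<String>, Object[]) parameters is a hypothetical stand-in: Series replaces TimeSeries, SumBuilder replaces a real BaseTimeSeriesBuilder subclass, and add is an invented accumulation method. See the builders in this PR for the real thing.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins for TimeSeries and a series builder; not Pinot's actual classes. */
class TagOverrideSketch {

  /** Minimal stand-in for a built series. */
  static class Series {
    final List<String> tagNames;
    final Object[] tagValues;
    final double[] values;

    Series(List<String> tagNames, Object[] tagValues, double[] values) {
      this.tagNames = tagNames;
      this.tagValues = tagValues;
      this.values = values;
    }
  }

  /** Minimal stand-in for a builder that sums values per time bucket. */
  static class SumBuilder {
    private final List<String> tagNames;
    private final Object[] tagValues;
    private final double[] values;

    SumBuilder(List<String> tagNames, Object[] tagValues, int numBuckets) {
      this.tagNames = tagNames;
      this.tagValues = tagValues;
      this.values = new double[numBuckets];
    }

    void add(int bucketIndex, double value) {
      values[bucketIndex] += value;
    }

    /** Regular build: uses the tags the builder was created with. */
    Series build() {
      return new Series(tagNames, tagValues, values);
    }

    /** Same build, but the caller supplies the tags because they were materialized late. */
    Series buildWithTagOverrides(List<String> overrideNames, Object[] overrideValues) {
      return new Series(overrideNames, overrideValues, values);
    }
  }

  public static void main(String[] args) {
    SumBuilder builder = new SumBuilder(List.of(), new Object[0], 3);
    builder.add(1, 2.5);
    Series series = builder.buildWithTagOverrides(List.of("city"), new Object[]{"sf"});
    System.out.println(series.tagNames + " " + Arrays.toString(series.values));
  }
}
```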

ankitsultana · 2025-01-26T17:34:05Z

pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/AggregationFunctionType.java

@@ -220,7 +220,8 @@ public enum AggregationFunctionType {
      SqlTypeName.OTHER),
  PERCENTILERAWKLLMV("percentileRawKLLMV", ReturnTypes.VARCHAR,
      OperandTypes.family(List.of(SqlTypeFamily.ARRAY, SqlTypeFamily.NUMERIC, SqlTypeFamily.INTEGER), i -> i == 2),
-      SqlTypeName.OTHER);
+      SqlTypeName.OTHER),
+  TIMESERIESAGGREGATE("timeSeriesAggregate", SqlTypeName.OTHER, SqlTypeName.VARCHAR);


self-review: third arg (varchar) is wrong.

pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java

raghavyadav01 · 2025-01-21T18:36:00Z

...ava/org/apache/pinot/core/operator/transform/function/TimeSeriesBucketTransformFunction.java

+  }
+
+  @Override
+  public double[] transformToDoubleValuesSV(ValueBlock valueBlock) {


Should this also be implemented? Values are stored as double.

This is for the array that represents the index in the TimeBuckets that a given record maps to, so only supporting int so far.
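As a rough illustration of what that int array holds, here is a sketch that maps a block of timestamps to bucket indexes. The bucket convention is an assumption made for the example (left-closed buckets starting at firstBucketStart, -1 for timestamps outside the range) and may not match how TimeBuckets resolves indexes; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of a timestamp-to-bucket-index transform; not the function in this PR. */
class BucketIndexSketch {

  /**
   * Assumption: bucket k covers [firstBucketStart + k * bucketSize, firstBucketStart + (k + 1) * bucketSize),
   * and timestamps outside all buckets map to -1.
   */
  static int[] toBucketIndexes(long[] timestamps, long firstBucketStart, long bucketSize,
      int numBuckets) {
    int[] indexes = new int[timestamps.length];
    for (int i = 0; i < timestamps.length; i++) {
      long offset = timestamps[i] - firstBucketStart;
      long index = offset < 0 ? -1 : offset / bucketSize;
      indexes[i] = index < numBuckets ? (int) index : -1;
    }
    return indexes;
  }

  public static void main(String[] args) {
    long[] timestamps = {100L, 109L, 110L, 99L, 130L};
    System.out.println(Arrays.toString(toBucketIndexes(timestamps, 100L, 10L, 3)));
  }
}
```

Because the output is always a small index into the bucket array, an int result is enough regardless of how the time column itself is stored.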

tibrewalpratik17 · 2025-01-28T18:34:53Z

pinot-core/src/main/java/org/apache/pinot/core/operator/timeseries/TimeSeriesOperatorUtils.java

+    if (groupByResultsBlock.getNumRows() == 0) {
+      return new TimeSeriesBlock(timeBuckets, new HashMap<>());
+    }
+    if (groupByResultsBlock.isNumGroupsLimitReached()) {


We don't want to allow partial results here, do we?

Good catch. I am updating this in the next PR. Scope of next PR is:

Take in API parameter to control series limit. We will integrate that with V1 Engine via the numGroupsLimit config.

Don't throw error on group limit reached.

Note that throwing on series limit is the same as the behavior before this PR. I didn't want to change it in this PR because I wanted to test it separately.
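A minimal sketch of the behavior discussed in this thread, under stated assumptions: an empty result returns an empty block, and a reached group limit is treated as an error rather than returning partial series. The names and the exception type are hypothetical; a plain map stands in for both the group-by results and the TimeSeriesBlock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of series-limit handling; not the code in TimeSeriesOperatorUtils. */
class SeriesLimitSketch {

  /**
   * Mirrors the order of checks in the diff: empty results first, then the limit flag.
   * Assumption: the exception type and message are placeholders.
   */
  static Map<String, double[]> toSeriesBlock(Map<String, double[]> groupedSeries,
      boolean numGroupsLimitReached) {
    if (groupedSeries.isEmpty()) {
      return Collections.emptyMap();
    }
    if (numGroupsLimitReached) {
      throw new IllegalStateException("Series limit reached; not returning partial results");
    }
    return groupedSeries;
  }

  public static void main(String[] args) {
    Map<String, double[]> grouped = new HashMap<>();
    grouped.put("a", new double[]{1.0, 2.0});
    System.out.println(toSeriesBlock(grouped, false).keySet());
  }
}
```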

tibrewalpratik17 · 2025-01-28T18:38:03Z

pinot-core/src/main/java/org/apache/pinot/core/operator/timeseries/TimeSeriesOperatorUtils.java

+    while (recordIterator.hasNext()) {
+      Record record = recordIterator.next();
+      Object[] recordValues = record.getValues();
+      Object[] tagValues = new Object[recordValues.length - 1];


nit: we can keep this as String array?

Good catch again! The issue is that I had made tag values Object[] in TimeSeries. I wanted to change that to String[], which will lead to a change everywhere. Will take it up in the next few PRs. (also need to remove TimeSeries#id)

tibrewalpratik17 · 2025-01-28T18:39:47Z

...ava/org/apache/pinot/core/operator/transform/function/TimeSeriesBucketTransformFunction.java

+  }
+
+  @Override
+  public long[] transformToLongValuesSV(ValueBlock valueBlock) {


need TODOs to support these in future?

For now I don't want to support this since this function is not exactly intended for SQL, and I want to make sure that we use the int based code-path throughout.

raghavyadav01 · 2025-01-28T18:34:53Z

pinot-core/src/main/java/org/apache/pinot/core/operator/timeseries/TimeSeriesOperatorUtils.java

+      Object[] recordValues = record.getValues();
+      Object[] tagValues = new Object[recordValues.length - 1];
+      for (int index = 0; index + 1 < recordValues.length; index++) {
+        tagValues[index] = recordValues[index] == null ? "null" : recordValues[index].toString();


Is "null" correct here?

Yeah. For tag values we don't want actual nulls. Also just a heads up that I'll change Object[] tagValues to String[] tagValues in future PR
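For reference, a sketch of how the conversion could look once tag values become String[]. It follows the loop in the diff above: every column except the last is a tag, and a missing value becomes the literal string "null". The class and method names are hypothetical, and treating the last column as the aggregation result is an assumption based on the length - 1 in the diff.

```java
import java.util.Arrays;

/** Hypothetical sketch of record-to-tag conversion; not the code in this PR. */
class TagValueSketch {

  /**
   * Converts all but the last column of a record into string tag values.
   * Assumption: the last column holds the aggregation result and is not a tag.
   */
  static String[] toTagValues(Object[] recordValues) {
    String[] tagValues = new String[recordValues.length - 1];
    for (int i = 0; i < tagValues.length; i++) {
      // Tag values should never be actual nulls, so substitute the string "null".
      tagValues[i] = recordValues[i] == null ? "null" : recordValues[i].toString();
    }
    return tagValues;
  }

  public static void main(String[] args) {
    Object[] record = {"us", null, 42, "aggregate-placeholder"};
    System.out.println(Arrays.toString(toTagValues(record)));
  }
}
```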

raghavyadav01 · 2025-01-28T18:38:31Z

...ain/java/org/apache/pinot/core/query/aggregation/function/TimeSeriesAggregationFunction.java

+ * Aggregation function used by the Time Series Engine.
+ * TODO: This can't be used with SQL because the Object Serde is not implemented.
+ */
+public class TimeSeriesAggregationFunction implements AggregationFunction<BaseTimeSeriesBuilder, DoubleArrayList> {


We discussed some time back that we would also provide the raw timestamp for each value. Do we have that change?

Will add it in the PR after this.

[timeseries] Remove Time Series Specific Code from V1 Engine

a89a328

ankitsultana added the timeseries-engine Tracking tag for generic time-series engine work label Jan 19, 2025

ankitsultana mentioned this pull request Jan 19, 2025

[WIP] [PoC] Remove Time Series Specific Logic from V1 Engine #14558

Closed

ankitsultana added 2 commits January 19, 2025 05:26

self-review

5950b1d

fix checkstyle

ae24178

ankitsultana commented Jan 19, 2025

ankitsultana marked this pull request as ready for review January 26, 2025 03:11

ankitsultana commented Jan 26, 2025

raghavyadav01 reviewed Jan 28, 2025

tibrewalpratik17 reviewed Jan 28, 2025

raghavyadav01 approved these changes Jan 28, 2025

Jackie-Jiang approved these changes Jan 28, 2025

tibrewalpratik17 approved these changes Jan 28, 2025

ankitsultana merged commit b6b2fd9 into apache:master Jan 28, 2025
21 checks passed
