Fix configuration handling during MergeRollupTask execution #14856

Open
wants to merge 2 commits into
base: master

davecromberge
Member

Aggregation function parameters and dimensions to erase were being extracted directly from the task configuration without prepending the merge level key.

For example, the task config is encoded by the Task generator as follows:

   "hourly.keyA": "some value",
   "hourly.keyB": "some other value",
   "mergeLevel": "hourly"

Any lookups on the configuration during task execution have to include the mergeLevel prefix in order to resolve lookups correctly.
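
To make the failure mode concrete, here is a minimal sketch that uses a plain Java map in place of the real task config. The class name `MergeLevelLookupDemo` and its helper method are illustrative assumptions and are not code from this PR; only the three config entries come from the example above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a plain map stands in for the real task config.
final class MergeLevelLookupDemo {

  // Builds the example config shown in the PR description.
  static Map<String, String> sampleTaskConfig() {
    Map<String, String> taskConfig = new HashMap<>();
    taskConfig.put("hourly.keyA", "some value");
    taskConfig.put("hourly.keyB", "some other value");
    taskConfig.put("mergeLevel", "hourly");
    return taskConfig;
  }

  public static void main(String[] args) {
    Map<String, String> taskConfig = sampleTaskConfig();

    // Unprefixed lookup: "keyA" is not a key in the map, so this prints null.
    System.out.println(taskConfig.get("keyA"));

    // Prefixed lookup: read the merge level first, then prepend it to the key.
    String mergeLevel = taskConfig.get("mergeLevel");
    System.out.println(taskConfig.get(mergeLevel + "." + "keyA"));
  }
}
```

The first lookup misses because the generator only wrote the prefixed form of the key, which is the behaviour this PR addresses.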

There were three options for addressing this bug:

  1. Change the encoding of the task config: TaskGenerator would strip out the merge level prefix, which would make key extraction easier. The problem is that this could break existing functionality.
  2. Re-encode all keys without the prefix in the task config, alongside the prefixed ones. This would bloat the config.
  3. Include the merge level prefix in any key lookups. This is the approach followed in this PR.
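
Option 3 could be sketched roughly as below. This is not the PR's implementation: `MergeLevelKeys`, `withMergeLevelPrefix`, and the two constants are hypothetical names chosen for illustration. The fallback to the original key follows the behaviour described in the Javadoc quoted in the review thread ("Otherwise, return original key").

```java
import java.util.Map;

// Hypothetical helper; names and constants are assumptions for illustration.
final class MergeLevelKeys {
  static final String MERGE_LEVEL_KEY = "mergeLevel"; // key name taken from the example config
  static final String SEPARATOR = ".";                // separator seen in "hourly.keyA"

  // Returns "<mergeLevel>.<key>" when a merge level is configured,
  // otherwise returns the key unchanged.
  static String withMergeLevelPrefix(String key, Map<String, String> taskConfig) {
    String mergeLevel = taskConfig.get(MERGE_LEVEL_KEY);
    if (mergeLevel == null || mergeLevel.isEmpty()) {
      return key;
    }
    return mergeLevel + SEPARATOR + key;
  }
}
```

With a helper of this shape, call sites keep using the short key names and the prefix handling lives in one place, so the encoding written by the task generator does not need to change.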

This PR should be tagged with the bugfix label.

@codecov-commenter

codecov-commenter commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 96.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 63.70%. Comparing base (59551e4) to head (39a187f).
Report is 1643 commits behind head on master.

Files with missing lines Patch % Lines
...minion/tasks/mergerollup/MergeRollupTaskUtils.java 95.65% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14856      +/-   ##
============================================
+ Coverage     61.75%   63.70%   +1.94%     
- Complexity      207     1473    +1266     
============================================
  Files          2436     2709     +273     
  Lines        133233   151885   +18652     
  Branches      20636    23454    +2818     
============================================
+ Hits          82274    96751   +14477     
- Misses        44911    47862    +2951     
- Partials       6048     7272    +1224     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.67% <96.00%> (+1.96%) ⬆️
java-21 63.59% <96.00%> (+1.96%) ⬆️
skip-bytebuffers-false 63.68% <96.00%> (+1.93%) ⬆️
skip-bytebuffers-true 63.57% <96.00%> (+35.84%) ⬆️
temurin 63.70% <96.00%> (+1.94%) ⬆️
unittests 63.69% <96.00%> (+1.94%) ⬆️
unittests1 56.21% <ø> (+9.32%) ⬆️
unittests2 34.03% <96.00%> (+6.30%) ⬆️

* @param taskConfig the current merge rollup task configuration used for sourcing the merge level.
* @return composite lookup key if the merge level is configured. Otherwise, return original key.
*/
public static String buildMergeLevelKeyPrefix(String key, Map<String, String> taskConfig) {
Contributor

Wouldn't this have been an issue with the existing merge rollup as well, even without your changes?
I guess your changes add new functionality to perform custom transformations based on merge level, and that's where this bug surfaced?

"Any lookups on the configuration during task execution have to include the mergeLevel prefix in order to resolve lookups correctly."

Member Author

@davecromberge davecromberge Jan 22, 2025

@swaminathanmanish From a cursory reading of the existing code, the keys required during execution are computed during task generation and inserted into the config without the prefix:
https://github.com/apache/pinot/blob/master/pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/MergeRollupTaskGenerator.java#L711

I can't find a use case in the existing merge rollup execution where the merge-level-prefixed keys are required. I considered stripping the prefix, as noted in the PR description, but this would merely result in re-inserting the same keys without the prefix. Not all values are necessarily recomputed, and many remain the same. It might also have broken existing use cases that expect the prefix to be present.

"I guess your changes add new functionality to perform custom transformations based on merge level, and that's where this bug surfaced?"

Correct.

Contributor

Thanks, David, for the explanation. Would it make sense to add some validation for taskConfig in the task generator (throwing an exception if it is invalid) to make sure that the aggregation configs are valid, so that the user learns of any issues during task generation itself? Otherwise we would have to figure out why a particular aggregation did not happen.
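
A fail-fast check of the kind suggested here might look something like the sketch below. Everything in it is an assumption for illustration: the class `AggregationConfigValidator`, the `.aggregationType` key suffix, and the placeholder set of supported function names are not taken from this PR or confirmed against the task generator.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical validator; the suffix and supported set are placeholders.
final class AggregationConfigValidator {
  static final String AGGREGATION_TYPE_SUFFIX = ".aggregationType"; // assumed key suffix
  static final Set<String> SUPPORTED = Set.of("sum", "min", "max"); // placeholder list

  // Throws IllegalArgumentException on the first aggregation entry whose
  // value is not in SUPPORTED; entries with other keys are ignored.
  static void validate(Map<String, String> taskConfig) {
    for (Map.Entry<String, String> entry : taskConfig.entrySet()) {
      if (!entry.getKey().endsWith(AGGREGATION_TYPE_SUFFIX)) {
        continue;
      }
      String value = entry.getValue() == null ? "" : entry.getValue().toLowerCase(Locale.ROOT);
      if (!SUPPORTED.contains(value)) {
        throw new IllegalArgumentException(
            "Unsupported aggregation '" + entry.getValue() + "' for key '" + entry.getKey() + "'");
      }
    }
  }
}
```

Running such a check at generation time would surface a bad config as an exception up front, rather than as an aggregation that silently did not happen during execution.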

@davecromberge davecromberge force-pushed the fix/merge-rollup-task-config-extraction branch from 5bc1434 to d7be313 Compare January 28, 2025 13:35
Aggregation function parameters and dimensions to erase were being
extracted directly from the task configuration without prepending the
merge level key.
@davecromberge davecromberge force-pushed the fix/merge-rollup-task-config-extraction branch from d7be313 to 39a187f Compare January 29, 2025 09:24

4 participants