Config for max output segment size in UpsertCompactMerge task #14742

Merged


tibrewalpratik17
Contributor

Resolves #14634

Adds support for configuring the maximum output segment size for the UpsertCompactMerge task.

@tibrewalpratik17 tibrewalpratik17 added feature Configuration Config changes (addition/deletion/change in behavior) upsert minion labels Jan 2, 2025
@tibrewalpratik17 tibrewalpratik17 force-pushed the add_segment_size_config branch from 75d14c6 to 66e9208 Compare January 2, 2025 14:26
@codecov-commenter

codecov-commenter commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 7.40741% with 25 lines in your changes missing coverage. Please review.

Project coverage is 63.82%. Comparing base (59551e4) to head (6b42ef0).
Report is 1528 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...tcompactmerge/UpsertCompactMergeTaskGenerator.java | 7.40% | 24 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14742      +/-   ##
============================================
+ Coverage     61.75%   63.82%   +2.07%     
- Complexity      207     1609    +1402     
============================================
  Files          2436     2703     +267     
  Lines        133233   150748   +17515     
  Branches      20636    23291    +2655     
============================================
+ Hits          82274    96219   +13945     
- Misses        44911    47332    +2421     
- Partials       6048     7197    +1149     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.78% <7.40%> (+2.07%) ⬆️ |
| java-21 | 63.71% <7.40%> (+2.09%) ⬆️ |
| skip-bytebuffers-false | 63.82% <7.40%> (+2.07%) ⬆️ |
| skip-bytebuffers-true | 63.67% <7.40%> (+35.95%) ⬆️ |
| temurin | 63.82% <7.40%> (+2.07%) ⬆️ |
| unittests | 63.82% <7.40%> (+2.07%) ⬆️ |
| unittests1 | 56.24% <ø> (+9.35%) ⬆️ |
| unittests2 | 34.17% <7.40%> (+6.44%) ⬆️ |


@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review January 2, 2025 21:07
// Add the segment to the current group
currentGroup.add(segment);
currentValidDocsSum += validDocs;
currentTotalDocsSum += validDocs + invalidDocs;
currentOutputSegmentSizeInBytes += expectedSegmentSizeInBytes;
Contributor

@swaminathanmanish swaminathanmanish Jan 6, 2025


Does this ensure that we are packing enough segments (based on outputSegmentMaxSizeInBytes) per task? If so, does this still not ensure that a segment is of expectedSegmentSize?

Contributor Author


> Does this ensure that we are packing enough segments (based on outputSegmentMaxSizeInBytes) per task? If so, does this still not ensure that a segment is of expectedSegmentSize?

This ensures that the output segment size is within a certain threshold. We have named the config "outputSegmentMaxSize".
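To make the "within a certain threshold" point concrete, here is a minimal sketch of greedy grouping under a record cap and a size cap. It is not the code from this PR: `SizeBoundedGrouping`, `SegmentInfo`, and `group` are hypothetical names, and the real generator applies further conditions that are not reproduced here. The variable names mirror the ones visible in the diff. The sketch also assumes that a segment which exceeds the cap on its own is still emitted as a single-segment group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not Pinot APIs. */
final class SizeBoundedGrouping {

  /** Hypothetical stand-in for the per-segment metadata used while grouping. */
  static final class SegmentInfo {
    final String name;
    final long validDocs;
    final long invalidDocs;
    final long expectedSizeInBytes;

    SegmentInfo(String name, long validDocs, long invalidDocs, long expectedSizeInBytes) {
      this.name = name;
      this.validDocs = validDocs;
      this.invalidDocs = invalidDocs;
      this.expectedSizeInBytes = expectedSizeInBytes;
    }
  }

  /** Packs segments in order; closes a group when adding the next one would reach either cap. */
  static List<List<SegmentInfo>> group(List<SegmentInfo> segments, long maxRecordsPerTask,
      long outputSegmentMaxSizeInBytes) {
    List<List<SegmentInfo>> groups = new ArrayList<>();
    List<SegmentInfo> currentGroup = new ArrayList<>();
    long currentTotalDocsSum = 0;
    long currentOutputSegmentSizeInBytes = 0;

    for (SegmentInfo segment : segments) {
      long totalDocs = segment.validDocs + segment.invalidDocs;
      boolean fits = currentTotalDocsSum + totalDocs < maxRecordsPerTask
          && currentOutputSegmentSizeInBytes + segment.expectedSizeInBytes < outputSegmentMaxSizeInBytes;

      // Start a new group when the segment does not fit. An empty group always
      // accepts the segment, so an oversized segment ends up in a group by itself.
      if (!fits && !currentGroup.isEmpty()) {
        groups.add(currentGroup);
        currentGroup = new ArrayList<>();
        currentTotalDocsSum = 0;
        currentOutputSegmentSizeInBytes = 0;
      }

      currentGroup.add(segment);
      currentTotalDocsSum += totalDocs;
      currentOutputSegmentSizeInBytes += segment.expectedSizeInBytes;
    }
    if (!currentGroup.isEmpty()) {
      groups.add(currentGroup);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<SegmentInfo> segments = Arrays.asList(
        new SegmentInfo("seg_0", 10, 5, 100L),
        new SegmentInfo("seg_1", 10, 5, 100L),
        new SegmentInfo("seg_2", 10, 5, 100L));
    for (List<SegmentInfo> g : group(segments, 1_000L, 250L)) {
      System.out.println("group of " + g.size() + " segment(s)");
    }
  }
}
```

In this sketch the cap is an upper bound on the summed expected sizes, not a target: a group can close well below the cap when the record limit is hit first or when the remaining input is small.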

Contributor


Thanks @tibrewalpratik17. By default, do we want to have 200 MB segments? This changes the default segment size and can override maxNumRecordsPerSegment. Can we make the new param opt-in and not apply a default?
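One way the opt-in suggestion could look, as a rough sketch rather than what the PR implements: when the key is absent, no size limit is applied, so the record-based limits remain the only constraint. `OptInSizeLimit`, `resolveMaxSizeInBytes`, and `fits` are hypothetical names, the key string is taken from the config name mentioned in this thread, and the value is assumed to be a plain byte count (the actual config may accept a different format).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names and value format are assumptions, not Pinot APIs. */
final class OptInSizeLimit {

  // Assumed key, based on the config name quoted in the discussion.
  static final String OUTPUT_SEGMENT_MAX_SIZE_KEY = "outputSegmentMaxSize";

  // Sentinel meaning "no size cap configured".
  static final long NO_LIMIT = Long.MAX_VALUE;

  /** Returns the configured cap in bytes, or NO_LIMIT when the key is not set. */
  static long resolveMaxSizeInBytes(Map<String, String> taskConfigs) {
    String raw = taskConfigs.get(OUTPUT_SEGMENT_MAX_SIZE_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return NO_LIMIT; // opt-in: absent config leaves existing behavior unchanged
    }
    long bytes = Long.parseLong(raw.trim());
    if (bytes <= 0) {
      throw new IllegalArgumentException(OUTPUT_SEGMENT_MAX_SIZE_KEY + " must be positive, got: " + raw);
    }
    return bytes;
  }

  /** Size check written with subtraction so it cannot overflow when maxBytes is NO_LIMIT. */
  static boolean fits(long currentBytes, long candidateBytes, long maxBytes) {
    return candidateBytes < maxBytes - currentBytes;
  }

  public static void main(String[] args) {
    Map<String, String> configs = new HashMap<>();
    System.out.println("limit applied: " + (resolveMaxSizeInBytes(configs) != NO_LIMIT));
    configs.put(OUTPUT_SEGMENT_MAX_SIZE_KEY, "209715200");
    System.out.println("limit applied: " + (resolveMaxSizeInBytes(configs) != NO_LIMIT));
  }
}
```

With a sentinel like this, the grouping condition can stay a single comparison whether or not the cap is configured.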

&& currentTotalDocsSum + validDocs + invalidDocs < maxRecordsPerTask
&& currentOutputSegmentSizeInBytes + expectedSegmentSizeInBytes < outputSegmentMaxSizeInBytes) {

@tibrewalpratik17 tibrewalpratik17 merged commit 4588f8c into apache:master Jan 7, 2025
21 checks passed
Linked issue: Support for size based minion UpsertCompactMerge task
4 participants