Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

track newly added segments for upsert tables for a more complete upsert data view #13992

Conversation

klsince
Copy link
Contributor

@klsince klsince commented Sep 12, 2024

When uploading a segment, the server may process it ahead of broker adding it to routing table. Broker listens to IS changes to add the newly added segments into routing table, which may take a few seconds or many seconds, depending on the load of the broker.

If query gets routed by broker during this short window, broker may not include the new segment in the list of selected segments for the query, although the server already starts to process the new segment and invalidate the docs in existing segments. Due to this race condition, the query may access less than expected valid docs.

In this PR, we tracks newly uploaded segments on server side for a configurable period, so that server can ask query to access them as optional segments. Optional segment is a mechanism we added earlier on to handle a similar race condition for consuming segment.

This PR adds a new config newSegmentTrackingTimeMs to enable this feature, essentially as a best effort to wait for broker to add a segment into their routing table upon IS changes. By default, it's 10s to enable this tracking feature.

@klsince klsince force-pushed the handle_uploaded_segments_for_upsert_consistency_mode branch from 2ee3554 to 42019ca Compare September 12, 2024 22:01
@Jackie-Jiang Jackie-Jiang added documentation Configuration Config changes (addition/deletion/change in behavior) labels Sep 12, 2024
@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 55.07246% with 31 lines in your changes missing coverage. Please review.

Project coverage is 65.04%. Comparing base (59551e4) to head (66b07e0).
Report is 1049 commits behind head on master.

Files with missing lines Patch % Lines
...core/query/executor/ServerQueryExecutorV1Impl.java 0.00% 15 Missing and 1 partial ⚠️
...he/pinot/segment/local/utils/TableConfigUtils.java 61.53% 3 Missing and 2 partials ⚠️
...psert/ConcurrentMapTableUpsertMetadataManager.java 42.85% 4 Missing ⚠️
...ata/manager/realtime/RealtimeTableDataManager.java 0.00% 3 Missing ⚠️
...cal/upsert/BasePartitionUpsertMetadataManager.java 80.00% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13992      +/-   ##
============================================
+ Coverage     61.75%   65.04%   +3.28%     
- Complexity      207     1534    +1327     
============================================
  Files          2436     2538     +102     
  Lines        133233   140268    +7035     
  Branches      20636    21562     +926     
============================================
+ Hits          82274    91231    +8957     
+ Misses        44911    42309    -2602     
- Partials       6048     6728     +680     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 65.02% <55.07%> (+3.31%) ⬆️
java-21 64.92% <55.07%> (+3.29%) ⬆️
skip-bytebuffers-false 65.03% <55.07%> (+3.28%) ⬆️
skip-bytebuffers-true 64.90% <55.07%> (+37.17%) ⬆️
temurin 65.04% <55.07%> (+3.28%) ⬆️
unittests 65.03% <55.07%> (+3.28%) ⬆️
unittests1 56.46% <13.04%> (+9.57%) ⬆️
unittests2 35.09% <55.07%> (+7.35%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some offline discussion, suggest making the following changes:

  • Make the config something like newSegmentTrackingTimeMs because optional is internal detail but not user friendly
  • Remove clean-up interval, and cleanup synchronously when query comes in
  • Decouple it from UpsertViewManager, and we can treat non-positive tracking time as disabling the feature

@klsince klsince force-pushed the handle_uploaded_segments_for_upsert_consistency_mode branch from 42019ca to e92ec54 Compare September 13, 2024 21:59
@klsince klsince changed the title mark/unmark segments being added as optional for consistent upsert view track newly added segments for upsert tables for a more complete upsert data view Sep 13, 2024
…s upsert table even not using consistency mode
@klsince klsince force-pushed the handle_uploaded_segments_for_upsert_consistency_mode branch from e92ec54 to 158a43b Compare September 13, 2024 22:04
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one non-blocking comment

@klsince klsince force-pushed the handle_uploaded_segments_for_upsert_consistency_mode branch from aa04b64 to f007783 Compare September 17, 2024 22:54
@klsince klsince force-pushed the handle_uploaded_segments_for_upsert_consistency_mode branch from f007783 to d2eda3b Compare September 17, 2024 22:58
@klsince klsince merged commit dbd0f3b into apache:master Sep 18, 2024
21 checks passed
@klsince klsince deleted the handle_uploaded_segments_for_upsert_consistency_mode branch September 18, 2024 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) documentation enhancement upsert
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants