Skip to content

Fix and clean-up upsert and dedup config #15528

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 16, 2025

Conversation

Jackie-Jiang
Copy link
Contributor

Bugfixes:

  • Introduce Enablement enum with value ENABLE, DISABLE and DEFAULT to control the enablement of a feature. For DEFAULT enablement, use the default config from upper level (e.g. instance level)
  • Introduce snapshot and preload field as Enablement into UpsertConfig and DedupConfig so that the value can be properly overridden. Currently there is no way to disable at table level when instance level is enabled
  • Always read properties from UpsertContext and DedupContext to avoid the inconsistency of server level override and config change

Cleanups:

  • Simplify the constructor for upsert/dedup related configs
  • Re-order some fields/methods for readability
  • Unify the metadata manager creation logic for upsert/dedup
  • Move some constants to CommonConstants

Incompatible:

  • Several method signatures are changed

Release note:

  • enableSnapshot and enablePreload are deprecated and replaced with snapshot and preload

@Jackie-Jiang Jackie-Jiang added documentation incompatible Indicate PR that introduces backward incompatibility release-notes Referenced by PRs that need attention when compiling the next release notes Configuration Config changes (addition/deletion/change in behavior) upsert dedup changes related to realtime ingestion dedup handling labels Apr 11, 2025
@codecov-commenter
Copy link

codecov-commenter commented Apr 12, 2025

Codecov Report

Attention: Patch coverage is 63.65979% with 141 lines in your changes missing coverage. Please review.

Project coverage is 62.82%. Comparing base (1fe72ae) to head (68e4a4b).

Files with missing lines Patch % Lines
...ache/pinot/segment/local/upsert/UpsertContext.java 54.38% 23 Missing and 3 partials ⚠️
...t/local/upsert/BaseTableUpsertMetadataManager.java 71.18% 11 Missing and 6 partials ⚠️
...apache/pinot/segment/local/dedup/DedupContext.java 53.12% 14 Missing and 1 partial ⚠️
...org/apache/pinot/spi/config/table/DedupConfig.java 48.27% 12 Missing and 3 partials ⚠️
...ealtime/writer/StatelessRealtimeSegmentWriter.java 0.00% 12 Missing ⚠️
...rg/apache/pinot/spi/config/table/UpsertConfig.java 79.62% 3 Missing and 8 partials ⚠️
...a/manager/realtime/RealtimeSegmentDataManager.java 64.28% 7 Missing and 3 partials ⚠️
...ata/manager/realtime/RealtimeTableDataManager.java 10.00% 5 Missing and 4 partials ⚠️
...ocal/upsert/TableUpsertMetadataManagerFactory.java 37.50% 2 Missing and 3 partials ⚠️
...in/java/org/apache/pinot/spi/utils/Enablement.java 64.28% 5 Missing ⚠️
... and 9 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15528      +/-   ##
============================================
- Coverage     62.85%   62.82%   -0.03%     
- Complexity     1383     1385       +2     
============================================
  Files          2863     2864       +1     
  Lines        162450   162551     +101     
  Branches      24881    24894      +13     
============================================
+ Hits         102101   102118      +17     
- Misses        52670    52743      +73     
- Partials       7679     7690      +11     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 62.79% <63.65%> (-0.02%) ⬇️
java-21 62.80% <63.65%> (-0.03%) ⬇️
skip-bytebuffers-false 62.81% <63.65%> (-0.04%) ⬇️
skip-bytebuffers-true 62.77% <63.65%> (+7.01%) ⬆️
temurin 62.82% <63.65%> (-0.03%) ⬇️
unittests 62.81% <63.65%> (-0.03%) ⬇️
unittests1 55.78% <39.89%> (-0.08%) ⬇️
unittests2 33.53% <58.24%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Jackie-Jiang Jackie-Jiang force-pushed the fix_upsert_dedup_config branch from 7c8a6a0 to f14fb08 Compare April 14, 2025 19:17
@Jackie-Jiang Jackie-Jiang requested a review from klsince April 14, 2025 19:22
@Jackie-Jiang Jackie-Jiang force-pushed the fix_upsert_dedup_config branch 3 times, most recently from ca484ff to 46692b4 Compare April 15, 2025 00:06
if (_partitionUpsertMetadataManager != null) {
UpsertContext upsertContext = _partitionUpsertMetadataManager.getContext();
if (upsertContext.isAllowPartialUpsertConsumptionDuringCommit()
|| upsertContext.getUpsertMode() != UpsertConfig.Mode.PARTIAL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. good catch, but iirc, it's checking if the upsert mode is partial upsert previously... but anyway, this check looks right now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just changed it to use context as source of truth

@@ -110,7 +109,8 @@ public void testValidateIfMergeRollupCanBeEnabledOrNot() {

tableConfig =
new TableConfigBuilder(TableType.REALTIME).setTableName(RAW_TABLE_NAME).setTimeColumnName(TIME_COLUMN_NAME)
.setDedupConfig(new DedupConfig(true, HashFunction.MD5)).build();
.setDedupConfig(new DedupConfig())
.build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The existing format is actually wrong. Didn't touch other part of the test

+ "TTL: {}, dedup time column: {}, table index dir: {}", getClass().getSimpleName(), _tableNameWithType,
primaryKeyColumns, hashFunction, _enablePreload, metadataTTL, dedupTimeColumn, tableIndexDir);

// NOTE: This field doesn't follow enablement override, and always take instance config if set to true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in comment: ... if set to false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If instance config is set to true. Let me revise the comment lol

Preconditions.checkState(!upsertConfig.isEnablePreload(),
"enableDeletedKeysCompactionConsistency and enablePreload shouldn't exist together for upsert table");
// enableDeletedKeysCompactionConsistency shouldn't exist with preload enabled
Preconditions.checkState(upsertConfig.getPreload() != Enablement.ENABLE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check == Enablement.DISABLE here? as iiuc, if it's non-DISABLE, then it'd use server config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot be ENABLE, but can be DEFAULT or DISABLE. After server override, we need it to be disabled



/** Class representing upsert configuration of a table. */
/// Class representing upsert configuration of a table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rm one /

Copy link
Contributor Author

@Jackie-Jiang Jackie-Jiang Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new javadoc format (3 / instead of 2) which is more concise


@JsonPropertyDescription("Function to hash the primary key.")
private HashFunction _hashFunction = HashFunction.NONE;

/// Maintains the mapping of merge strategies per column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, looks like all comments started with 3 / in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. 3 / means javadoc; 2 / is regular comment

@Jackie-Jiang Jackie-Jiang force-pushed the fix_upsert_dedup_config branch 2 times, most recently from 825b3e5 to e6dbecf Compare April 15, 2025 23:42
@Jackie-Jiang Jackie-Jiang force-pushed the fix_upsert_dedup_config branch from e6dbecf to 68e4a4b Compare April 15, 2025 23:46
@@ -44,12 +46,16 @@ public enum ConsistencyMode {
}

@JsonPropertyDescription("Upsert mode.")
private Mode _mode;
private Mode _mode = Mode.FULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is backward compatible during rollout restart?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We don't allow mode to be null, so this one just simplifies the setup for test.

@Jackie-Jiang Jackie-Jiang merged commit 45eb5de into apache:master Apr 16, 2025
22 checks passed
@Jackie-Jiang Jackie-Jiang deleted the fix_upsert_dedup_config branch April 16, 2025 20:09
leujean02 pushed a commit to leujean02/pinot that referenced this pull request Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) dedup changes related to realtime ingestion dedup handling documentation incompatible Indicate PR that introduces backward incompatibility release-notes Referenced by PRs that need attention when compiling the next release notes upsert
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants