Skip to content

[HUDI-9158] Add storage-based lock provider abstract implementation #13103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Apr 11, 2025

Conversation

alexr17
Copy link
Contributor

@alexr17 alexr17 commented Apr 8, 2025

Change Logs

Adds the abstract implementation of the storage-based lock provider using conditional writes.

See RFC: #12927

Previous PRs: #12958, #12954

Impact

Adds abstract lock provider implementation

Risk level (write none, low medium or high below)

None, without StorageLock implementation, this class will not be used.

Documentation Update

Config updates added here: #13104

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Apr 8, 2025
@alexr17
Copy link
Contributor Author

alexr17 commented Apr 8, 2025

@hudi-bot run azure

lockFilePath,
Thread.currentThread(),
ACQUIRING);
Thread.sleep(DEFAULT_LOCK_ACQUISITION_BUFFER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LockManager#lock already has retry logic when calling LockProvider#tryLock. Should the retry logic here be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are supposed to tryLock for the appropriate amount of time. I think the implementations in Zookeeper and Dynamo follow the same pattern here

Copy link
Contributor

@yihua yihua Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are supposed to tryLock for the appropriate amount of time

This makes sense, but it does not mean that we should retry at this layer.

@alexr17
Copy link
Contributor Author

alexr17 commented Apr 9, 2025

@hudi-bot run azure

1 similar comment
@alexr17
Copy link
Contributor Author

alexr17 commented Apr 9, 2025

@hudi-bot run azure

@@ -125,13 +119,12 @@ private synchronized void setLock(StorageLockFile lockObj) {
* ConditionalWriteLockConfig
* @param conf Storage config, ignored.
*/
public ConditionalWriteLockProvider(final LockConfiguration lockConfiguration, final StorageConfiguration<?> conf) {
ConditionalWriteLockConfig config = new ConditionalWriteLockConfig.Builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIx the doc too for the renaming.

@@ -276,9 +264,9 @@ private synchronized boolean isLockStillValid(StorageLockFile lock) {
*/
@Override
public synchronized boolean tryLock() {
assertHeartBeatManagerExists();
assertHeartbeatManagerExists();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we just use ValidationUtils.checkState instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with what this does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this could work, but I think we want to throw HoodieLockException, so not sure if IllegalStateException makes sense.

// It does not make sense to have heartbeat alive extending the lock lease while
// here we are trying
// to expire the lock.
if (heartbeatManager.hasActiveHeartbeat()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do remember that shutdown hook can get called even in the event of graceful shutdown.
So, these methods should be idempotent when called again.

Copy link
Contributor

@danny0405 danny0405 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocked on my final review

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nits. LGTM.

alexr17 and others added 16 commits April 10, 2025 23:14
…lient/transaction/lock/ConditionalWriteLockConfig.java

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
…lient/transaction/lock/ConditionalWriteLockProvider.java

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
…lient/transaction/lock/ConditionalWriteLockProvider.java

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
…lient/transaction/lock/ConditionalWriteLockProvider.java

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
@yihua yihua force-pushed the HUDI-9158/base-lp branch from 0a1e4ac to 2d596de Compare April 11, 2025 06:24
when(mockLockService.tryCreateOrUpdateLockFileWithRetry(any(), eq(nearExpiredLockFile), anyLong()))
.thenReturn(Pair.of(LockUpdateResult.ACQUIRED_BY_OTHERS, null));
assertFalse(lockProvider.renewLock());
verify(mockLogger).error("Owner {}: Unable to renew lock as it is acquired by others.", this.ownerId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: checking logger is not the best way to validate the logic. If it's not easy to validate without this way, we should think about improving the interface in the future for testing. HUDI-9307 to track.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, it's just an additional check. The assert false above is sufficient for testing this code path.

I think we have a lot of different logger statements for the same return value so I actually think this is one scenario where validating this way makes sense.

// If the next heartbeat run identifies our lock has expired we will error out.
logger.warn("Owner {}: Unable to renew lock due to unknown error, could be transient.", ownerId);
// Let heartbeat retry later.
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return false based on the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs should be updated. We should always retry renewal unless we receive a known fatal error code.

@yihua
Copy link
Contributor

yihua commented Apr 11, 2025

@alexr17 please update the RFC based on the latest code, including the configs, retries, shutdown hooks, etc., so that the RFC is consistent with the implementation.

Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Please address remaining nits in a follow-up PR so we can land this PR to unblock stacked PRs on top of it.

@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit 3dc2df5 into apache:master Apr 11, 2025
60 checks passed
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 15, 2025
…pache#13103)

* Add initial changes for storage-based lock provider abstract implementation

---------

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
(cherry picked from commit 3dc2df5)
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 16, 2025
…pache#13103)

* Add initial changes for storage-based lock provider abstract implementation

---------

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
(cherry picked from commit 3dc2df5)
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 16, 2025
…pache#13103)

* Add initial changes for storage-based lock provider abstract implementation

---------

Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
(cherry picked from commit 3dc2df5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-1.0.2 size:XL PR with lines of changes > 1000
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants