Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sequencer-relayer): poll Celestia App for transaction status #1940

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

ethanoroshiba
Copy link
Contributor

@ethanoroshiba ethanoroshiba commented Jan 31, 2025

Summary

Changes polling method for confirming Celestia commitment from GetTx gRPC to TxStatus.

Background

Previously, GetTx was polled indefinitely. On the client end of this RPC, we'd have no indication of what's happening on Celestia, so we had no way to know if a transaction was, for instance, evicted from the mempool, which would cause a halt of data submission. The TxStatus endpoint solves this problem by providing a more detailed status of the transaction.

Changes

  • Poll TxStatus instead of GetTx. The client will behave in the following ways:
    • Status is PENDING: continue polling
    • Status is UNKNOWN: continue polling until MAX_WAIT_TIME_UNKNOWN is reached (currently 6s)
    • Status is COMMITTED: return block height
    • Status is EVICTED: return an error

Testing

  • Adjusted mounts as necessary.
  • Added blackbox tests for each possible status response.

Changelogs

Changelog updated.

Metrics

  • Added unknown status transaction count.
  • Added evicted transaction count.

Related Issues

closes #1936

@github-actions github-actions bot added sequencer-relayer pertaining to the astria-sequencer-relayer crate cd labels Jan 31, 2025
@github-actions github-actions bot added the proto pertaining to the Astria Protobuf spec label Feb 3, 2025
@ethanoroshiba ethanoroshiba changed the title fix(sequencer-relayer)!: use 'tx_status' instead of 'GetTx' fix(sequencer-relayer): use 'tx_status' instead of 'GetTx' Feb 3, 2025
@ethanoroshiba ethanoroshiba removed the cd label Feb 3, 2025
@ethanoroshiba ethanoroshiba marked this pull request as ready for review February 3, 2025 19:45
@ethanoroshiba ethanoroshiba changed the title fix(sequencer-relayer): use 'tx_status' instead of 'GetTx' fix(sequencer-relayer): use 'TxStatus' instead of 'GetTx' Feb 4, 2025
@ethanoroshiba ethanoroshiba changed the title fix(sequencer-relayer): use 'TxStatus' instead of 'GetTx' fix(sequencer-relayer): poll Celestia App for transaction status Feb 5, 2025

sequencer_relayer
.timeout_ms(
11_000,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an abhorrently long time for a test to take, but we need to test that the relayer correctly exits if the allotted time passes without the status changing from "UNKNOWN". We cannot advance time manually due to the multithreaded context, so I'm not sure of a way around this :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could perhaps change the actual timeout based on the test feature:

#[cfg(test)]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(2);
#[cfg(not(test))]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(10);

Alternatively, 10 seconds might be too long anyway given that Celestia block time is now 6 seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4f304bb I lowered the time to 6s and the minimum logging time to 3. I'm not sure how I feel about building the testing functionality into the source code, but I'll defer to you here on what you think is best, since I think I lack the context of what is best practice. I feel like generally, though, it's best to keep testing logic (especially for blackbox tests) separate

Copy link
Contributor

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looking good. Just a few comments.

Comment on lines 395 to 402
// The min seconds to sleep after receiving a TxStatus response and sending the next
// request.
const POLL_INTERVAL_SECS: u64 = 1;
// The minimum duration between logs.
const LOG_INTERVAL: Duration = Duration::from_secs(5);
// The maximum amount of time to wait for a transaction to be committed if its status is
// `UNKNOWN`.
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering about the rationale for changing the retry backoff and removing the START_LOGGING_DELAY? Seems like that behaviour is still desirable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I thought because this functionality was moved to the debug level, it could be helpful to have the full picture of each response the relayer has received. I don't have a strong opinion on this, though, so I'd be happy to add that back if you think it's better


sequencer_relayer
.timeout_ms(
11_000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could perhaps change the actual timeout based on the test feature:

#[cfg(test)]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(2);
#[cfg(not(test))]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(10);

Alternatively, 10 seconds might be too long anyway given that Celestia block time is now 6 seconds?

ethanoroshiba and others added 2 commits February 18, 2025 10:14
Co-authored-by: Fraser Hutchison <190532+Fraser999@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
override-freeze proto pertaining to the Astria Protobuf spec sequencer-relayer pertaining to the astria-sequencer-relayer crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relayer gets stuck in infinite loop when transaction is evicted from Celestia mempool
2 participants