Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make backfill behavior more explicit #700

Merged
merged 21 commits into from
Feb 8, 2025
Merged

Make backfill behavior more explicit #700

merged 21 commits into from
Feb 8, 2025

Conversation

dermanyang
Copy link
Contributor

@dermanyang dermanyang commented Jan 29, 2025

Purpose

Make restart behavior more explicit through the introduction of TestingConfig, BootstrapConfig and ProcessorMode on IndexerProcessorConfig. The starting_version on transaction_stream_config should no longer be used, instead there are new fields initial_starting_version on the backfill + bootstrap configs, and override_starting_version on testing config.

Behavior

The current live behavior of taking the higher value of transaction_stream_config.starting_version is moved to bootstrap_config.initial_starting_version. This is the default behavior of the processor if ProcessorMode is not specified. Just as before, if the respective starting version is not provided, the processor will begin from genesis.

BackfillConfig.intial_starting_version will be respected when any of the following are true:

  • there is no checkpoint in backfill_processor_status
  • the last backfill has been complete (last_successful_version >= backfill_config.ending_version)
  • initial_starting_version > last_successful_version.
  • backfill_config.overwrite_checkpoint is set to true
    Otherwise, the processor will select last_successful_version.

Technical Bits

Using pattern matching on IndexerProcessorConfig.processor_mode for different logic in starting_version.rs. All new fields are Optional using Serde to default processor_mode to default when not provided. Serde will throw if mode == testing or mode == backfill and the respective sub-config isn't provided.

Make start and end version on backfill_config required. Remove backfill_alias field in favor of backfill_id. This will be appended onto the processor type as a PK in the backfill_processor_status table.

Testing

Added tests for new behavior

Mode not set, no checkpoint

image

Mode not set, checkpoint higher than bootstrap start ver (ran after the above run)

image

mode = backfill with no checkpoint

image

mode = backfill, checkpoint higher than backfill_config.intial_starting_version

image

mode = testing, checkpoint higher than testing_config.override_starting_version. latter is still picked

image
image

@dermanyang dermanyang changed the title Sdy/improve backfill Make backfill behavior more explicit Jan 30, 2025
@dermanyang dermanyang marked this pull request as ready for review January 30, 2025 02:13
@dermanyang dermanyang requested review from yuunlimm and rtso January 30, 2025 02:14
}

fn log_ascii_warning(version: u64) {
println!(r#"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol nice!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was fun haha

/// If it is a regular processor, this will return the higher of the checkpointed version,
/// or `staring_version` from the config, or 0 if not set.
/// If it is a regular (`mode == default`) processor, this will return the higher of the (checkpointed version,
/// `initial_starting_version` || 0) from the bootstrap config.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: what purpose does intial_starting_version serve?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the same purpose as it is today, i just prepended it with initial to make it explicit that it'll only be respected on initial startup and get_starting_version() will take the larger of the config's starting version and the checkpointed version

Copy link
Collaborator

@bowenyang007 bowenyang007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the updated readme?

had some questions. once confirmed should be g2g. thanks for addressing this!

Copy link
Collaborator

@rtso rtso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments but devex lgtm! would you mind also updating the README.md?

- `backfill_id`: appended to `processor_type` for a unique backfill identifier
- `initial_starting_version`: processor starts here unless there is a greater checkpointed version
- `ending_version`: ending version of the backfill
- `overwrite_checkpoint`: overwrite checkpoints if it exists, restarting the backfill from `initial_starting_version`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't this lead to crashlooping behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we either have crash looping when someone sets overwrite_checkpoint = true, or we don't have a way to in progress backfills. I think if someone needs to restart an in-progress BF, they're able to pay attention and fix the wrong behavior.

What do you think?

/// the checkpointed version + 1.
///
/// If this is a backfill processor and there is not an in-progress backfill (i.e., no checkpoint or
/// backfill status is COMPLETE), this will return `starting_version` from the config, or 0 if not set.
/// backfill status is COMPLETE), this will return `intial_starting_version` from the backfill config, or 0 if not set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return intial_starting_version from the backfill config, or 0 if not set. isn't intial_starting_version field required?

Copy link
Contributor

@yuunlimm yuunlimm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left minor questions, overall lgtm!

Comment on lines 26 to 35
backfill_config:
backfill_id: "123"
initial_starting_version: 42
ending_version: 100042
overwrite_checkpoint: false
testing_config:
override_starting_version: 111111
ending_version: 222222
bootstrap_config:
initial_starting_version: 123456789
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are only 1 of these configs set at a time? if so, might be confusing to include all three of these in the template. could we also add comments here for when to use each mode?

Copy link
Contributor Author

@dermanyang dermanyang Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to not include these in the example config but mention them in the explanation section:

  • backfill_config (optional)

    • backfill_id: appended to processor_type for a unique backfill identifier
    • initial_starting_version: processor starts here unless there is a greater checkpointed version
    • ending_version: ending version of the backfill
    • overwrite_checkpoint: overwrite checkpoints if it exists, restarting the backfill from initial_starting_version.
  • testing_config (optional)

    • override_starting_version: starting version of the testing. always starts from this version
    • ending_version: ending version of the testing
  • bootstrap_config (optional) used for regular, non-backfill, processors

    • initial_starting_version: processor starts here unless there is a greater checkpointed version.
  • mode: (optional) default, testing or backfill. Set to default if no mode specified. If backfill/testing/bootstrap
    configs are not specified, processor will start from 0 or the last successfully processed version.

Copy link

codecov bot commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 59.90338% with 249 lines in your changes missing coverage. Please review.

Project coverage is 50.4%. Comparing base (e943d0f) to head (c6c5ad8).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...k-processor/src/config/indexer_processor_config.rs 0.0% 31 Missing ⚠️
...ocessor/src/steps/common/processor_status_saver.rs 26.9% 19 Missing ⚠️
...ocessors/parquet_account_transactions_processor.rs 0.0% 12 Missing ⚠️
...or/src/parquet_processors/parquet_ans_processor.rs 0.0% 12 Missing ⚠️
...rc/parquet_processors/parquet_default_processor.rs 0.0% 12 Missing ⚠️
...src/parquet_processors/parquet_events_processor.rs 0.0% 12 Missing ⚠️
...uet_processors/parquet_fungible_asset_processor.rs 0.0% 12 Missing ⚠️
...rc/parquet_processors/parquet_objects_processor.rs 0.0% 12 Missing ⚠️
.../src/parquet_processors/parquet_stake_processor.rs 0.0% 12 Missing ⚠️
...c/parquet_processors/parquet_token_v2_processor.rs 0.0% 12 Missing ⚠️
... and 14 more
Additional details and impacted files
@@          Coverage Diff           @@
##            main    #700    +/-   ##
======================================
  Coverage   50.3%   50.4%            
======================================
  Files        253     253            
  Lines      28643   29128   +485     
======================================
+ Hits       14432   14699   +267     
- Misses     14211   14429   +218     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dermanyang dermanyang merged commit bd642dd into main Feb 8, 2025
11 checks passed
@dermanyang dermanyang deleted the sdy/improve_backfill branch February 8, 2025 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants