Skip to content

datastore: apply schema changes immediately to committed state. #2685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Centril
Copy link
Contributor

@Centril Centril commented Apr 30, 2025

Description of Changes

TODO

API and ABI breaking changes

None

Expected complexity level and risk

4, this is isolated to the datastore, but the datastore is critical, subtle, and contains various unsafe.

Testing

TODO

Benchmarks

Running the 10^6 updates on master (i7-7700K, 64GB RAM)

2025-05-05T11:17:04.425406Z  INFO: : Timing span "update_positions_by_collect": 284.095419ms
2025-05-05T11:17:05.083368Z  INFO: : Timing span "update_positions_by_collect": 295.386505ms
2025-05-05T11:17:05.735609Z  INFO: : Timing span "update_positions_by_collect": 292.30692ms
2025-05-05T11:17:06.385407Z  INFO: : Timing span "update_positions_by_collect": 290.945497ms
2025-05-05T11:17:06.977770Z  INFO: : Timing span "update_positions_by_collect": 292.184628ms
2025-05-05T11:17:08.591845Z  INFO: : Timing span "update_positions_by_collect": 290.017534ms
2025-05-05T11:17:09.190420Z  INFO: : Timing span "update_positions_by_collect": 292.735686ms
2025-05-05T11:17:09.844613Z  INFO: : Timing span "update_positions_by_collect": 293.123304ms
2025-05-05T11:17:10.436698Z  INFO: : Timing span "update_positions_by_collect": 293.518898ms
2025-05-05T11:17:11.031339Z  INFO: : Timing span "update_positions_by_collect": 291.461906ms
mean = 291.578

Running the 10^6 updates on centril/datastore-revamp-schema-changes

2025-05-05T11:29:12.227311Z  INFO: : Timing span "update_positions_by_collect": 241.681369ms
2025-05-05T11:29:12.916596Z  INFO: : Timing span "update_positions_by_collect": 270.112951ms
2025-05-05T11:29:13.546550Z  INFO: : Timing span "update_positions_by_collect": 266.506732ms
2025-05-05T11:29:14.174847Z  INFO: : Timing span "update_positions_by_collect": 268.173262ms
2025-05-05T11:29:14.806230Z  INFO: : Timing span "update_positions_by_collect": 267.065537ms
2025-05-05T11:29:16.151544Z  INFO: : Timing span "update_positions_by_collect": 266.592603ms
2025-05-05T11:29:16.784821Z  INFO: : Timing span "update_positions_by_collect": 268.69895ms
2025-05-05T11:29:17.357343Z  INFO: : Timing span "update_positions_by_collect": 267.566095ms
2025-05-05T11:29:17.966222Z  INFO: : Timing span "update_positions_by_collect": 265.485971ms
2025-05-05T11:29:18.599481Z  INFO: : Timing span "update_positions_by_collect": 271.200651ms
mean = 265.308 (26.27ms, -9%)

Running the subscriptions benchmark (baseline is master):

footprint-scan          time:   [55.630 ms 56.036 ms 56.608 ms]
                        change: [-1.0606% +0.1181% +1.2144%] (p = 0.86 > 0.05)
                        No change in performance detected.
footprint-semijoin      time:   [100.82 µs 101.18 µs 101.68 µs]
                        change: [-8.9543% -8.0530% -7.2691%] (p = 0.00 < 0.05)
index-scan-multi        time:   [539.58 ns 541.65 ns 544.45 ns]
                        change: [-1.2826% -0.3436% +0.6942%] (p = 0.50 > 0.05)
                        No change in performance detected.
full-scan               time:   [36.130 ms 36.314 ms 36.525 ms]
                        change: [+0.4731% +1.2023% +1.9643%] (p = 0.00 < 0.05)
                        Change within noise threshold.
full-join               time:   [170.08 µs 171.93 µs 174.28 µs]
                        change: [-5.9292% -4.8670% -3.8686%] (p = 0.00 < 0.05)
                        Performance has improved.
incr-select             time:   [134.09 ns 134.30 ns 134.64 ns]
                        change: [-3.8529% -3.3649% -2.8372%] (p = 0.00 < 0.05)
                        Performance has improved.
incr-join               time:   [646.79 ns 651.10 ns 656.92 ns]
                        change: [+1.4793% +2.3950% +3.1799%] (p = 0.00 < 0.05)
                        Performance has regressed.
query-indexes-multi     time:   [562.13 ns 564.42 ns 569.14 ns]
                        change: [-0.6362% -0.0276% +0.5452%] (p = 0.94 > 0.05)
                        No change in performance detected.

There is some noise in this benchmark, but -8% on footprint-semijoin at least shows up consistently and seems real.

@Centril Centril requested a review from Copilot April 30, 2025 15:54
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces immediate application of schema changes to the committed datastore state and updates various index and schema manipulation APIs. Key changes include the addition of new merging functions (can_merge) for several index types, revised schema mutation and index deletion APIs in Table, and significant restructuring of transaction state and rollback mechanisms in the core datastore layer.

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
crates/table/src/table_index/uniquemap.rs Added a can_merge method to verify if two UniqueMaps can be merged
crates/table/src/table_index/unique_direct_index.rs Added a can_merge method for unique direct indices using a zip iteration over inner collections
crates/table/src/table_index/unique_direct_fixed_cap_index.rs Added a can_merge function to similar effect for fixed capacity indices
crates/table/src/table_index/mod.rs Introduced a could_merge method for TableIndex that dispatches to the appropriate can_merge functions
crates/table/src/table.rs Modified schema mutation functions and index add/delete operations with updated pointer map handling
crates/schema/src/schema.rs Revised adjacent schema modifications using a new take_adjacent_schemas API and helper find_remove
crates/core/src/db/datastore/locking_tx_datastore/* Updated transaction state and rollback handling to immediately reflect schema changes, plus various related API adjustments
Comments suppressed due to low confidence (3)

crates/table/src/table_index/uniquemap.rs:87

  • Add unit tests for the new can_merge method to verify its behavior when merging maps that have conflicting keys as well as when no conflicts are present.
pub(crate) fn can_merge(&self, other: &UniqueMap<K, V>) -> Result<(), &V> {

crates/table/src/table_index/unique_direct_index.rs:210

  • Ensure that unit tests are added for can_merge to cover scenarios where inner slot conflicts occur and where merge should succeed without errors.
pub(crate) fn can_merge(&self, other: &UniqueDirectIndex) -> Result<(), RowPointer> {

crates/table/src/table_index/unique_direct_fixed_cap_index.rs:126

  • Introduce tests for the new can_merge function to confirm that the function correctly identifies merge conflicts when both slots are occupied.
pub(crate) fn can_merge(&self, other: &UniqueDirectFixedCapIndex) -> Result<(), RowPointer> {

@Centril Centril force-pushed the centril/datastore-revamp-schema-changes branch 2 times, most recently from a3ecb73 to 3661d73 Compare April 30, 2025 18:19
@CLAassistant

This comment was marked as spam.

@Centril Centril force-pushed the centril/datastore-revamp-schema-changes branch from 3661d73 to f6b29b7 Compare May 5, 2025 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants