Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent PM bug #7

Open
Jonyleo opened this issue Oct 22, 2024 · 1 comment
Open

Concurrent PM bug #7

Jonyleo opened this issue Oct 22, 2024 · 1 comment

Comments

@Jonyleo
Copy link

Jonyleo commented Oct 22, 2024

While developing a concurrency PM bug detecting tool, we have detected 2 bugs in APEX.

APEX's get operations execute in a lock-free manner, and as far as we are aware, no other mechanism exists to guarantee data retrieved by a get operations is explicitly persisted (via flush fence).

To aid the developers, we provide one of the stack traces for the PM accesses involved in this bug.

using AlexDataNode = alex::AlexDataNode<unsigned long, unsigned long, alex::AlexCompare, my_alloc::allocator<std::pair<unsigned long, unsigned long> >, true>;
using Apex = alex::Apex<unsigned long, unsigned long, alex::AlexCompare, my_alloc::allocator<std::pair<unsigned long, unsigned long> >, true>;

First bug - Update/Insert value races with Search

In this bug, the payload is written to PM, and is immediately visible to other threads, but not persisted.
Even though these operations are protected by a mutex, the Search operation is not, and a race arises from that scenario.
We recommend either protecting the Search operation with a lock, or ensuring data is persisted before (or simultaneously) being visible to other threads. This can be achieved via non-temporal stores for example. Although care should be taken to ensure atomicity in the case of insertion since the size of the KV pair is greater than 8 bytes.

PM stores:

AlexDataNode (/root/apex/src/benchmark/../core/apex_nodes.h:3798)
Apex::update_unsafe (/root/apex/src/benchmark/../core/apex.h:2353)
Apex::update (/root/apex/src/benchmark/../core/apex.h:2342)

AlexDataNode::insert_element_at (/root/apex/src/benchmark/../core/apex_nodes.h:3479)
AlexDataNode::insert (/root/apex/src/benchmark/../core/apex_nodes.h:3262)
Apex::insert_unsafe (/root/apex/src/benchmark/../core/apex.h:1446)
Apex::insert (/root/apex/src/benchmark/../core/apex.h:1417)

PM loads:

AlexDataNode::linear_probing_search_exact (/root/apex/src/benchmark/../core/apex_nodes.h:2933)
AlexDataNode::find_payload (/root/apex/src/benchmark/../core/apex_nodes.h:2217)
Apex::search_unsafe (/root/apex/src/benchmark/../core/apex.h:1380)
Apex::search (/root/apex/src/benchmark/../core/apex.h:1370)

AlexDataNode::linear_probing_search_exact (/root/apex/src/benchmark/../core/apex_nodes.h:2915)
AlexDataNode::find_payload (/root/apex/src/benchmark/../core/apex_nodes.h:2217)
Apex::search_unsafe (/root/apex/src/benchmark/../core/apex.h:1380)
Apex::search (/root/apex/src/benchmark/../core/apex.h:1370)

Second bug - Insert/Erase key races with Search

Similarly to the first bug, but for the key, instead of the value in the KV store.
In this case, the side effects, are different, APEX may return a KV pair that has been inserted, however, after a crash, the pair is no longer present.
On the other hand, APEX may fail to retrieve an erased KV pair before a crash, while it is available after the crash.
This breaks crash-consistency.

PM stores:

AlexDataNode::insert_element_at (/root/apex/src/benchmark/../core/apex_nodes.h:3480)
AlexDataNode::insert (/root/apex/src/benchmark/../core/apex_nodes.h:3262)
Apex::insert_unsafe (/root/apex/src/benchmark/../core/apex.h:1446)
Apex::insert (/root/apex/src/benchmark/../core/apex.h:1417)

AlexDataNode::erase (/root/apex/src/benchmark/../core/apex_nodes.h:3606)
Apex::erase_unsafe (/root/apex/src/benchmark/../core/apex.h:2313)
Apex::erase (/root/apex/src/benchmark/../core/apex.h:2302)

PM load:

alex::AlexCompare::operator()<unsigned long, unsigned long> (/root/apex/src/benchmark/../core/apex_base.h:296)
AlexDataNode::linear_probing_search_exact (/root/apex/src/benchmark/../core/apex_nodes.h:962)
AlexDataNode::find_payload (/root/apex/src/benchmark/../core/apex_nodes.h:2217)
Apex::search_unsafe (/root/apex/src/benchmark/../core/apex.h:1380)
Apex::search (/root/apex/src/benchmark/../core/apex.h:1370)

@Jonyleo
Copy link
Author

Jonyleo commented Jan 29, 2025

Pinging this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant