Guarantee fault tolerance #45

cchudant · 2024-07-19T13:25:20Z

Hello :)

If the app exits gracefully (Ctrl+C, panic) or, even worse, ungracefully (server outage, process killed), the database may end up in an inconsistent state because the app was stopped between two DB updates.

I think an example of that is if the app stops right here https://github.com/karnotxyz/madara-orchestrator/blob/main/crates/orchestrator/src/jobs/mod.rs#L105

Usually you use MongoDB transactions to solve that, as they guarantee atomicity and will roll back if the connection drops.

apoorvsadana · 2024-07-19T18:17:28Z

Posting this from our chat

About the line you pointed out: if the code stops there, the DB state is still correct. We don't want to revert the DB state at that point because the job has already been processed (DA has been submitted or the SHARP request has been submitted, etc.), and reprocessing it would cost us more money. I understand that normally we would do things in a transaction so the entire thing can be reprocessed if it breaks somewhere in the middle. However, the orchestrator interacts with a lot of external services that are often expensive, so we want to avoid reprocessing wherever possible.
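To illustrate the trade-off, here is a minimal sketch of the "record progress, never roll it back" approach, assuming an in-memory map as a stand-in for the MongoDB jobs collection. `JobStatus`, `JobStore`, `ExternalService`, and `process_job` are hypothetical names for this example, not the orchestrator's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical job lifecycle; the real orchestrator may track more states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobStatus {
    Created,
    /// The paid external call (DA submission, SHARP request, ...) was made.
    Processed,
}

/// In-memory stand-in for the jobs collection (assumption for this sketch).
#[derive(Default)]
struct JobStore {
    statuses: HashMap<u64, JobStatus>,
}

/// Stand-in for an expensive external service; it only counts submissions.
#[derive(Default)]
struct ExternalService {
    submissions: u32,
}

impl ExternalService {
    fn submit(&mut self, _job_id: u64) {
        self.submissions += 1;
    }
}

/// Processes a job at most once. The status is persisted right after the
/// external call and is never reverted, so a restart sees `Processed`
/// and skips the paid call instead of repeating it.
fn process_job(store: &mut JobStore, service: &mut ExternalService, job_id: u64) -> JobStatus {
    let status = *store.statuses.entry(job_id).or_insert(JobStatus::Created);
    if status != JobStatus::Created {
        // Already handled before the crash or restart: do nothing.
        return status;
    }
    service.submit(job_id);
    store.statuses.insert(job_id, JobStatus::Processed);
    JobStatus::Processed
}
```

Calling `process_job` twice for the same job ID, as would happen when the app restarts after stopping at the line in question, leaves `submissions` at 1. A transaction that rolled the status back to `Created` would instead trigger a second paid submission. The sketch does not cover a crash between the external call and the status write; that window still exists with this approach.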

apoorvsadana added mvp mainnet labels Jul 19, 2024

apoorvsadana added this to Madara Orchestrator Jul 19, 2024

apoorvsadana moved this to Backlog in Madara Orchestrator Jul 19, 2024

apoorvsadana closed this as completed Jul 19, 2024

github-project-automation bot moved this from Backlog to Done in Madara Orchestrator Jul 19, 2024

apoorvsadana removed this from Madara Orchestrator Jul 24, 2024
