
Guarantee fault tolerance #45

Closed
cchudant opened this issue Jul 19, 2024 · 1 comment

Comments

@cchudant
Member

Hello :)

If the app exits gracefully (ctrl+c, panic) or, even worse, ungracefully (server outage, process killed), the database may end up in an inconsistent state because the app was stopped between two DB updates.

I think one example of that is if the app stops right here: https://github.com/karnotxyz/madara-orchestrator/blob/main/crates/orchestrator/src/jobs/mod.rs#L105

The usual way to solve that is with MongoDB transactions, as they guarantee atomicity and will roll back if the connection drops.
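To illustrate the all-or-nothing behaviour a transaction gives, here is a minimal sketch in plain Rust. It does not use the MongoDB driver: `Store`, `transaction`, and `complete_and_enqueue` are hypothetical names, and the in-memory map only stands in for the real job collection. The `fail_between` flag simulates the app stopping between two DB updates.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a job collection: job id -> status.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Store {
    pub jobs: HashMap<u64, String>,
}

impl Store {
    /// Runs `body` against a scratch copy and commits only if it returns Ok.
    /// This mimics transaction atomicity; it is not how MongoDB implements it.
    pub fn transaction<F>(&mut self, body: F) -> Result<(), String>
    where
        F: FnOnce(&mut Store) -> Result<(), String>,
    {
        let mut scratch = self.clone();
        body(&mut scratch)?; // on Err the scratch copy is dropped: nothing is committed
        *self = scratch; // commit both updates together
        Ok(())
    }
}

/// Two updates that must land together. `fail_between` simulates a stop
/// between the first and the second update.
pub fn complete_and_enqueue(
    store: &mut Store,
    done: u64,
    next: u64,
    fail_between: bool,
) -> Result<(), String> {
    store.transaction(|tx| {
        tx.jobs.insert(done, "Completed".to_string());
        if fail_between {
            return Err("simulated failure between two updates".to_string());
        }
        tx.jobs.insert(next, "Created".to_string());
        Ok(())
    })
}
```

Calling `complete_and_enqueue(&mut store, 1, 2, true)` returns an error and leaves `store` exactly as it was, while the same call with `false` applies both updates.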

@apoorvsadana
Contributor

Posting this from our chat

About the line you pointed out: if the code stops there, the DB state would still be correct. We don't want to revert the DB state there because the job has already been processed (DA has been submitted, or the SHARP request has been submitted, etc.), and reprocessing it would cost us more money. I understand that normally we would do things in a txn so we can reprocess the entire thing if we break somewhere in the middle. However, the orchestrator interacts with a lot of external services, many of which are expensive, so we want to avoid reprocessing where possible.
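A rough sketch of that idea, under stated assumptions: `Job`, `Status`, `ExternalService`, and `process_job` are hypothetical names and not the orchestrator's actual types. The point is only that once a job is recorded as submitted, a retry reads the stored result instead of calling the paid external service again.

```rust
/// Hypothetical job status; the real orchestrator has its own status type.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Created,
    Submitted { external_id: String },
}

pub struct Job {
    pub id: u64,
    pub status: Status,
}

/// Stand-in for an expensive external service (e.g. a DA or proving request).
pub struct ExternalService {
    pub calls: u32,
}

impl ExternalService {
    pub fn submit(&mut self, job_id: u64) -> String {
        self.calls += 1; // each call would cost money in practice
        format!("ext-{}", job_id)
    }
}

/// Processes a job at most once: if the stored status says it was already
/// submitted, the external call is skipped and the stored id is returned.
pub fn process_job(job: &mut Job, service: &mut ExternalService) -> String {
    if let Status::Submitted { external_id } = &job.status {
        return external_id.clone();
    }
    let external_id = service.submit(job.id);
    // Keep the submitted status on restart instead of rolling it back.
    job.status = Status::Submitted {
        external_id: external_id.clone(),
    };
    external_id
}
```

Calling `process_job` twice on the same job leaves `service.calls` at 1. A stop between `submit` and the status write is still possible in this sketch, so it shows only the "do not revert after submission" part of the argument.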
