-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry if call function on backend fails #242
Conversation
Reviewer's Guide by SourceryThe PR implements a retry mechanism for backend function calls to handle transient network errors during long-running operations. The implementation adds a retry loop with a 50ms delay between attempts, making up to 3 tries before failing. Sequence diagram for retry mechanism in backend function callssequenceDiagram
participant Client
participant Backend
Client->>Backend: call_function_on_backend()
loop Retry up to 3 times
alt Function succeeds
Backend-->>Client: Return result
else Function fails
Backend-->>Client: Raise error
Client->>Client: Wait 50ms
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @hagenw - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider being more selective about which exceptions trigger a retry - catching all exceptions might mask programming errors that should fail immediately
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review
In discussion we have talked about the fact that we have simplistic requirements concerning retrying behavior. So for now we do not have to deal with exponential backoff and the like, but rather retry a thre times in an iteration.
Also, it is currently a deliberate action to have blocking behavior as seen in time.sleep(0.05)
.
Further, the argument to sleep which is hardcoded to 5ms does not have to be turned into a variable, at least not yet.
Therefore approval is given straight.
When publishing large datasets that take longer than 24 hours, I encountered network related errors for both Artifactory and S3 servers. They seem to be related with loosing the connection while uploading a file, which will then raise an error, and stops the whole process. To avoid this, this pull request proposes to not directly fail, but try two times more to rerunning the requested job on the backend, with a short delay of 50 ms between the tries.
I tested it on a 27 hour job on S3, which failed three times before, and did now proceed without any error for two times.
As I don't see any unwanted consequences, I would propose to integrate the changes.
Summary by Sourcery
Enhancements:
Summary by Sourcery
Enhancements: