-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise errors on unrecoverable problems #110
Conversation
For: - snap install or refresh failure - snap not installed - snap service not active These cases are unexpected, the charm cannot recover automatically, and the user must manually debug the issue. Therefore, error status is more appropriate. Partially fixes: #108
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change, since it make more sense than block status.
Please fix lint a and unit tests. Can we also add a simple integration test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we insist on raising error, I recommend we rework the _on_collect_unit_status
Thanks for reviews everyone - I see this is early stage still, I probably could have left it as a draft. |
Ok lints and unit tests fixed, ready for review again. A downside I see of raising errors instead of using a custom blocked status, is that the juju status message is not as useful:
I'm not sure if we can improve that while still following this pattern of raising the error and letting the ops framework manage the error status. |
Actually I think this needs more work on the conditions to check if the exporter is running. I just observed this:
I guess this is because configure is not called for credentials-relation-joined hook, so at the end of the credentials-relation-joined hook, collect_status is run, and at this point the charm has the keystone data, but the service hasn't been started yet. The service will start in the next credentials-relation-configure hook. |
I updated it to avoid false positives with raising the error:
|
Not sure what's going on with the integration tests. :/ |
Integration tests are failing because the upgrade hook is crashing on collect status It's making too many false positives for error state here. If it was blocked, then this would result in a transient blocked state while the exporter wasn't running during an upgrade. |
Raise errors for snap install or refresh failure. Raise an error because it's a fatal failure, and juju will auto-retry the hook, which may succeed on retry if the error was transient.
Improve blocked status message for if it detects snap not installed or service not active in the status hook.
I think Blocked is better than Error status here:
Partially fixes: #108