This document explains the processes and practices recommended for contributing enhancements or bug fixes to the Alertmanager Charmed Operator.
The intended use case of this operator is to be deployed as part of the COS Lite bundle, although it can also be deployed standalone.
A typical setup using snaps can be found in the Juju docs.
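For reference, a snap-based local setup along those lines might look like the following sketch (consult the Juju docs for the authoritative steps; the controller and model names here are arbitrary):

```shell
# A minimal local MicroK8s + Juju setup (names are illustrative)
sudo snap install microk8s --classic
sudo snap install juju --classic
juju bootstrap microk8s micro   # "micro" is an arbitrary controller name
juju add-model cos              # "cos" is an arbitrary model name
```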
- Before starting work on a pull request, we encourage you to open an issue explaining the use case or bug. This gives other contributors a chance to weigh in early in the process.
- To author PRs, you should be familiar with Juju and how charmed operators are written.
- The best way to get a head start is to join the conversation on our Mattermost channel or Discourse.
- All enhancements require review before being merged. Besides code quality and test coverage, the review will also take into account the resulting user experience for Juju administrators using this charm. To be able to merge, you will have to rebase onto the `main` branch; we do this to avoid merge commits and to keep a linear Git history.
- We use `tox` to manage all virtualenvs for the development lifecycle.
Unit tests are written with the Operator Framework test harness, and integration tests are written using `pytest-operator` and `python-libjuju`.
The default test environments - lint, static and unit - will run if you start `tox` without arguments.
You can also manually run a specific test environment:
```shell
tox -e fmt              # update your code according to linting rules
tox -e lint             # code style
tox -e static           # static analysis
tox -e unit             # unit tests
tox -e integration      # integration tests
tox -e integration-lma  # integration tests for the lma-light bundle
```
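If the environments in `tox.ini` forward positional arguments to pytest (via `{posargs}`, which is assumed here; check the repo's `tox.ini`), a run can be narrowed to a single file or test:

```shell
# Assumes the unit env passes {posargs} through to pytest;
# the file and test name below are illustrative
tox -e unit -- tests/unit/test_charm.py -k "test_config_changed"
```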
`tox` creates a virtual environment for every tox environment defined in `tox.ini`. To activate a tox environment for manual testing:

```shell
source .tox/unit/bin/activate
```
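With the environment activated, the suite can also be run directly; a minimal sketch, assuming the usual `tests/unit` layout:

```shell
# Run the unit tests straight from the activated virtualenv
pytest -v tests/unit
```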
Alerts can be created using `amtool`:

```shell
amtool alert add alertname=oops service="my-service" severity=warning \
    instance="oops.example.net" --annotation=summary="High latency is high!" \
    --generator-url="http://prometheus.int.example.net"
```
or using Alertmanager's HTTP API, for example:
```shell
alertmanager_ip=$(juju status alertmanager/0 --format=json | \
    jq -r '.applications.alertmanager.units."alertmanager/0".address')
name="oops"  # matches the alertname used in the amtool example above
curl -XPOST http://$alertmanager_ip:9093/api/v1/alerts -d "[{
  \"status\": \"firing\",
  \"labels\": {
    \"alertname\": \"$name\",
    \"service\": \"my-service\",
    \"severity\": \"warning\",
    \"instance\": \"$name.example.net\"
  },
  \"annotations\": {
    \"summary\": \"High latency is high!\"
  },
  \"generatorURL\": \"http://prometheus.int.example.net\"
}]"
```
The alert should then be listed:

```shell
curl http://$alertmanager_ip:9093/api/v1/alerts
```

and should be visible on a karma dashboard, if configured.
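The same check can be done with `amtool`, which needs to be pointed at the unit explicitly (a sketch, reusing the `$alertmanager_ip` captured above):

```shell
# Query active alerts through amtool instead of the raw HTTP API
amtool --alertmanager.url="http://$alertmanager_ip:9093" alert query
```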
Relations between alertmanager and prometheus can be verified by querying prometheus for active alertmanagers:

```shell
# Assumes the prometheus application is deployed under the name "prometheus"
prom_ip=$(juju status prometheus/0 --format=json | \
    jq -r '.applications.prometheus.units."prometheus/0".address')
curl -X GET "http://$prom_ip:9090/api/v1/alertmanagers"
```
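To see only the endpoints Prometheus has actually discovered, the response can be filtered down to its `data.activeAlertmanagers` field:

```shell
# List just the active Alertmanager endpoints known to Prometheus
curl -s "http://$prom_ip:9090/api/v1/alertmanagers" | jq '.data.activeAlertmanagers'
```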
Build the charm in this git repository using:

```shell
charmcraft pack
```

which will create a `*.charm` file you can deploy with:

```shell
juju deploy ./alertmanager-k8s.charm \
    --resource alertmanager-image=ubuntu/prometheus-alertmanager \
    --config config_file='@path/to/alertmanager.yml' \
    --config templates_file='@path/to/templates.tmpl'
```
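When iterating on the charm, a rebuilt artifact can be swapped into an existing deployment rather than redeploying from scratch; a sketch, assuming the application kept the default name `alertmanager-k8s`:

```shell
# Rebuild, then upgrade the running application in place
charmcraft pack
juju refresh alertmanager-k8s --path=./alertmanager-k8s.charm
```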
- The main charm class is `AlertmanagerCharm`, which responds to config changes (via `ConfigChangedEvent`) and cluster changes (via `RelationJoinedEvent`, `RelationChangedEvent` and `RelationDepartedEvent`).
- All lifecycle events call a common hook, `_common_exit_hook`, after executing their own business logic. This pattern simplifies state tracking and improves consistency.
- On startup, the charm waits for
`PebbleReadyEvent` and for an IP address to become available before starting the alertmanager service and declaring `ActiveStatus`.
- The `alertmanager.yml` config file is created in its entirety by the charm code on startup (the default `alertmanager.yml` is overwritten). This is done to maintain consistency across OCI images.
- Hot reload via the alertmanager HTTP API is used whenever possible instead of a service restart, to minimize downtime (see the sketch below).
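For reference, the reload endpoint the charm relies on can also be exercised manually; Alertmanager exposes it as `POST /-/reload`:

```shell
# Trigger a configuration hot reload on the running unit (no restart)
curl -X POST http://$alertmanager_ip:9093/-/reload
```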