Merge pull request #1 from mkilchhofer/init
Showing 21 changed files with 2,684 additions and 3 deletions.
@@ -0,0 +1,30 @@
# EditorConfig is awesome: http://EditorConfig.org
# Uses editorconfig to maintain consistent coding styles

# top-most EditorConfig file
root = true

# Unix-style newlines with a newline ending every file
[*]
charset = utf-8
end_of_line = lf
indent_size = 2
indent_style = space
insert_final_newline = true
max_line_length = 120
trim_trailing_whitespace = true

[{go.mod,go.sum,*.go}]
indent_style = tab
indent_size = 4

[*.{tf,tfvars}]
indent_size = 2
indent_style = space

[*.md]
max_line_length = 0
trim_trailing_whitespace = false

[COMMIT_EDITMSG]
max_line_length = 0
@@ -0,0 +1,36 @@
name: Terratest
on: pull_request

permissions: {}

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22.x'

      - name: Install dependencies
        run: |
          pwd
          cd test
          go get .
      - name: Test with the Go CLI
        run: |
          pwd
          cd test
          go test -v
      - name: Check for updated README (terraform-docs)
        uses: terraform-docs/gh-actions@v1.2.0
        with:
          working-dir: .
          fail-on-diff: "true"
          config-file: ".terraform-docs.yml"
@@ -0,0 +1,14 @@
formatter: "markdown"

output:
  file: "README.md"

settings:
  anchor: false
  indent: 3

sections:
  show:
    - providers
    - inputs
    - outputs
@@ -1,2 +1,73 @@
# terraform-grafana-prometheus-alerts
Terraform module to convert Prometheus alert rules to Grafana alerts

Terraform module to convert [Prometheus Alerting rules] to [Grafana-managed alert rules]

## Motivation / Why use this module

Many apps (mostly from the CNCF ecosystem) ship with monitoring dashboards and alerts provided by the vendor or the
community. Dashboards are normally provided as a JSON file that can be loaded into Grafana. Alerts are mostly provided
as [Prometheus Alerting rules].

Some users already operate a Grafana instance or use a managed Grafana instance from a cloud provider (Grafana Cloud,
Amazon Managed Grafana, Azure Managed Grafana, etc.). Why not use this Grafana instance for alerting, too?

The problem is that Grafana's unified alerting uses a different format for alert definitions, even though the concepts
of labels and annotations (which carry descriptions and runbook URLs) are almost identical.
This module lets you reuse the [Prometheus Alerting rules] and configure them inside Grafana.

## Example usage

```hcl
module "cert_manager_rules" {
  source                      = "github.com/mkilchhofer/terraform-grafana-prometheus-alerts"
  prometheus_alerts_file_path = file("/path/to/alerts/cert-manager.yaml")
  folder_uid                  = grafana_folder.test.uid
  datasource_uid              = grafana_data_source.prometheus.uid
}
```

## Requirements

- Grafana 8.0+ (Unified alerting)

## Limitations

- Defining multiple alerts with the same name is not supported in Grafana

## Overriding definitions from the Prometheus Alerting rules file

TODO

## TF module documentation

<!-- BEGIN_TF_DOCS -->
### Providers

| Name | Version |
|------|---------|
| grafana | ~> 3.2 |

### Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| datasource\_uid | The UID of the Grafana datasource being queried with the expressions inside the Alerting rules file | `string` | n/a | yes |
| default\_evaluation\_interval\_duration | How often the rule is evaluated by default (when not defined inside your Alerting rules file). | `string` | `"5m"` | no |
| disable\_provenance | Allow modifying the rule group from sources other than Terraform or the Grafana API. | `bool` | `false` | no |
| folder\_uid | The UID of the Grafana folder that the alerts belong to. | `string` | n/a | yes |
| org\_id | The Organization ID of the Grafana Alerting rule groups. (Only supported with basic auth; API keys are already org-scoped.) | `string` | `null` | no |
| overrides | Overrides per Alert rule | <pre>map(object({<br> alert_threshold = optional(number)<br> exec_err_state = optional(string)<br> is_paused = optional(bool)<br> no_data_state = optional(string)<br> labels = optional(map(string))<br> }))</pre> | `{}` | no |
| prometheus\_alerts\_file\_path | Content of the Prometheus Alerting rules file (for example read with `file()`, as in the usage example) | `string` | n/a | yes |

### Outputs

| Name | Description |
|------|-------------|
| alertsfile\_map | n/a |
| file\_as\_yaml | n/a |
<!-- END_TF_DOCS -->

[Grafana-managed alert rules]: https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rules/#grafana-managed-alert-rules
[Prometheus Alerting rules]: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
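The README still marks overrides as TODO. As a rough model of what main.tf does with an `overrides` entry, the Go sketch below (Go being the language of this repository's Terratest suite) does two things. First, it merges override labels over the rule's own labels, with later keys winning as they do in Terraform's `merge()`. Second, it approximates the generated expression chain for one series: `QUERY`, then a `last` reduce (`QUERY_RESULT`), then a `gt` threshold (`ALERTCONDITION`) whose value defaults to `0`. The `Override` type and both function names are hypothetical, and the sketch simplifies Grafana's real evaluation.

```go
package main

import "fmt"

// Override is a hypothetical, trimmed-down stand-in for one entry of the
// module's `overrides` input. A nil pointer means "not set", like optional().
type Override struct {
	AlertThreshold *float64
	Labels         map[string]string
}

// mergeLabels copies the rule labels first and the override labels second,
// so override keys win, as with Terraform's merge().
func mergeLabels(ruleLabels, overrideLabels map[string]string) map[string]string {
	out := make(map[string]string, len(ruleLabels)+len(overrideLabels))
	for k, v := range ruleLabels {
		out[k] = v
	}
	for k, v := range overrideLabels {
		out[k] = v
	}
	return out
}

// fires approximates QUERY -> QUERY_RESULT (reduce "last") ->
// ALERTCONDITION (threshold "gt") for a single series of samples.
func fires(samples []float64, ov Override) bool {
	if len(samples) == 0 {
		return false // no data: Grafana applies no_data_state instead
	}
	last := samples[len(samples)-1] // reducer "last"
	threshold := 0.0                // module default when alert_threshold is unset
	if ov.AlertThreshold != nil {
		threshold = *ov.AlertThreshold
	}
	return last > threshold // evaluator "gt"
}

func main() {
	ov := Override{Labels: map[string]string{"severity": "warning", "team": "platform"}}
	fmt.Println(mergeLabels(map[string]string{"severity": "critical"}, ov.Labels))

	// Default threshold 0: the last sample (1) is greater than 0.
	fmt.Println(fires([]float64{0, 1}, Override{}))

	// With alert_threshold = 5 the last sample (4) no longer fires.
	five := 5.0
	fmt.Println(fires([]float64{3, 4}, Override{AlertThreshold: &five}))
}
```

Because the reducer is `last`, only the most recent sample in the 600-second query window decides whether the condition holds in this model.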
@@ -0,0 +1,134 @@
resource "grafana_rule_group" "this" {
  # for_each = local.file_as_yaml.groups
  for_each = local.alertsfile_map

  name       = each.value.name
  folder_uid = var.folder_uid
  org_id     = var.org_id

  # Terraform has no function that converts a duration string such as "1m" or "1h30m" (the format of `interval`
  # in a Prometheus rule group) to seconds, but timeadd() accepts one. Add the duration to the Unix epoch and sum
  # the hours, minutes and seconds of the result. Days are not counted, so intervals of 24h or more wrap around.
  interval_seconds = (
    (parseint(formatdate("s", timeadd("1970-01-01T00:00:00Z", try(each.value.interval, var.default_evaluation_interval_duration))), 10) * 1) +
    (parseint(formatdate("m", timeadd("1970-01-01T00:00:00Z", try(each.value.interval, var.default_evaluation_interval_duration))), 10) * 60) +
    (parseint(formatdate("h", timeadd("1970-01-01T00:00:00Z", try(each.value.interval, var.default_evaluation_interval_duration))), 10) * 3600)
  )

  disable_provenance = var.disable_provenance

  dynamic "rule" {
    for_each = {for rule in each.value.rules: rule.alert => rule}

    content {
      name      = rule.value.alert
      for       = try(rule.value.for, null)
      condition = "ALERTCONDITION"

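      # try() lets rules without annotations or labels pass instead of failing with "Unsupported attribute".
      annotations = {for k, v in try(rule.value.annotations, {}) : k => replace(v, "$value", "$values.QUERY_RESULT.Value")}
      labels      = merge(try(rule.value.labels, {}), try(var.overrides[rule.value.alert].labels, {}))
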
      exec_err_state = try(var.overrides[rule.value.alert].exec_err_state, null)
      is_paused      = try(var.overrides[rule.value.alert].is_paused, null)
      no_data_state  = try(var.overrides[rule.value.alert].no_data_state, null)

      data {
        ref_id = "QUERY"
        relative_time_range {
          from = 600
          to   = 0
        }
        datasource_uid = var.datasource_uid
        model          = jsonencode({
          editorMode    = "code"
          expr          = rule.value.expr
          intervalMs    = 1000
          maxDataPoints = 43200
          refId         = "QUERY"
        })
      }

      ## Reduce
      data {
        ref_id = "QUERY_RESULT"
        relative_time_range {
          from = 600
          to   = 0
        }
        datasource_uid = "__expr__"
        model          = jsonencode({
          "conditions" = [
            {
              "evaluator" = {
                "params" = [0]
                "type"   = "gt"
              }
              "operator" = {
                "type" = "and"
              }
              "query" = {
                "params" = []
              }
              "reducer" = {
                "params" = []
                "type"   = "avg"
              }
              "type" = "query"
            },
          ]
          "datasource" = {
            "name" = "Expression"
            "type" = "__expr__"
            "uid"  = "__expr__"
          }
          "expression"    = "QUERY"
          "intervalMs"    = 1000
          "maxDataPoints" = 43200
          "reducer"       = "last"
          "refId"         = "QUERY_RESULT"
          "type"          = "reduce"
        })
      }

      ## Threshold
      data {
        ref_id = "ALERTCONDITION"
        relative_time_range {
          from = 600
          to   = 0
        }
        datasource_uid = "__expr__"
        model          = jsonencode({
          "conditions" = [
            {
              "evaluator" = {
                "params" = [try(var.overrides[rule.value.alert].alert_threshold, 0)]
                "type"   = "gt"
              }
              "operator" = {
                "type" = "and"
              }
              "query" = {
                "params" = ["QUERY_RESULT"]
              }
              "reducer" = {
                "params" = []
                "type"   = "last"
              }
              "type" = "query"
            },
          ]
          "datasource" = {
            "type" = "__expr__"
            "uid"  = "__expr__"
          }
          "expression"    = "QUERY_RESULT"
          "hide"          = false
          "intervalMs"    = 1000
          "maxDataPoints" = 43200
          "refId"         = "ALERTCONDITION"
          "type"          = "threshold"
        })
      }
    }
  }
}
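To check what the `interval_seconds` expression returns, here is a minimal Go sketch that mirrors it. It assumes, as the Terraform documentation describes, that `timeadd()` accepts the same duration syntax as Go's `time.ParseDuration` (units up to `h`). The sketch adds the duration to the Unix epoch and sums hour, minute and second, like the three `formatdate()` calls. The function names are illustrative and not part of the module. The sketch also exposes two caveats of the approach: intervals of 24h or more lose whole days, and Prometheus-only units such as `1d` are rejected.

```go
package main

import (
	"fmt"
	"time"
)

// epochIntervalSeconds mirrors the module's interval_seconds expression:
// add the duration to 1970-01-01T00:00:00Z and sum hour, minute and second
// of the resulting timestamp. Whole days are dropped, so 24h or more wraps.
func epochIntervalSeconds(d string) (int, error) {
	dur, err := time.ParseDuration(d)
	if err != nil {
		return 0, err
	}
	t := time.Unix(0, 0).UTC().Add(dur)
	return t.Hour()*3600 + t.Minute()*60 + t.Second(), nil
}

// exactIntervalSeconds converts the duration directly, for comparison.
func exactIntervalSeconds(d string) (int, error) {
	dur, err := time.ParseDuration(d)
	if err != nil {
		return 0, err
	}
	return int(dur / time.Second), nil
}

func main() {
	for _, d := range []string{"5m", "1h30m", "25h"} {
		epoch, _ := epochIntervalSeconds(d)
		exact, _ := exactIntervalSeconds(d)
		fmt.Printf("%s: epoch=%d exact=%d\n", d, epoch, exact)
	}
	// "1d" is valid in Prometheus but not in Go's duration syntax.
	if _, err := epochIntervalSeconds("1d"); err != nil {
		fmt.Println("1d:", err)
	}
}
```

For the usual sub-day intervals such as `1m` or `5m` the two functions agree. They only diverge once the interval reaches a full day.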
@@ -0,0 +1,4 @@
locals {
  file_as_yaml   = yamldecode(var.prometheus_alerts_file_path)
  alertsfile_map = {for group in local.file_as_yaml.groups: group.name => group}
}
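locals.tf keys rule groups by `group.name`, and the dynamic `rule` block in main.tf keys rules by `rule.alert`. A Terraform for-expression that produces the same key twice fails with a duplicate-key error unless the grouping `...` mode is used. In this module, duplicate alert names within a group therefore stop the plan before anything reaches Grafana. The Go sketch below models that keying on hypothetical, simplified `Group` and `Rule` types. It illustrates the behaviour only and is not part of the module.

```go
package main

import "fmt"

// Rule and Group are hypothetical, simplified shapes of the decoded YAML.
type Rule struct{ Alert string }

type Group struct {
	Name  string
	Rules []Rule
}

// keyByName mirrors {for group in ...groups : group.name => group} and the
// inner {for rule in group.rules : rule.alert => rule}, returning an error
// where Terraform would report a duplicate object key.
func keyByName(groups []Group) (map[string]map[string]Rule, error) {
	out := make(map[string]map[string]Rule, len(groups))
	for _, g := range groups {
		if _, dup := out[g.Name]; dup {
			return nil, fmt.Errorf("duplicate group name %q", g.Name)
		}
		rules := make(map[string]Rule, len(g.Rules))
		for _, r := range g.Rules {
			if _, dup := rules[r.Alert]; dup {
				return nil, fmt.Errorf("duplicate alert name %q in group %q", r.Alert, g.Name)
			}
			rules[r.Alert] = r
		}
		out[g.Name] = rules
	}
	return out, nil
}

func main() {
	groups := []Group{
		{Name: "cert-manager", Rules: []Rule{{Alert: "CertManagerAbsent"}}},
		{Name: "certificates", Rules: []Rule{{Alert: "CertManagerCertExpirySoon"}, {Alert: "CertManagerCertNotReady"}}},
	}
	m, err := keyByName(groups)
	fmt.Println(len(m), err)

	_, err = keyByName([]Group{{Name: "g", Rules: []Rule{{Alert: "A"}, {Alert: "A"}}}})
	fmt.Println(err)
}
```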
@@ -0,0 +1,7 @@
output "file_as_yaml" {
  value = local.file_as_yaml
}

output "alertsfile_map" {
  value = local.alertsfile_map
}
@@ -0,0 +1,61 @@
# Source: https://github.com/monitoring-mixins/website/blob/master/assets/cert-manager/alerts.yaml
groups:
- name: cert-manager
  rules:
  - alert: CertManagerAbsent
    annotations:
      description: New certificates will not be able to be minted, and existing ones
        can't be renewed until cert-manager is back.
      runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagerabsent
      summary: Cert Manager has disappeared from Prometheus service discovery.
    expr: absent(up{job="cert-manager"})
    for: 10m
    labels:
      severity: critical
- name: certificates
  rules:
  - alert: CertManagerCertExpirySoon
    annotations:
      dashboard_url: https://grafana.example.com/d/TvuRo2iMk/cert-manager
      description: The domain that this cert covers will be unavailable after {{ $value
        | humanizeDuration }}. Clients using endpoints that this cert protects will
        start to fail in {{ $value | humanizeDuration }}.
      runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagercertexpirysoon
      summary: The cert `{{ $labels.name }}` is {{ $value | humanizeDuration }} from
        expiry, it should have renewed over a week ago.
    expr: |
      avg by (exported_namespace, namespace, name) (
        certmanager_certificate_expiration_timestamp_seconds - time()
      ) < (21 * 24 * 3600) # 21 days in seconds
    for: 1h
    labels:
      severity: warning
  - alert: CertManagerCertNotReady
    annotations:
      dashboard_url: https://grafana.example.com/d/TvuRo2iMk/cert-manager
      description: This certificate has not been ready to serve traffic for at least
        10m. If the cert is being renewed or there is another valid cert, the ingress
        controller _may_ be able to serve that instead.
      runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagercertnotready
      summary: The cert `{{ $labels.name }}` is not ready to serve traffic.
    expr: |
      max by (name, exported_namespace, namespace, condition) (
        certmanager_certificate_ready_status{condition!="True"} == 1
      )
    for: 10m
    labels:
      severity: critical
  - alert: CertManagerHittingRateLimits
    annotations:
      dashboard_url: https://grafana.example.com/d/TvuRo2iMk/cert-manager
      description: Depending on the rate limit, cert-manager may be unable to generate
        certificates for up to a week.
      runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagerhittingratelimits
      summary: Cert manager hitting LetsEncrypt rate limits.
    expr: |
      sum by (host) (
        rate(certmanager_http_acme_client_request_count{status="429"}[5m])
      ) > 0
    for: 5m
    labels:
      severity: critical
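Prometheus templates refer to the alert's value as `$value`, while Grafana exposes expression results through `$values`, keyed by ref ID. main.tf bridges the two with a plain (non-regex) `replace()` on every annotation. The Go sketch below applies the equivalent `strings.ReplaceAll` to the `summary` of `CertManagerCertExpirySoon` from this fixture. `grafanaAnnotation` is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// grafanaAnnotation mirrors replace(v, "$value", "$values.QUERY_RESULT.Value"):
// every occurrence of the plain substring "$value" is rewritten.
func grafanaAnnotation(v string) string {
	return strings.ReplaceAll(v, "$value", "$values.QUERY_RESULT.Value")
}

func main() {
	summary := "The cert `{{ $labels.name }}` is {{ $value | humanizeDuration }} from expiry, it should have renewed over a week ago."
	fmt.Println(grafanaAnnotation(summary))
}
```

`$labels` is left untouched, so label references keep working. Because the match is a plain substring, a template that already uses Grafana's `$values` would also be rewritten and break. The rewrite therefore only suits Prometheus-style templates like the ones in this fixture.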