Add PrometheusMissingRuleEvaluations runbook #45

Open
wants to merge 3 commits into
base: main
28 changes: 28 additions & 0 deletions content/runbooks/prometheus/PrometheusMissingRuleEvaluations.md
@@ -0,0 +1,28 @@
# PrometheusMissingRuleEvaluations

## Meaning

The alert fires when evaluation of a Prometheus rule_group consistently takes longer than the rule_group's evaluation interval, so scheduled evaluations are missed.

## Impact

Rule groups contain alerting rules, recording rules, or both. If Prometheus cannot evaluate the rules in time, it might trigger alerts late or fail to trigger them at all.

## Diagnosis

Quick checks:
- Check whether enough resources are allocated to Prometheus.
- Check for bad neighbors that consume too much CPU on the same host.

Deep dive:
- Use the `prometheus_rule_group_iterations_missed_total` metric to identify the struggling rule_group.
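
As a starting point, the queries below sketch how the affected groups might be narrowed down. They assume the metrics carry a `rule_group` label, as in recent Prometheus versions; the one-hour window is an arbitrary choice and no `job` or `namespace` selectors are included, so adjust both to your setup.

```promql
# Rule groups that missed at least one evaluation in the last hour.
# The 1h window is an assumption; match it to the alert's timeline.
increase(prometheus_rule_group_iterations_missed_total[1h]) > 0

# Rule groups whose most recent evaluation took longer than their interval.
prometheus_rule_group_last_duration_seconds
  > prometheus_rule_group_interval_seconds
```

Groups returned by both queries are the most likely candidates for the mitigations below.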

## Mitigation

Quick fixes:
- Increase the CPU resources allocated to Prometheus.
- Move the bad neighbor to a different host.

Deep dive:
- Increase the rule evaluation interval.
- Split the rule_group into smaller groups if the rules do not depend on each other. This should help because rules inside a group are evaluated in sequence, while separate groups can be evaluated concurrently.