Commit

Merge pull request #162 from stephen-soltesz/pr-change-alert-threshold
Update alert delay threshold for ScraperMostRecentArchivedFileTimeIsTooOld
stephen-soltesz authored Jan 2, 2018
2 parents 270ae27 + d2c97ad commit f604143
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion config/federation/prometheus/alerts.yml
@@ -72,14 +72,19 @@ ALERT SidestreamIsNotRunning
 # affected by this at once, or b) many machines are affected and the
 # ParserDailyVolumeTooLow will trigger first.
 #
+# Note: the delay threshold is set to 2h to prevent false positives. For
+# example, if a machine remains running while it is not network accessible,
+# then the machine will need time for scraper to catch up once it is network
+# accessible again.
+#
 # TODO(soltesz): remove the != 0 check when legacy records are removed.
 ALERT ScraperMostRecentArchivedFileTimeIsTooOld
 IF (time() - (scraper_maxrawfiletimearchived{container="scraper-sync"} != 0)) > (56 * 60 * 60)
 AND ON(machine)
 (time() - process_start_time_seconds{service="sidestream"}) > (30 * 60 * 60)
 UNLESS ON(machine)
 lame_duck_node == 1
-FOR 10m
+FOR 2h
 LABELS {
 severity = "page"
 }
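
To make the rule's logic easier to follow, here is a rough Python sketch of how the condition and the FOR window combine. It is an illustration only, not how Prometheus evaluates rules internally. The function and parameter names (`condition_holds`, `alert_fires`, `samples`) are hypothetical, and the FOR handling assumes the alert fires once the condition has held continuously for at least the configured duration.

```python
# Thresholds taken from the rule expression, in seconds.
MAX_ARCHIVE_AGE_S = 56 * 60 * 60        # 201600
MIN_SIDESTREAM_UPTIME_S = 30 * 60 * 60  # 108000
FOR_DURATION_S = 2 * 60 * 60            # 7200 (was 10 * 60 before this commit)


def condition_holds(now, max_raw_file_time_archived, sidestream_start_time,
                    lame_duck=None):
    """Evaluate the IF / AND / UNLESS expression for one machine (hypothetical helper)."""
    # The `!= 0` filter drops legacy records whose archived time is zero.
    if max_raw_file_time_archived == 0:
        return False
    too_old = (now - max_raw_file_time_archived) > MAX_ARCHIVE_AGE_S
    running_long = (now - sidestream_start_time) > MIN_SIDESTREAM_UPTIME_S
    # UNLESS lame_duck_node == 1: suppress machines that are in lame-duck mode.
    return too_old and running_long and lame_duck != 1


def alert_fires(samples, for_duration_s=FOR_DURATION_S):
    """samples: time-ordered (timestamp, condition_holds) pairs (assumed shape).

    Returns True once the condition has held without interruption for at
    least for_duration_s seconds.
    """
    pending_since = None
    for ts, holds in samples:
        if not holds:
            pending_since = None  # any gap resets the pending window
            continue
        if pending_since is None:
            pending_since = ts
        if ts - pending_since >= for_duration_s:
            return True
    return False
```

In this sketch, a machine that comes back online and lets scraper catch up within two hours resets the pending window before the alert can page, which is the false positive the commit's note describes.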