Docker Flow Monitor can be controlled by sending HTTP requests or through Docker Service labels when combined with Docker Flow Swarm Listener.
The reconfigure endpoint can be used to send requests to Docker Flow Monitor to add new scrape targets and alerts or to modify existing ones. Parameters are divided into scrape and alert groups.
The query parameters that follow should be appended to the base address `[MONITOR_IP]:[MONITOR_PORT]/v1/docker-flow-monitor/reconfigure`.
!!! tip Defines Prometheus scrape targets
Query | Description | Required |
---|---|---|
metricsPath | The path of the metrics endpoint. Defaults to `/metrics`. | No |
scrapeInterval | How frequently to scrape targets from this job. | No |
scrapeTimeout | Per-scrape timeout when scraping this job. | No |
scrapePort | The port through which metrics are exposed. | Yes |
serviceName | The name of the service that exports metrics. | Yes |
scrapeType | A set of targets and parameters describing how to scrape metrics. | No |
You can find more about `scrapeType` values in the Prometheus Scrape Config documentation.
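The Python snippet below is a minimal sketch of how the scrape parameters in the table map onto a reconfigure request URL. It only builds the URL and does not send it. The `http://` scheme, the `monitor:8080` address, the `go-demo` service exposing metrics on port `8080`, and the `15s` interval are illustrative assumptions, and `reconfigure_url` is a hypothetical helper, not part of Docker Flow Monitor.

```python
from urllib.parse import urlencode


def reconfigure_url(monitor_addr: str, params: dict) -> str:
    """Append URL-encoded query parameters to the reconfigure endpoint.

    monitor_addr stands for [MONITOR_IP]:[MONITOR_PORT]; the http scheme is assumed.
    """
    base = f"http://{monitor_addr}/v1/docker-flow-monitor/reconfigure"
    return f"{base}?{urlencode(params)}"


# Hypothetical service "go-demo" that exposes metrics on port 8080.
scrape_params = {
    "serviceName": "go-demo",   # required
    "scrapePort": 8080,         # required; urlencode converts it to a string
    "metricsPath": "/metrics",  # optional; this is the documented default
    "scrapeInterval": "15s",    # optional; Prometheus duration format assumed
}

print(reconfigure_url("monitor:8080", scrape_params))
```

`urlencode` percent-encodes `/` as `%2F`, so `metricsPath` travels as `%2Fmetrics`; standard query-string parsing decodes it back to `/metrics`.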
!!! tip Defines Prometheus alerts
Query | Description | Required |
---|---|---|
alertAnnotations | This parameter is translated to the Prometheus alert `ANNOTATIONS` statement. Annotations are used to store longer additional information. Example: `summary=Service memory is high,description=Do something or start panicking` | No |
alertFor | This parameter is translated to the Prometheus alert `FOR` statement. It causes Prometheus to wait for a certain duration between first encountering a new expression output vector element (like an instance with a high HTTP error rate) and counting an alert as firing for this element. Elements that are active, but not firing yet, are in the pending state. This parameter expects a number with a time suffix (e.g. `s` for seconds, `m` for minutes). Example: `30s` | No |
alertIf | This parameter is translated to the Prometheus alert `IF` statement. It is an expression that will be evaluated and, if it returns a non-empty result, an alert will be fired. Example: `container_memory_usage_bytes{container_label_com_docker_swarm_service_name="go-demo"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="go-demo"} > 0.8` | Yes |
alertLabels | This parameter is translated to the Prometheus alert `LABELS` statement. It allows specifying a set of additional labels to be attached to the alert. Multiple labels can be separated with a comma (`,`). Example: `severity=high,receiver=system` | No |
alertName | The name of the alert. It is combined with the `serviceName`, thus producing a unique identifier. Example: `memoryAlert` | Yes |
serviceName | The name of the service. It is combined with the `alertName`, thus producing a unique identifier. Example: `go-demo` | Yes |
alertPersistent | When set to `true`, the alert will persist when the service is scaled to zero replicas. Example: `true` | No |
These parameters can be indexed so that multiple alerts can be defined for a service. Indexing is sequential and starts from 1. An example of an indexed `alertName` could be `alertName.1=memload` and `alertName.2=diskload`.
Please visit the Prometheus Alerting Overview for more information about the rules for defining Prometheus alerts.
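As a sketch of the indexing rule, the hypothetical Python helper below (not part of Docker Flow Monitor) numbers each alert's parameters sequentially, starting from 1. It assumes `serviceName` is sent once, without an index, and uses a service named `node-exporter` purely for illustration.

```python
from urllib.parse import urlencode


def indexed_alert_params(service_name: str, alerts: list[dict]) -> dict:
    """Flatten alert definitions into indexed query parameters.

    Each alert dict uses the unindexed names (alertName, alertIf, ...);
    indexes are assigned sequentially starting from 1.
    """
    params = {"serviceName": service_name}  # assumed to be sent once, unindexed
    for index, alert in enumerate(alerts, start=1):
        for key, value in alert.items():
            params[f"{key}.{index}"] = value
    return params


params = indexed_alert_params("node-exporter", [
    {"alertName": "memload", "alertIf": "@node_mem_limit:0.8", "alertFor": "30s"},
    {"alertName": "diskload", "alertIf": "@node_fs_limit:0.8"},
])
query = urlencode(params)  # '@' and ':' are percent-encoded as %40 and %3A
print(query)
```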
!!! tip Allows short specification of commonly used alertIf parameters
Shortcut | Description |
---|---|
@node_fs_limit:[PERCENTAGE] | Whether node file system usage is over the specified percentage of the total available file system size. Requirements: node-exporter metrics. `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: `@node_fs_limit:0.8` would be expanded to `(node_filesystem_size{fstype="aufs", job="my-service"} - node_filesystem_free{fstype="aufs", job="my-service"}) / node_filesystem_size{fstype="aufs", job="my-service"} > 0.8`. |
@node_mem_limit:[PERCENTAGE] | Whether node memory usage is over the specified percentage of the total node memory. Requirements: node-exporter metrics. `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: `@node_mem_limit:0.8` would be expanded to `(sum by (instance) (node_memory_MemTotal{job="my-service"}) - sum by (instance) (node_memory_MemFree{job="my-service"} + node_memory_Buffers{job="my-service"} + node_memory_Cached{job="my-service"})) / sum by (instance) (node_memory_MemTotal{job="my-service"}) > 0.8`. |
@node_mem_limit_total_above:[PERCENTAGE] | Whether the memory usage of all the nodes is over the specified percentage of their total memory. Requirements: node-exporter metrics. `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: `@node_mem_limit_total_above:0.8` would be expanded to `(sum(node_memory_MemTotal{job="my-service"}) - sum(node_memory_MemFree{job="my-service"} + node_memory_Buffers{job="my-service"} + node_memory_Cached{job="my-service"})) / sum(node_memory_MemTotal{job="my-service"}) > 0.8`. |
@node_mem_limit_total_below:[PERCENTAGE] | Whether the memory usage of all the nodes is below the specified percentage of their total memory. Requirements: node-exporter metrics. `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: `@node_mem_limit_total_below:0.4` would be expanded to `(sum(node_memory_MemTotal{job="my-service"}) - sum(node_memory_MemFree{job="my-service"} + node_memory_Buffers{job="my-service"} + node_memory_Cached{job="my-service"})) / sum(node_memory_MemTotal{job="my-service"}) < 0.4`. |
@replicas_running | Whether the number of running replicas differs from the desired number. Requirements: cAdvisor metrics and a service running in replicated mode. The alert uses the `container_memory_usage_bytes` metric only as a way to count the number of running containers. Example: `@replicas_running` for a service with the number of desired replicas set to `3` would be expanded to `count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}) != 3`. |
@replicas_more_than | Whether the number of running replicas is more than desired. Requirements: cAdvisor metrics and a service running in replicated mode. The alert uses the `container_memory_usage_bytes` metric only as a way to count the number of running containers. Example: `@replicas_more_than` for a service with the number of desired replicas set to `3` would be expanded to `count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}) > 3`. |
@replicas_less_than | Whether the number of running replicas is less than desired. Requirements: cAdvisor metrics and a service running in replicated mode. The alert uses the `container_memory_usage_bytes` metric only as a way to count the number of running containers. Example: `@replicas_less_than` for a service with the number of desired replicas set to `3` would be expanded to `count(container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}) < 3`. |
@resp_time_above:[QUANTILE],[RATE_DURATION],[PERCENTAGE] | Whether response times over the specified rate duration are above the `[QUANTILE]` threshold, that is, whether the share of requests served within `[QUANTILE]` seconds is below `[PERCENTAGE]`. Requirements: a histogram named `http_server_resp_time` with response times expressed in seconds. `[QUANTILE]` must be one of the bucket boundaries (`le` values) defined in the metric. `[RATE_DURATION]` can be in any format supported by Prometheus (e.g. `5m`). `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: `@resp_time_above:0.1,5m,0.9999` would be expanded to `sum(rate(http_server_resp_time_bucket{job="my-service", le="0.1"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) < 0.9999`. |
@resp_time_below:[QUANTILE],[RATE_DURATION],[PERCENTAGE] | Whether response times over the specified rate duration are below the `[QUANTILE]` threshold, that is, whether the share of requests served within `[QUANTILE]` seconds is above `[PERCENTAGE]`. Requirements: a histogram named `http_server_resp_time` with response times expressed in seconds. `[QUANTILE]` must be one of the bucket boundaries (`le` values) defined in the metric. `[RATE_DURATION]` can be in any format supported by Prometheus (e.g. `5m`). `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: `@resp_time_below:0.025,5m,0.75` would be expanded to `sum(rate(http_server_resp_time_bucket{job="my-service", le="0.025"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.75`. |
@resp_time_server_error:[RATE_DURATION],[PERCENTAGE] | Whether the server error rate (HTTP `5xx` responses) over the specified rate duration is above the set percentage. Requirements: a histogram named `http_server_resp_time` with the label `code` set to the HTTP response code. `[RATE_DURATION]` can be in any format supported by Prometheus (e.g. `5m`). `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: `@resp_time_server_error:5m,0.001` would be expanded to `sum(rate(http_server_resp_time_count{job="my-service", code=~"^5..$$"}[5m])) / sum(rate(http_server_resp_time_count{job="my-service"}[5m])) > 0.001`. |
@service_mem_limit:[PERCENTAGE] | Whether service memory usage is over the specified percentage of the service memory limit. Requirements: cAdvisor metrics and a service memory limit specified as a service resource. `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: If `serviceName` is set to `my-service`, `@service_mem_limit:0.8` would be expanded to `container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="my-service"} > 0.8`. |
@service_mem_limit_nobuff:[PERCENTAGE] | Whether service memory usage, excluding the Linux page cache (`container_memory_cache`), is over the specified percentage of the service memory limit. Requirements: cAdvisor metrics and a service memory limit specified as a service resource. `[PERCENTAGE]` must be specified as a decimal value (e.g. `0.8` equals `80%`). Example: If `serviceName` is set to `my-service`, `@service_mem_limit_nobuff:0.8` would be expanded to `(container_memory_usage_bytes{container_label_com_docker_swarm_service_name="my-service"}-container_memory_cache{container_label_com_docker_swarm_service_name="my-service"})/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="my-service"} > 0.8`. |
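The expansion rule can be illustrated with a short Python sketch. It is not Docker Flow Monitor's implementation (which uses Go templates); it transcribes two rows of the shortcut table into `str.format` templates, where `{service}` stands for `serviceName` and `{0}`, `{1}`, ... for the comma-separated values after the colon. `SHORTCUTS`, `parse_shortcut`, and `expand` are hypothetical names. Shortcuts that need Swarm state, such as `@replicas_running` with its desired replica count, are left out.

```python
# Two rows of the shortcut table transcribed as str.format templates:
# {service} is the serviceName, {0}, {1}, ... are the values after the colon,
# and doubled braces {{ }} produce the literal braces of PromQL label matchers.
SHORTCUTS = {
    "@service_mem_limit": (
        'container_memory_usage_bytes{{container_label_com_docker_swarm_service_name="{service}"}}'
        '/container_spec_memory_limit_bytes{{container_label_com_docker_swarm_service_name="{service}"}}'
        " > {0}"
    ),
    "@resp_time_above": (
        'sum(rate(http_server_resp_time_bucket{{job="{service}", le="{0}"}}[{1}]))'
        ' / sum(rate(http_server_resp_time_count{{job="{service}"}}[{1}])) < {2}'
    ),
}


def parse_shortcut(alert_if: str) -> tuple[str, list[str]]:
    """Split '@name:v1,v2' into ('@name', ['v1', 'v2'])."""
    name, _, args = alert_if.partition(":")
    return name, args.split(",") if args else []


def expand(alert_if: str, service: str) -> str:
    """Expand a known shortcut; return anything else unchanged as raw PromQL."""
    name, values = parse_shortcut(alert_if)
    if name not in SHORTCUTS:
        return alert_if
    return SHORTCUTS[name].format(*values, service=service)


print(expand("@service_mem_limit:0.8", "my-service"))
print(expand("@resp_time_above:0.1,5m,0.9999", "my-service"))
```

Both printed expressions match the corresponding "would be expanded to" examples in the table.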
!!! note
    I hope that the number of shortcuts will grow with time thanks to community contributions. Please create an issue with the `alertIf` statement and the suggested shortcut, and I'll add it to the code as soon as possible.
Docker Flow Monitor supports Docker Secrets for adding custom `alertIf` shortcuts. Only secrets whose names start with `alertif-` or `alertif_` will be considered. `alertIf` shortcuts are configured as a YAML file with a series of dictionaries. The key of each dictionary is your custom `alertIf` shortcut, which must begin with the `@` character. The value of each dictionary consists of three keys: `expanded`, `annotations`, and `labels`. `expanded` contains the expanded alert written as a Go template, while `annotations` and `labels` contain dictionaries with the alert's annotations and labels. For example, `@service_mem_limit` is defined by the following YAML:

```yaml
"@service_mem_limit":
  expanded: container_memory_usage_bytes{container_label_com_docker_swarm_service_name="{{ .Alert.ServiceName }}"}/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="{{ .Alert.ServiceName }}"} > {{ index .Values 0 }}
  annotations:
    summary: Memory of the service {{ .Alert.ServiceName }} is over {{ index .Values 0 }}
  labels:
    receiver: system
    service: "{{ .Alert.ServiceName }}"
```
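To show what the Go template actions in the `@service_mem_limit` definition resolve to, here is a small Python stand-in. It is a sketch under narrow assumptions: it understands only the two actions used in that example, `{{ .Alert.ServiceName }}` and `{{ index .Values N }}`, and does not implement Go's `text/template`. The `render` and `accepts_secret` helpers are hypothetical; the latter only encodes the secret-name prefix rule.

```python
import re

# Matches only the two template actions used in the @service_mem_limit example.
# This is a stand-in for illustration, not Go's text/template.
_ACTION = re.compile(r"\{\{\s*(?:\.Alert\.ServiceName|index \.Values (\d+))\s*\}\}")


def render(template: str, service_name: str, values: list[str]) -> str:
    """Replace {{ .Alert.ServiceName }} and {{ index .Values N }} placeholders."""
    def substitute(match: re.Match) -> str:
        index = match.group(1)
        # group(1) is None for .Alert.ServiceName, else the value index N.
        return service_name if index is None else values[int(index)]
    return _ACTION.sub(substitute, template)


def accepts_secret(name: str) -> bool:
    """Only secrets named alertif-* or alertif_* are read for shortcuts."""
    return name.startswith(("alertif-", "alertif_"))


# Templates copied from the @service_mem_limit YAML definition.
EXPANDED = (
    'container_memory_usage_bytes{container_label_com_docker_swarm_service_name="{{ .Alert.ServiceName }}"}'
    '/container_spec_memory_limit_bytes{container_label_com_docker_swarm_service_name="{{ .Alert.ServiceName }}"}'
    " > {{ index .Values 0 }}"
)
SUMMARY = "Memory of the service {{ .Alert.ServiceName }} is over {{ index .Values 0 }}"

# "@service_mem_limit:0.8" for service "my-service" supplies Values = ["0.8"].
print(render(EXPANDED, "my-service", ["0.8"]))
print(render(SUMMARY, "my-service", ["0.8"]))
```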
!!! tip AlertIf shortcuts defined in secrets will take priority over default shortcuts.
The logical operators `and`, `unless`, and `or` can be used in combination with AlertIf Parameter Shortcuts. In the `alertIf` value, the operator is surrounded by underscores. For example, to create an alert that triggers when response time is low unless response time is high, set `alertIf=@resp_time_below:0.025,5m,0.75_unless_@resp_time_above:0.1,5m,0.99`. This alert prevents `@resp_time_below` from triggering while `@resp_time_above` is triggering. The `summary` annotations of the combined shortcuts are joined with the operator, here `unless`: "Response time of the service my-service is below 0.025 unless Response time of the service my-service is above 0.1". When using logical operators, there are no default alert labels, so the labels have to be set manually with the `alertLabels` query parameter.
More information on the logical operators can be found in Prometheus's querying documentation.
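The combination syntax can be sketched in Python as follows. `combine` and `split_combined` are hypothetical helpers, and treating `_and_` and `_or_` the same way as the documented `_unless_` is an assumption based on the example. The naive split works for the documented shortcut names because none of them contains `_and_`, `_unless_`, or `_or_`. Labels are passed explicitly through `alertLabels`, since combined shortcuts get no defaults; the service, alert name, and label values are illustrative.

```python
import re
from urllib.parse import urlencode

OPERATORS = ("and", "unless", "or")
# An operator surrounded by underscores; the capture group keeps it in split results.
_OPERATOR_RE = re.compile(r"_(and|unless|or)_")


def combine(*shortcuts: str, op: str) -> str:
    """Join shortcuts with a logical operator written as _op_."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    return f"_{op}_".join(shortcuts)


def split_combined(alert_if: str) -> list[str]:
    """Return shortcuts and operators in order, e.g. ['@a', 'unless', '@b']."""
    return _OPERATOR_RE.split(alert_if)


alert_if = combine("@resp_time_below:0.025,5m,0.75",
                   "@resp_time_above:0.1,5m,0.99", op="unless")
params = {
    "serviceName": "my-service",
    "alertName": "resptime",
    "alertIf": alert_if,
    # No default labels are added for combined shortcuts, so set them explicitly.
    "alertLabels": "severity=high,receiver=system",
}
print(split_combined(alert_if))
print(urlencode(params))
```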
!!! tip Removes Prometheus scrapes and alerts
The remove endpoint can be used to send requests to Docker Flow Monitor to remove the scrapes and alerts related to a service.
The query parameters that follow should be appended to the base address `[MONITOR_IP]:[MONITOR_PORT]/v1/docker-flow-monitor/remove`.
Query | Description | Required |
---|---|---|
serviceName | The name of the service that should be removed. | Yes |
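A minimal Python sketch of building a remove request URL, assuming an `http://` scheme and a hypothetical `monitor:8080` address for `[MONITOR_IP]:[MONITOR_PORT]`; `remove_url` is an illustrative helper, and the snippet does not send the request.

```python
from urllib.parse import urlencode


def remove_url(monitor_addr: str, service_name: str) -> str:
    """Build the remove endpoint URL for one service (http scheme assumed)."""
    base = f"http://{monitor_addr}/v1/docker-flow-monitor/remove"
    return f"{base}?{urlencode({'serviceName': service_name})}"


print(remove_url("monitor:8080", "go-demo"))
# http://monitor:8080/v1/docker-flow-monitor/remove?serviceName=go-demo
```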