This application facilitates discovery of ECS and/or CloudMap resources and is compatible with Prometheus HTTP Service Discovery.
The application leverages the AWS API to dynamically discover:
- ECS Clusters, related ECS Services, and ECS Tasks
- CloudMap Namespaces, registered Service Connect and Service Discovery services, and their instances
making it easier for Prometheus to monitor these services without hardcoding their addresses and ports, or using any kind of file-based discovery (thus it can be run as a standalone application: as its own AWS ECS Service, for example).
Below is a high-level overview of what the application is capable of. For the full list of supported configuration parameters and how to override them using Docker environment variables, please refer to appsettings.json.
- Prometheus Compatibility: By default, exposes the 9001:/prometheus-targets endpoint, whose response is compatible with <http_sd_config> of the Prometheus configuration file, as well as with the OpenTelemetry Prometheus Receiver (see example below).
- ECS Discovery: Supports discovery of ECS clusters, services, and running tasks
  - Supply EcsClusters as a semicolon-separated string of ECS clusters to include: "cluster1;cluster2;"
  - Supports filtering of ECS Services that should be included in the Prometheus response based on Resource Tags (see the EcsServiceSelectorTags configuration property)
  - Supports filtering to control which tags should be included as labels in the response (see the EcsTaskTags, EcsServiceTags configuration properties)
- AWS CloudMap Integration: Supports discovery of CloudMap namespaces, services, and instances
  - Supply CloudMapNamespaces as a semicolon-separated string of CloudMap namespaces to include: "namespace1;"
  - Supports filtering of CloudMap Services that should be included in the Prometheus response based on Resource Tags (see the CloudMapServiceSelectorTags configuration property)
  - Supports filtering to control which tags should be included as labels in the response (see the CloudMapServiceTags, CloudMapNamespaceTags configuration properties)
- Scalable and Flexible: Built to be scalable and flexible, adapting to changes in the service infrastructure
  - Allows any number of port-metrics pairs per running ECS Task (useful when your ECS Task Definition defines multiple containers, each exposing its own port and metrics endpoint). See the MetricsPathPortTagPrefix configuration property for more details.
  - Allows supplying a static set of labels to be added to every Prometheus target (see the ExtraPrometheusLabels configuration property)
  - Supports a rich mechanism for relabeling, so you don't have to fix your Grafana dashboards. See the RelabelConfigurations configuration property for more details.
  - Exposes a health check endpoint: /health
At least one of EcsClusters or CloudMapNamespaces must be provided for the application to work. If you would like to include both Service Connect and Service Discovery targets, CloudMapNamespaces must be specified (with or without EcsClusters). When both parameters are specified, the result is only the intersection of targets that exist in both the ECS clusters and the CloudMap namespaces provided.
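The selection rule above can be pictured as a small set operation. The sketch below is illustrative only: the function names and the idea of representing targets as sets of strings are assumptions made for the example, not the application's actual code.

```python
def parse_list(value: str) -> list[str]:
    """Split a semicolon-separated setting such as "cluster1;cluster2;" into names."""
    return [item.strip() for item in value.split(";") if item.strip()]


def select_targets(ecs_targets: set[str], cloudmap_targets: set[str],
                   ecs_clusters: str, cloudmap_namespaces: str) -> set[str]:
    """Pick the targets to return, given which of the two settings are provided."""
    clusters = parse_list(ecs_clusters)
    namespaces = parse_list(cloudmap_namespaces)
    if not clusters and not namespaces:
        raise ValueError(
            "At least one of 'EcsClusters' or 'CloudMapNamespaces' must be specified."
        )
    if clusters and namespaces:
        # Both provided: only targets present on both sides survive.
        return ecs_targets & cloudmap_targets
    return ecs_targets if clusters else cloudmap_targets
```

For example, with ECS targets {"a", "b"} and CloudMap targets {"b", "c"}, supplying both settings leaves only "b" in this sketch.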
Permissions: IAM permissions are required to discover ECS clusters (ecs:Get*, ecs:List*, ecs:Describe*) and CloudMap namespaces (servicediscovery:Get*, servicediscovery:List*, servicediscovery:Discover*, route53:Get*).
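As a rough illustration, the action patterns listed above could be assembled into an IAM policy document like this. The helper name build_policy and the use of "Resource": "*" are assumptions made for brevity; you may want to scope the resources down in a real deployment.

```python
import json

# Action patterns taken from the permissions listed above.
ECS_ACTIONS = ["ecs:Get*", "ecs:List*", "ecs:Describe*"]
CLOUDMAP_ACTIONS = [
    "servicediscovery:Get*",
    "servicediscovery:List*",
    "servicediscovery:Discover*",
    "route53:Get*",
]


def build_policy(include_cloudmap: bool = True) -> dict:
    """Build an IAM policy document granting the read-only discovery actions."""
    actions = ECS_ACTIONS + (CLOUDMAP_ACTIONS if include_cloudmap else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Resource "*" is an assumption for this sketch, not a recommendation.
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into a Task IAM Role policy.
    print(json.dumps(build_policy(), indent=2))
```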
For a full example of running this in AWS, please navigate to the /example folder. Integrates with OpenTelemetry receivers config.
Refer to appsettings.json for all supported configuration options.
Example run command:
docker run --rm \
# ** REQUIRED PARAMETERS **
# At least one of "EcsClusters" or "CloudMapNamespaces" must be provided.
-e DiscoveryOptions__EcsClusters="<cluster1>;<cluster2>" \
-e DiscoveryOptions__CloudMapNamespaces="<namespace1>;" \
# If running outside of AWS - pass credentials explicitly
# If running in AWS - use Task IAM Role.
# IMPORTANT: Credentials need permissions to describe ECS clusters (and, optionally, CloudMap namespaces)
-e AWS_REGION="us-west-1" \
-e AWS_ACCESS_KEY_ID="<key>" \
-e AWS_SECRET_ACCESS_KEY="<secret>" \
-e AWS_SESSION_TOKEN="<session_token>" \
# ** OPTIONAL PARAMETERS **
# Will only include those services, which have 'prom_scrape_target' tag set to 'yes'.
# Leave blank to include all services in cluster(s)
-e DiscoveryOptions__EcsServiceSelectorTags="prom_scrape_target=yes;" \
# Same as above, but for filtering out CloudMap services
# Leave blank to include all services in namespace(s)
-e DiscoveryOptions__CloudMapServiceSelectorTags="prom_scrape_target=yes;" \
# Semicolon separated string of tag keys to include in the service discovery response as metadata.
# Supports glob pattern matching using * and ?.
-e DiscoveryOptions__EcsTaskTags="*" \
-e DiscoveryOptions__EcsServiceTags="*" \
-e DiscoveryOptions__CloudMapServiceTags="AmazonECS*;" \
-e DiscoveryOptions__CloudMapNamespaceTags="*" \
# Semicolon separated string of labels to include in the service discovery response as metadata.
# Will be added to all discovered targets.
-e DiscoveryOptions__ExtraPrometheusLabels="custom_static_tag=my-static-tag;" \
# Tag prefix to identify metrics port, [path | "/metrics"], [name | ""] triplets.
# Please refer to the configuration options to learn more.
-e DiscoveryOptions__MetricsPathPortTagPrefix="METRICS_" \
# Add new or modify existing labels in the response using token replacements,
# to prevent the need of modifying your Grafana dashboards.
# Please refer to the configuration options to learn more.
-e DiscoveryOptions__RelabelConfigurations="cluster_and_service={{_sys_ecs_cluster}}-{{_sys_ecs_service}}" \
# Instructs .NET application to listen on 9001 inside the container
-e ASPNETCORE_URLS="http://*:9001" \
# ** DOCKER **
-p 9001:9001 \
apptality/aws-ecs-cloudmap-prometheus-discovery:latest
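To make the selector and tag-filter options in the command above more concrete, here is a minimal Python sketch of how such values could be interpreted. It is not the application's implementation: the function names are hypothetical, and the choices to require every selector pair to match and to compare case-sensitively are assumptions. Note also that Python's fnmatchcase accepts [seq] patterns in addition to * and ?.

```python
from fnmatch import fnmatchcase


def parse_selector(value: str) -> dict[str, str]:
    """Parse "key=value;" pairs, e.g. "prom_scrape_target=yes;"."""
    selector = {}
    for item in value.split(";"):
        if not item.strip():
            continue  # skip the empty piece after a trailing semicolon
        key, _, val = item.partition("=")
        selector[key.strip()] = val.strip()
    return selector


def is_selected(resource_tags: dict[str, str], selector: dict[str, str]) -> bool:
    """An empty selector includes everything; otherwise every pair must match."""
    return all(resource_tags.get(k) == v for k, v in selector.items())


def filter_label_tags(resource_tags: dict[str, str], patterns: str) -> dict[str, str]:
    """Keep only tags whose key matches one of the semicolon-separated globs."""
    globs = [p.strip() for p in patterns.split(";") if p.strip()]
    return {
        key: value
        for key, value in resource_tags.items()
        if any(fnmatchcase(key, glob) for glob in globs)
    }
```

With tags {"prom_scrape_target": "yes", "AmazonECSManaged": "true"}, the selector "prom_scrape_target=yes;" selects the service, and the filter "AmazonECS*;" keeps only the AmazonECSManaged tag as a label.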
Example output:
[
{
"targets": [
"10.200.10.200:8080"
],
"labels": {
"__metrics_path__": "/metrics",
"instance": "10.200.10.200",
"scrape_target_name": "app",
"__meta_cloudmap_service_instance_id": "c88nc14799fa46d794c1899612061h3s",
"__meta_cloudmap_service_name": "service-app",
"__meta_cloudmap_service_type": "ServiceConnect",
"__meta_ecs_cluster": "my-ecs-cluster",
"__meta_ecs_service": "my-fargate-application",
"__meta_ecs_task": "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/c88nc14799fa46d794c1899612061h3s",
"__meta_ecs_task_definition": "arn:aws:ecs:us-west-2:123456789012:task-definition/my-fargate-application:2",
"_sys_cloudmap_service_instance_id": "c88nc14799fa46d794c1899612061h3s",
"_sys_cloudmap_service_name": "my-fargate-application",
"_sys_cloudmap_service_type": "ServiceConnect",
"_sys_ecs_cluster": "my-ecs-cluster",
"_sys_ecs_service": "my-fargate-application",
"_sys_ecs_task": "arn:aws:ecs:us-west-2:123456789012:task/my-ecs-cluster/c88nc14799fa46d794c1899612061h3s",
"_sys_ecs_task_definition": "arn:aws:ecs:us-west-2:123456789012:task-definition/my-fargate-application:2",
"prom_scrape_target": "yes",
"AmazonECSManaged": "true",
"custom_static_tag": "my-static-tag",
"cluster_and_service": "my-ecs-cluster-my-fargate-application"
}
},
...
]
The response is returned in an HTTP_SD-compatible format:
[
{
"targets": [ "<host>", ... ],
"labels": {
"<labelname>": "<labelvalue>", ...
}
},
...
]
A successful response is returned with the application/json HTTP Content Type and a 200 HTTP status code.
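If you want to sanity-check a response by hand, a small validator along these lines may help. It is a sketch under the assumption that the payload follows the shape shown above (a JSON array of objects with "targets" and "labels"); the function name parse_http_sd is made up for this example.

```python
import json


def parse_http_sd(body: str) -> list[dict]:
    """Parse an HTTP_SD style payload and check its basic shape."""
    groups = json.loads(body)
    if not isinstance(groups, list):
        raise ValueError("expected a JSON array of target groups")
    for group in groups:
        if not isinstance(group, dict):
            raise ValueError("each target group must be a JSON object")
        targets = group.get("targets")
        labels = group.get("labels", {})
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            raise ValueError("'targets' must be a list of strings")
        if not isinstance(labels, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in labels.items()
        ):
            raise ValueError("'labels' must map string names to string values")
    return groups
```

The body could come, for instance, from urllib.request.urlopen("http://localhost:9001/prometheus-targets").read().decode() when the container is started locally with the port mapping from the example run command.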
Some clarification on the labels returned:
- scrape_target_name - when any of the resources contains a METRICS_NAME_ AWS Resource Tag, this label will have the tag's value. ECS Tasks can contain multiple containers you would like to scrape; this label helps distinguish between such containers when running PromQL queries.
- __meta - labels starting with this prefix are meta labels. They are not included in the resulting set stored in Prometheus, but can be used for relabeling.
- _sys - labels starting with this prefix are generated from AWS resource properties (ECS, CloudMap). They are prefixed this way to prevent conflicts with your existing infrastructure configuration and, combined with RelabelConfigurations, enable powerful manipulation of the resulting label set.
Anything else returned is either inferred from AWS Resource Tags specified via selectors (EcsTaskTags, EcsServiceTags, CloudMapServiceTags, CloudMapNamespaceTags), supplied via ExtraPrometheusLabels, or a product of RelabelConfigurations.
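The token replacement used by RelabelConfigurations (as in the cluster_and_service rule from the example run command) can be approximated with a short sketch. This is an illustration under assumptions, not the application's code: apply_relabel is a hypothetical helper, and replacing unknown tokens with an empty string is a choice made only for this example.

```python
import re

# Matches tokens such as {{_sys_ecs_cluster}}.
TOKEN = re.compile(r"\{\{\s*([^{}\s]+)\s*\}\}")


def apply_relabel(rule: str, labels: dict[str, str]) -> dict[str, str]:
    """Apply one "name=template" rule and return a new label dictionary."""
    name, _, template = rule.partition("=")
    # Each {{label}} token is replaced by that label's current value.
    value = TOKEN.sub(lambda m: labels.get(m.group(1), ""), template)
    return {**labels, name.strip(): value}
```

Applied to labels with _sys_ecs_cluster set to my-ecs-cluster and _sys_ecs_service set to my-fargate-application, the rule cluster_and_service={{_sys_ecs_cluster}}-{{_sys_ecs_service}} yields my-ecs-cluster-my-fargate-application, matching the example output.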
Please note that tags are resolved from AWS resources in the following order: ECS Task > ECS Service > CloudMap Service > CloudMap Namespace. This means that if an ECS Service has the tag "MyCustomTag=EcsService" and a CloudMap Service has the same tag with a different value, "MyCustomTag=CloudMap", the resulting scrape target label will have the value of the former:
[
{
"targets": [ "<host>", ... ],
"labels": {
"MyCustomTag": "EcsService", ...
}
},
...
]
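The precedence rule can be expressed as a simple dictionary merge. The following is a minimal sketch with a hypothetical function name; it only mirrors the documented order (ECS Task > ECS Service > CloudMap Service > CloudMap Namespace) and makes no claim about how the application implements it.

```python
def resolve_tags(ecs_task: dict, ecs_service: dict,
                 cloudmap_service: dict, cloudmap_namespace: dict) -> dict:
    """Merge tag dictionaries; sources listed first win on key conflicts."""
    merged = {}
    # Apply the lowest priority source first so higher priority ones overwrite it.
    for tags in (cloudmap_namespace, cloudmap_service, ecs_service, ecs_task):
        merged.update(tags)
    return merged


if __name__ == "__main__":
    result = resolve_tags(
        ecs_task={},
        ecs_service={"MyCustomTag": "EcsService"},
        cloudmap_service={"MyCustomTag": "CloudMap"},
        cloudmap_namespace={},
    )
    print(result)  # the ECS Service value takes precedence
```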
When the application runs into an error, the response is returned with a 500 HTTP status code:
{
"type": "https://tools.ietf.org/html/rfc9110#section-15.6.1",
"title": "An error occurred while processing your request.",
"status": 500
}
You'll need to investigate server logs for exception details:
2024-08-20 21:22:23.242 -03:00 [ERR] HTTP GET /prometheus-targets responded 500 in 99.9711 ms
2024-08-20 21:22:23.243 -03:00 [ERR] An unhandled exception has occurred while executing the request.
Microsoft.Extensions.Options.OptionsValidationException: At least one of 'EcsClusters' or 'CloudMapNamespaces' name must be specified.
at Microsoft.Extensions.Options.OptionsFactory`1.Create(String name)
...
The ./scripts/dockerhub-to-ecr.sh script is intended for users who need control over their Docker images in AWS ECR. It pulls an image from DockerHub, re-tags it, and pushes it to an ECR repository, with customizable options for platform and tagging.
Run the script specifying your ECR URL as the first positional parameter:
# <paste in your AWS credential>
cd ./scripts
chmod +x dockerhub-to-ecr.sh
./dockerhub-to-ecr.sh <target_ecr_repository_url> \
# Defaults to 'apptality/aws-ecs-cloudmap-prometheus-discovery:latest'
[source_dockerhub_image_url] \
# Specify explicit value of image tag you want to label your ECR image with.
# By default, same tag as on source image will be applied.
[target_ecr_repository_tag (Default: same as source)] \
# Specify docker platform to re-push. Amd64/Arm64 available.
[docker_image_platform: 'linux/amd64' (Default) | 'linux/arm64']
Example:
./dockerhub-to-ecr.sh 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd
The above script will:
- Authenticate with AWS ECR in region us-west-2
- Pull the linux/amd64 platform of apptality/aws-ecs-cloudmap-prometheus-discovery:latest locally
- Re-tag it to 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd:latest
- Push 123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd:latest to ECR
Parameters:
- target_ecr_repository_url: The full URL of your target AWS ECR repository, without a tag.
- source_dockerhub_image_url (optional): DockerHub image to pull (default: apptality/aws-ecs-cloudmap-prometheus-discovery:latest).
- target_ecr_repository_tag (optional): ECR image tag (defaults to the source image tag).
- docker_image_platform (optional): Image platform (linux/amd64 by default).
Note: you need to create the destination ECR repository first.
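For readers who want to see what the steps above amount to, the sketch below builds the equivalent command lines from the same parameters. It does not reproduce the script itself, whose contents may differ: plan_commands is a hypothetical helper, and the commands are the standard AWS CLI and Docker invocations for this kind of workflow. The tag parsing assumes a simple image reference without a registry port.

```python
import re
from typing import Optional

# <account>.dkr.ecr.<region>.amazonaws.com/<repository>
ECR_URL = re.compile(
    r"^(?P<registry>\d{12}\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com)/(?P<repo>.+)$"
)

DEFAULT_IMAGE = "apptality/aws-ecs-cloudmap-prometheus-discovery:latest"


def plan_commands(target_repo_url: str,
                  source_image: str = DEFAULT_IMAGE,
                  target_tag: Optional[str] = None,
                  platform: str = "linux/amd64") -> list[str]:
    """Return the shell commands for login, pull, re-tag, and push."""
    match = ECR_URL.match(target_repo_url)
    if match is None:
        raise ValueError(f"not an ECR repository URL: {target_repo_url}")
    # Default the target tag to the tag of the source image.
    source_tag = source_image.rpartition(":")[2] if ":" in source_image else "latest"
    target = f"{target_repo_url}:{target_tag or source_tag}"
    return [
        f"aws ecr get-login-password --region {match['region']} | "
        f"docker login --username AWS --password-stdin {match['registry']}",
        f"docker pull --platform {platform} {source_image}",
        f"docker tag {source_image} {target}",
        f"docker push {target}",
    ]
```

Calling plan_commands("123456789012.dkr.ecr.us-west-2.amazonaws.com/tools/ecs-sd") produces four commands that correspond to the four steps listed for the example invocation.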
This application is heavily influenced by the following articles:
- Metrics collection from Amazon ECS using Amazon Managed Service for Prometheus
- Metrics and traces collection from Amazon ECS using AWS Distro for OpenTelemetry with dynamic service discovery
It is well worth reading up on AWS Distro for OpenTelemetry (ADOT), which provides open source APIs, libraries, and agents to collect logs, metrics, and traces from your applications.
For how to integrate AWS AMP (Prometheus) with your Grafana, please refer to the article "Set up Grafana open source or Grafana Enterprise for use with Amazon Managed Service for Prometheus" by AWS.
There are some alternative discovery tools to this application:
- Container Insights Prometheus metrics monitoring discovers your infrastructure using CloudWatch
  - IMHO, the way CloudWatch agent configuration is structured around discovery of resources is NOT idiomatic to AWS. AWS has powerful, rich support for tags on pretty much any resource. It would be very natural and AWS-like to use tagging filters for resource selection instead of complex regex filters.
- aws-cloudmap-prometheus-sd plugin by AWS
- prometheus-for-ecs plugin by AWS. The deploy-prometheus and deploy-adot folders of that repository are worth digging into for a good understanding of how the integrations are set up.
Contributions are welcome! Please feel free to submit pull requests or open issues to discuss potential improvements or features.
This project is licensed under the MIT License - see the LICENSE file for details.