A script to monitor your workload profiles and assess whether your apps will run low on resources at maximum scale-out.

simonjj/aca-wp-monitor

Workload Profile Monitoring for Azure Container Apps

This script monitors Azure Container Apps (ACA) environments and their workload profiles and assesses whether they are adequately provisioned. It gathers resource usage data, evaluates whether resources are underprovisioned, and sends metrics to Azure Monitor, where they can be used to build dashboards or alerts. If a workload profile is underprovisioned, the script also writes a message to stdout.

screenshot showing the metrics list

Quickstart

This quickstart assumes you already have an Azure Container Apps environment. The job is deployed from a ready-made container image (simon.azurecr.io/aca-wp-monitor:latest) hosted publicly on Azure Container Registry. The image's sha256 digest is 0c7024c471593f6b2f41c21f13e856e3dfc577ba8047314f980df37cbfe5d35c. The job runs every hour, at the top of the hour.

export SUBSCRIPTION_ID=<ACA_ENVIRONMENT_SUBSCRIPTION>
export RESOURCE_GROUP=<ACA_ENVIRONMENT_RESOURCE_GROUP>
export ENVIRONMENT_NAME=<ACA_ENVIRONMENT_NAME>
export JOB_NAME=workload-profile-monitor

# we create the job first and enable managed identity
az containerapp job create \
  --name $JOB_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT_NAME \
  --cpu 0.5 \
  --memory 1.0Gi \
  --image simon.azurecr.io/aca-wp-monitor:latest \
  --mi-system-assigned \
  --env-vars SUBSCRIPTION_ID=$SUBSCRIPTION_ID RESOURCE_GROUP=$RESOURCE_GROUP ENVIRONMENT_NAME=$ENVIRONMENT_NAME \
  --replica-timeout 3600 \
  --replica-retry-limit 3 \
  --replica-completion-count 1 \
  --parallelism 1 \
  --trigger-type Schedule \
  --cron-expression "0 */1 * * *"

# we retrieve the managed identity to use for role assignments
PRINCIPAL_ID=$(az containerapp job show --name $JOB_NAME --resource-group $RESOURCE_GROUP --query "identity.principalId" --output tsv)

# we assign the Reader role to be able to get the app and workload profile details
az role assignment create --assignee $PRINCIPAL_ID --role Reader --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP

# we assign the Monitoring Metrics Publisher role to be able to send the metrics to the job's metrics panel
az role assignment create --assignee $PRINCIPAL_ID --role "Monitoring Metrics Publisher" --scope /subscriptions/$SUBSCRIPTION_ID

Once complete, the roles screen for the job's managed identity should look similar to this screenshot:

screenshot showing the job's roles

Details and Notes

  • This script is intended to be run as a job inside the ACA environment you intend to monitor.
  • It uses a system-assigned managed identity to authenticate itself as both a Monitoring Metrics Publisher and a Reader. These roles have to be assigned to the job's managed identity.
  • Metrics are published to, and available on, the job's Metrics tab.
  • The script requires the following environment variables: SUBSCRIPTION_ID, RESOURCE_GROUP, and ENVIRONMENT_NAME. A fourth one, CONTAINER_APP_JOB_NAME for jobs or CURRENT_APP_NAME for apps, is provided automatically by the ACA runtime environment.
  • The script uses Azure's Python SDK.
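The environment variable handling described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `MonitorConfig`, `load_config`, and the error messages are hypothetical names, and only the variable names come from this README.

```python
import os
from typing import Mapping, NamedTuple


class MonitorConfig(NamedTuple):
    """Hypothetical container for the settings listed in this README."""
    subscription_id: str
    resource_group: str
    environment_name: str
    host_name: str  # name of the job or app that runs the script


# Variables the user must supply (names taken from this README).
REQUIRED = ("SUBSCRIPTION_ID", "RESOURCE_GROUP", "ENVIRONMENT_NAME")


def load_config(env: Mapping[str, str] = os.environ) -> MonitorConfig:
    """Read the required settings and fail early if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    # The fourth variable is set by the runtime; which one exists depends
    # on whether the script runs as a job or as an app.
    host = env.get("CONTAINER_APP_JOB_NAME") or env.get("CURRENT_APP_NAME")
    if not host:
        raise ValueError("neither CONTAINER_APP_JOB_NAME nor CURRENT_APP_NAME is set")
    return MonitorConfig(
        env["SUBSCRIPTION_ID"],
        env["RESOURCE_GROUP"],
        env["ENVIRONMENT_NAME"],
        host,
    )
```

Passing the environment as a mapping keeps the function easy to exercise locally with a plain dictionary instead of real environment variables.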

Available Metrics & Output

| Metric or Output | Description |
| --- | --- |
| Workload Profile | The name of the workload profile. This is represented as a dimension and can be accessed via filters. |
| Currently Used CPU | The amount of CPU currently being used across all the apps and their currently running replicas on the workload profile. |
| Currently Used Memory (GB) | The amount of memory currently being used across all the apps and their replicas on the workload profile. |
| Currently Available CPU | The amount of CPU currently available on the workload profile. |
| Currently Available Memory (GB) | The amount of memory currently available on the workload profile. |
| Max Scale Needed CPU | The maximum amount of CPU needed if all the apps on the workload profile ran at maximum scale. |
| Max Scale Needed Memory (GB) | The maximum amount of memory needed if all the apps on the workload profile ran at maximum scale. |
| Max Available CPU | The maximum amount of CPU available if the workload profile were scaled to its maximum. |
| Max Available Memory (GB) | The maximum amount of memory available if the workload profile were scaled to its maximum. |
| Provisioning Assessment | An assessment of whether Max Scale Needed (CPU or Memory) exceeds Max Available (CPU or Memory). This output is written to stdout. |
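The Provisioning Assessment compares Max Scale Needed against Max Available for each resource. A rough sketch of that comparison is shown here. The function name `assess_profile` and its parameters are hypothetical, and the actual script may compute or word things differently. The message format mirrors the sample alert message in this README.

```python
from typing import Optional


def assess_profile(
    name: str,
    needed_cpu: float,
    needed_mem_gb: float,
    max_cpu: float,
    max_mem_gb: float,
) -> Optional[str]:
    """Return a warning if max-scale demand exceeds max capacity, else None."""
    cpu_gap = needed_cpu - max_cpu          # positive means too little CPU
    mem_gap = needed_mem_gb - max_mem_gb    # positive means too little memory
    parts = []
    if cpu_gap > 0:
        parts.append(f"CPU underprovisioned by {cpu_gap:.1f} vCPUs.")
    if mem_gap > 0:
        parts.append(f"Memory underprovisioned by {mem_gap:.1f} GB.")
    if not parts:
        return None
    return f"WARNING: Workload Profile {name} is underprovisioned: " + " ".join(parts)


# Illustrative numbers only: 20 vCPUs needed vs. 16 available,
# 102 GB needed vs. 64 GB available.
print(assess_profile("mem-pool", 20.0, 102.0, 16.0, 64.0))
```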

Sample Provisioning Alert Message

The highest-level output is written to the console execution log of the job or app running the script. If a workload profile is underprovisioned, a message is printed there and should be alerted on.

WARNING: Workload Profile mem-pool is underprovisioned: CPU underprovisioned by 4.0 vCPUs. Memory underprovisioned by 38.0 GB.
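To alert on this message from collected logs, the line can be parsed into structured fields. The sketch below assumes the exact wording of the sample above, and assumes that either the CPU or the Memory part may be absent when only one resource is short. `WARNING_RE` and `parse_warning` are hypothetical names, not part of the script.

```python
import re
from typing import Dict, Optional

# Pattern derived from the sample message; both resource parts are optional.
WARNING_RE = re.compile(
    r"WARNING: Workload Profile (?P<profile>\S+) is underprovisioned:"
    r"(?: CPU underprovisioned by (?P<cpu>\d+(?:\.\d+)?) vCPUs\.)?"
    r"(?: Memory underprovisioned by (?P<mem>\d+(?:\.\d+)?) GB\.)?"
)


def parse_warning(line: str) -> Optional[Dict[str, object]]:
    """Extract the profile name and shortfalls from a log line, or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    return {
        "profile": match.group("profile"),
        "cpu_gap": float(match.group("cpu")) if match.group("cpu") else 0.0,
        "mem_gap_gb": float(match.group("mem")) if match.group("mem") else 0.0,
    }
```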

Usage

Once gathered, metrics become available under the Metrics panel for the job. The metric namespace is workload profile. All aggregations show the same number for any collection cycle. Individual workload profiles are available as dimension values and can be accessed via filters.
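One plausible reason all aggregations agree is that each collection cycle contributes a single sample per metric and workload profile. The sketch below builds a payload in the shape used by Azure Monitor's custom metrics ingestion format. That the script publishes metrics this way is an assumption, as is the helper name `build_metric_payload`. The namespace and dimension name are taken from this README.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_metric_payload(
    metric: str,
    profile: str,
    value: float,
    when: datetime,
    namespace: str = "workload profile",  # namespace as stated in this README
) -> Dict[str, Any]:
    """Build a single-sample custom metric body (assumed publishing format)."""
    return {
        "time": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": {
            "baseData": {
                "metric": metric,
                "namespace": namespace,
                "dimNames": ["Workload Profile"],
                "series": [
                    {
                        "dimValues": [profile],
                        # One sample per cycle: min, max, and sum coincide,
                        # so the average (sum / count) is the same value too.
                        "min": value,
                        "max": value,
                        "sum": value,
                        "count": 1,
                    }
                ],
            }
        },
    }
```

With a count of 1, the minimum, maximum, sum, and average within a collection cycle are identical, which matches the behavior described above.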

screenshot showing the monitoring panel filter

Limitations and Todo

  • Jobs and components are not currently part of the assessment.
  • GPU-based workload profiles are not included in the assessment.
