diff --git a/README.md b/README.md index 52db1ce..6d7a8a4 100644 --- a/README.md +++ b/README.md @@ -90,8 +90,13 @@ Not every single piece of infrastructure needs every single item on the list but | ☐ |
Build VM images

If you want to run your apps directly on Virtual Machines, you should package them as a [managed image](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/capture-image-resource) using [PowerShell](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/capture-image-resource#create-an-image-of-a-vm-using-PowerShell) or a tool such as [Packer](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/build-image-with-packer). Although I recommend Docker for all stateless apps (see below), I recommend directly using VM images and VM Instances for all stateful apps, such as any app that writes to its local disk (e.g., WordPress, Jenkins).

| | ☐ |
Deploy VM images using scale sets

The best way to deploy a VM image is typically to run it as a [scale set](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview) . This will allow you to spin up multiple VM Instances that run your VM image, scale the number of instances up and down in response to load, and automatically replace failed Instances.

| | ☐ |
Build Docker images

If want to run your apps as containers, you should package your apps as [Docker](https://www.docker.com/) images and push those images to the [Azure Container Registry (ACR)](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-intro). I recommend Docker for all stateless apps and for local development (along with [Docker Compose](https://docs.docker.com/compose/)).

| -| ☐ |
Deploy Docker images using AKS

For running Docker containers in Azure I recommend using [Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/), which is a Azure's managed Kubernetes.
Another option is [Azure Container Instances (ACI)](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview), a service where Azure manages and scales the underlying VM Instances for you and you just hand it Docker containers to run. However, this is not recommended for scenarios where you need full container orchestration, including service discovery across multiple containers, automatic scaling, and coordinated application upgrades.

| +| ☐ |
Deploy Docker images using ACR and AKS

For running Docker containers in Azure I recommend using [Azure Container Registry](https://azure.microsoft.com/en-ca/services/container-registry/) and [Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/), which is a Azure's managed Kubernetes.
Another option is [Azure Container Instances (ACI)](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview), a service where Azure manages and scales the underlying VM Instances for you and you just hand it Docker containers to run. However, this is not recommended for scenarios where you need full container orchestration, including service discovery across multiple containers, automatic scaling, and coordinated application upgrades.

| +| ☐ |
Deploy AKS apps using Helm

Package and deploy apps with Helm charts and push to [Azure Container Registry Helm Repos](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-helm-repos)
Helm is an open-source package manager which provides a large chart repository of applications. Charts can be customized and deployed to Kubernetes as part of an application stack.

| +| ☐ |
Follow ACR best practices

Maximize the cost and performance of ACR with AKS by following [Best practices for Azure Container Registry](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-best-practices).
As images are deployed to an ACR and consumed from AKS, governance becomes important. Follow these guidelines to ensure performance, storage, registry organization and cost.

| +| ☐ |
Follow AKS best practices

Build and run applications successfully in AKS by following [Best practices for Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/best-practices).
As a Cluster Operator, follow best practices of Multi-Tenancy, Security, Network and storage, and Disaster Recovery. As a Developer, ensure resource limits are implemented, logging is in place, and pods and containers are secure.

| | ☐ |
Deploy serverless apps using Azure Functions and API Management

If you want to build serverless apps, I recommend you use [Azure Functions](https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview). You can expose your Azure Functions as HTTP endpoints using [API Management](https://docs.microsoft.com/en-us/azure/api-management/import-function-app-as-api).

| +| ☐ |
Follow Azure Functions best practices

Serverless solutions have some common [Functions Best Practices](https://docs.microsoft.com/en-us/azure/azure-functions/functions-best-practices) to follow.
In addition to the practices for each function app's source language, follow the guidelines for performance, scalability, error handling and logging.

| +| ☐ |
Follow Azure App Service best practices

If you are running an App Service, or scaled Azure Functions, follow [Azure App Service Best Practices](https://docs.microsoft.com/en-us/azure/app-service/app-service-best-practices)
Manage resource limits, auto-healing, backup, and recovery.

| | ☐ |
Configure CPU, memory, and GC settings

Configure CPU settings, memory settings (e.g., -Xmx, -Xms settings for a JVM), and GC settings (if applicable) for your app. If you're deploying directly on VM Instances, these should be configured based on the available CPU and memory on your VM Instance (see [Instance Types](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes)). If you are deploying Docker containers, then tell the scheduler the [resources](https://docs.microsoft.com/en-us/azure/aks/developer-best-practices-resource-management) your app needs , and it will automatically try to find a VM Instance that has those resources.

| | ☐ |
Configure hard drives

Configure the [OS disk](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/managed-disks-overview#os-disk) on each VM Instance with enough space for your app and log files. For further data storage, attach one or more [Data disks](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/managed-disks-overview#data-disk).

| @@ -186,7 +191,7 @@ Not every single piece of infrastructure needs every single item on the list but | ☐ |
Configure encryption at rest

Enable [encryption](https://docs.microsoft.com/en-us/azure/security/fundamentals/azure-disk-encryption-vms-vmss) on the OS and Data disks of each VM instance. Many Azure services optionally support encryption: e.g., see [Always Encrypted Azure SQL](https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault)

| | ☐ |
Deploy a Bastion Host

All VM instances should be in a private subnet and NOT accessible directly from the public Internet. Only a single, locked-down VM instance, known as the Bastion Host, should run in the public subnets. You must first connect to the Bastion Host, which gets you "in" to the network, and then you can use it as a "jump host" to connect to the other VM instances. I recommend using [Azure Bastion](https://docs.microsoft.com/en-us/azure/bastion/bastion-overview) which is a fully-managed PaaS service.

| | ☐ |
Deploy a VPN Server

I typically recommend running a VPN Server as the entrypoint to your network. [OpenVPN](https://openvpn.net/) is the most popular option for running a VPN server. However, I would recommend using a PaaS option such as [VPN Gateway](https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-about-vpngateways). Alternatively, to extend your on-premises networks I would recommend using [ExpressRoute](https://docs.microsoft.com/en-us/azure/expressroute/expressroute-introduction).

| -| ☐ |
Set up a secrets management solution

**NEVER** store secrets in plaintext. Developers should store their secrets in a secure secrets manager, such as [pass](https://www.passwordstore.org/), [1Password](https://1password.com/), or [LastPass](https://www.lastpass.com/). Applications should store all their secrets (such as DB passwords and API keys) either in [secret variables](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=classic%2Cbatch#secret-variables) within a Azure DevOps variable group or in a secret store such as [Azure Vault](https://docs.microsoft.com/en-us/azure/key-vault/key-vault-overview) or [Hashicorp Vault](https://www.vaultproject.io/).

| +| ☐ |
Set up a secrets management solution

**NEVER** store secrets in plaintext. Developers should store their secrets in a secure secrets manager, such as [pass](https://www.passwordstore.org/), [1Password](https://1password.com/), or [LastPass](https://www.lastpass.com/). Applications should store all their secrets (such as DB passwords and API keys) either in [secret variables](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=classic%2Cbatch#secret-variables) within a Azure DevOps variable group or in a secret store such as [Azure Key Vault](https://docs.microsoft.com/en-us/azure/key-vault/key-vault-overview), .NET Core [Secret Manager](https://docs.microsoft.com/en-us/aspnet/core/security/app-secrets) or [Hashicorp Vault](https://www.vaultproject.io/).

| | ☐ |
Use server hardening practices

Every server should be hardened to protect against attackers. This may include: running [CIS Hardened Images](https://www.cisecurity.org/cis-hardened-images/microsoft/), [unattended upgrades](https://docs.microsoft.com/en-us/azure/automation/automation-tutorial-update-management) to automatically install critical security patches, [firewall software](https://en.wikipedia.org/wiki/Firewall_(computing)), [anti-virus software](https://en.wikipedia.org/wiki/Antivirus_software), and [file integrity monitoring software](https://en.wikipedia.org/wiki/File_integrity_monitoring).

| | ☐ |
Go through the OWASP Top 10

Browse through the [Top 10 Application Security Risks](https://www.owasp.org/index.php/Top_10-2017_Top_10) list from the [Open Web Application Security Project (OWASP)](https://www.owasp.org/index.php/Main_Page) and check your app for vulnerabilities such as injection attacks, CSRF, and XSS.

| | ☐ |
Review against the latest CIS Azure Benchmark

Review against the latest [CIS Microsoft Azure Foundations Benchmark](https://www.cisecurity.org/benchmark/azure/) document to check that any [Centre for Internet Security](https://www.cisecurity.org/) recommended security considerations have been made, to harden the environment against potential exploits.

| @@ -197,7 +202,7 @@ Not every single piece of infrastructure needs every single item on the list but | ☐ |
Create Active Directory Groups

Manage permissions for Active Directory users using [Active Directory Groups](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-groups-create-azure-portal). Follow the [Principle of Least Privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege), assigning the minimum permissions possible to each Active Directory Group and User.

| | ☐ |
Create Active Directory Roles

Give your Active Directory Groups access to Azure resources by assigning [Roles (RBAC)](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview).

| | ☐ |
Create a password policy and enforce MFA

Set a [password policy](https://docs.microsoft.com/en-us/azure/active-directory-domain-services/password-policy) that requires a long password for all users and require every user to enable [Multi-Factor Authentication (MFA)](https://docs.microsoft.com/en-us/azure/active-directory/authentication/concept-mfa-howitworks).

| -| ☐ |
Record audit Logs

Configure [audit logs](https://docs.microsoft.com/en-us/azure/security/fundamentals/log-audit) of all changes happening in your Azure subscription. I recommend [Azure Security Centre](https://docs.microsoft.com/en-us/azure/security-center/security-center-intro) to help manage this.

| +| ☐ |
Record audit Logs

Configure [audit logs](https://docs.microsoft.com/en-us/azure/security/fundamentals/log-audit) of all changes happening in your Azure subscription. I recommend [Azure Security Centre](https://docs.microsoft.com/en-us/azure/security-center/security-center-intro) to help manage this. Azure Policy will recommend 365 days of retention by default. Also refer to OWASP [Logging Cheatsheet](https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html).

| | ☐ |
Configure Azure Policy

Configure [Azure Policy](https://docs.microsoft.com/en-us/azure/governance/policy/overview) to monitor for compliance across your Azure resources and enforce different rules and effects to meet your company requirements. Azure provide many example policy definitions you can use to get you started, in addition to these you can use the [Azure policy initiative definitions](https://docs.microsoft.com/en-us/azure/governance/policy/overview#initiative-definition). These initiatives group policies together to monitor and enforce policies for a common goal.
An example Initiative definition is **Audit VMs with insecure password security settings**. This initiative includes policies for password complexity, password re-use policy, password age policy, amongst others, all to achieve the goal of ensuring password security settings are correct.
Policies can also be deployed using [Azure Blueprints](https://docs.microsoft.com/en-us/azure/governance/blueprints/overview). This includes blueprint samples for [ISO 27001](https://docs.microsoft.com/en-us/azure/governance/blueprints/samples/iso27001/) and [CIS Microsoft Azure Foundations Benchmark](https://docs.microsoft.com/en-us/azure/governance/blueprints/samples/cis-azure-1.1.0/). Deploying your infrastructure and policies together using blueprints means you can deploy your solution in a way that meets your compliance needs using a trusted, repeatable process.

| ### **Monitoring** @@ -209,7 +214,7 @@ Not every single piece of infrastructure needs every single item on the list but | ☐ |
Track application metrics

Metrics around what your application is doing, such as QPS, latency, and throughput. Useful tools: [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) which is part of [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview).

| | ☐ |
Track server metrics

Metrics around what your hardware is doing, such as CPU, memory, and disk usage. Useful tools: [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) which is part of [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview).

| | ☐ |
Configure services for observability

Record events and stream data from all services. Slice and dice it using tools such as [Kafka](https://docs.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-introduction), [Honeycomb](https://www.honeycomb.io/), and of course [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) which is part of [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview).

| -| ☐ |
Store logs

To prevent log files from taking up too much disk space, configure log rotation on every server. To be able to view and search all log data from a central location (i.e., a web UI), set up log aggregation using tools such as [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-sources-custom-logs), [Filebeat](https://www.elastic.co/products/beats/filebeat), [Logstash](https://www.elastic.co/products/logstash) etc.

| +| ☐ |
Store and rotate logs

To prevent log files from taking up too much disk space, configure log rotation on every server. To be able to view and search all log data from a central location (i.e., a web UI), set up log aggregation using tools such as [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-sources-custom-logs), [Filebeat](https://www.elastic.co/products/beats/filebeat), [Logstash](https://www.elastic.co/products/logstash) etc.

| | ☐ |
Set up alerts

Configure alerts when critical metrics cross pre-defined thresholds, such as CPU usage getting too high or available disk space getting too low. Most of the metrics and log tools listed earlier in this section support alerting. Set up an on-call rotation using tools such as [PagerDuty](https://www.pagerduty.com/), [Opsgenie](https://www.opsgenie.com/) and [VictorOps](https://victorops.com/).

| ### **Cost optimization** @@ -222,12 +227,13 @@ Not every single piece of infrastructure needs every single item on the list but | ☐ |
Shut down VM instances when not using them

You can shut down VM instances when you're not using them, such as in your pre-prod environments at night and on weekends. You could even create an [Azure Automation](https://docs.microsoft.com/en-us/azure/automation/automation-solution-vm-management) solution that does this on a regular schedule.

| | ☐ |
Use Scale Sets

Use [Scale Sets](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview) to increase the number of VM instances when load is high and then to decrease it again—and thereby save money—when load is low.

| | ☐ |
Use Docker when possible

If you deploy everything as an directly on your VM instances, then you will typically run exactly one type of app per VM instance. If you use a Docker orchestration tool (e.g., [AKS](https://docs.microsoft.com/en-us/azure/aks/intro-kubernetes)), you can give it a cluster of VM instances to manage, and it will deploy Docker containers across the cluster as efficiently as possible, potentially running multiple apps on the same instances when resources are available.

| -| ☐ |
Use Azure Functions when possible

For all short (5 min or less) background jobs, cron jobs, ETL jobs, event processing jobs, and other glue code, use [Azure Functions](https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview). You not only have no servers to manage, but Azure Function pricing is incredibly cheap, with the first 1 million executions and 400,000 GB-seconds per month being completely free! After that, it's just £0.150 per million executions and £0.000012 for every GB-second.

| +| ☐ |
Use Azure Functions when possible

For all short (5 min or less) background jobs, cron jobs, ETL jobs, event processing jobs, and other glue code, use [Azure Functions](https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview). You not only have no servers to manage, but Azure Functions [Consumption Plan pricing](https://azure.microsoft.com/en-ca/pricing/details/functions/) is incredibly cheap, with the first 1 million executions and 400,000 GB-seconds per month being completely free! After that, it's just £0.150 per million executions and £0.000012 for every GB-second.

| | ☐ |
Clean up old data with Azure Blob Lifecycle Management

If you have a lot of data in Azure Blob Storage, make sure to take advantage of [Azure Blob Lifecycle Management](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal) to save money. You can configure the Azure Blob to move files older than a certain age either to cheaper [storage tiers](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers) or to delete those files entirely.

| | ☐ |
Clean up unused resources

Use [Azure Advisor](https://docs.microsoft.com/en-us/azure/advisor/advisor-cost-recommendations) to identify unused or underutilised Azure resources, such as old VM instances that no one is using any more.

| | ☐ |
Learn to analyze your Azure bill

Learn to use tools such as [Azure Advisor](https://docs.microsoft.com/en-us/azure/advisor/advisor-cost-recommendations), and [Cloudyn](https://docs.microsoft.com/en-us/azure/cost-management/overview) to understand where you're spending money. If you find something you can't explain, reach out to Azure Support, and they will help you track it down.

| -| ☐ |
Create billing alerts

Create [alerts](https://docs.microsoft.com/en-us/azure/cost-management/cost-mgt-alerts-monitor-usage-spending) to notify you when your Azure bill crosses important thresholds. Make sure to have several levels of alerts: e.g., at the very least, one when the bill is a little high, one when it's really high, and one when it is approaching bankruptcy levels.

| - +| ☐ |
Configure Log Analytics Workspace Data Volume Management

Turn on [Daily Cap](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/manage-cost-storage) to limit the volume of data that the Log Analytics workspace will ingest per day.

| +| ☐ |
Create budgets and billing alerts

Create [alerts](https://docs.microsoft.com/en-us/azure/cost-management/cost-mgt-alerts-monitor-usage-spending) to notify you when your Azure bill crosses important thresholds. Make sure to have several levels of alerts: e.g., at the very least, one when the bill is a little high, one when it's really high, and one when it is approaching bankruptcy levels.

| +| ☐ |
Tag resource groups and resources

Tag [resources](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/tag-resources) to assist with inventory, billing and logical organization. Recommended tags include department, environment, costCenter, productName, solutionName. Use Azure Policy to enforce tag standards.

| --- Inspired by the Gruntwork [Production Readiness Checklist](https://gruntwork.io/devops-checklist/) which covers AWS