If you want to run your apps directly on Virtual Machines, you should package them as a [managed image](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/capture-image-resource) using [PowerShell](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/capture-image-resource#create-an-image-of-a-vm-using-PowerShell) or a tool such as [Packer](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/build-image-with-packer). Although I recommend Docker for all stateless apps (see below), I recommend directly using VM images and VM Instances for all stateful apps, such as any app that writes to its local disk (e.g., WordPress, Jenkins).
The best way to deploy a VM image is typically to run it as a [scale set](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview). This will allow you to spin up multiple VM Instances that run your VM image, scale the number of instances up and down in response to load, and automatically replace failed Instances.
If you want to run your apps as containers, you should package your apps as [Docker](https://www.docker.com/) images and push those images to the [Azure Container Registry (ACR)](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-intro). I recommend Docker for all stateless apps and for local development (along with [Docker Compose](https://docs.docker.com/compose/)).
For running Docker containers in Azure, I recommend using [Azure Container Registry](https://azure.microsoft.com/en-ca/services/container-registry/) and [Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/), Azure's managed Kubernetes service.
Another option is [Azure Container Instances (ACI)](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview), a service where Azure manages and scales the underlying VM Instances for you and you just hand it Docker containers to run. However, this is not recommended for scenarios where you need full container orchestration, including service discovery across multiple containers, automatic scaling, and coordinated application upgrades.
Package and deploy apps with Helm charts and push them to [Azure Container Registry Helm Repos](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-helm-repos).
Helm is an open-source package manager which provides a large chart repository of applications. Charts can be customized and deployed to Kubernetes as part of an application stack.
Optimize the cost and performance of ACR with AKS by following [Best practices for Azure Container Registry](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-best-practices).
As images are deployed to an ACR and consumed from AKS, governance becomes important. Follow these guidelines to manage performance, storage, registry organization, and cost.
Build and run applications successfully in AKS by following [Best practices for Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/best-practices).
As a Cluster Operator, follow best practices of Multi-Tenancy, Security, Network and storage, and Disaster Recovery. As a Developer, ensure resource limits are implemented, logging is in place, and pods and containers are secure.
If you want to build serverless apps, I recommend you use [Azure Functions](https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview). You can expose your Azure Functions as HTTP endpoints using [API Management](https://docs.microsoft.com/en-us/azure/api-management/import-function-app-as-api).
Serverless solutions have some common [Functions Best Practices](https://docs.microsoft.com/en-us/azure/azure-functions/functions-best-practices) to follow.
In addition to the practices for each function app's source language, follow the guidelines for performance, scalability, error handling and logging.
If you are running an App Service or scaled Azure Functions, follow [Azure App Service Best Practices](https://docs.microsoft.com/en-us/azure/app-service/app-service-best-practices).
Manage resource limits, auto-healing, backup, and recovery.
Configure CPU settings, memory settings (e.g., -Xmx, -Xms settings for a JVM), and GC settings (if applicable) for your app. If you're deploying directly on VM Instances, these should be configured based on the available CPU and memory on your VM Instance (see [Instance Types](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes)). If you are deploying Docker containers, then tell the scheduler the [resources](https://docs.microsoft.com/en-us/azure/aks/developer-best-practices-resource-management) your app needs, and it will automatically try to find a VM Instance that has those resources.
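As a rough illustration of deriving JVM memory settings from the memory available on a VM Instance, the sketch below builds `-Xms`/`-Xmx` flags. The helper name `jvm_heap_flags` and the 75% heap fraction are assumptions made for this example, not values recommended by Azure or by the JVM documentation; tune them for your workload.

```python
from typing import List


def jvm_heap_flags(total_memory_mb: int, heap_fraction: float = 0.75) -> List[str]:
    """Return -Xms/-Xmx flags sized as a fraction of the available memory.

    heap_fraction is a hypothetical default; the remainder is left for the
    OS, thread stacks, and off-heap memory.
    """
    if total_memory_mb <= 0:
        raise ValueError("total_memory_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(total_memory_mb * heap_fraction)
    # Using the same value for -Xms and -Xmx avoids heap resizing at runtime.
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    # Example: a VM Instance with 8 GiB (8192 MiB) of memory.
    print(" ".join(jvm_heap_flags(8192)))
```

The same calculation can be applied to a container's memory limit rather than the VM's total memory when the app runs under a scheduler.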
Configure the [OS disk](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/managed-disks-overview#os-disk) on each VM Instance with enough space for your app and log files. For further data storage, attach one or more [Data disks](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/managed-disks-overview#data-disk).
Enable [encryption](https://docs.microsoft.com/en-us/azure/security/fundamentals/azure-disk-encryption-vms-vmss) on the OS and Data disks of each VM instance. Many Azure services optionally support encryption: e.g., see [Always Encrypted Azure SQL](https://docs.microsoft.com/en-us/azure/sql-database/sql-database-always-encrypted-azure-key-vault).
All VM instances should be in a private subnet and NOT accessible directly from the public Internet. Only a single, locked-down VM instance, known as the Bastion Host, should run in the public subnets. You must first connect to the Bastion Host, which gets you "in" to the network, and then you can use it as a "jump host" to connect to the other VM instances. I recommend using [Azure Bastion](https://docs.microsoft.com/en-us/azure/bastion/bastion-overview) which is a fully-managed PaaS service.
I typically recommend running a VPN Server as the entrypoint to your network. [OpenVPN](https://openvpn.net/) is the most popular option for running a VPN server. However, I would recommend using a PaaS option such as [VPN Gateway](https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-about-vpngateways). Alternatively, to extend your on-premises networks I would recommend using [ExpressRoute](https://docs.microsoft.com/en-us/azure/expressroute/expressroute-introduction).
**NEVER** store secrets in plaintext. Developers should store their secrets in a secure secrets manager, such as [pass](https://www.passwordstore.org/), [1Password](https://1password.com/), or [LastPass](https://www.lastpass.com/). Applications should store all their secrets (such as DB passwords and API keys) either in [secret variables](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=classic%2Cbatch#secret-variables) within an Azure DevOps variable group or in a secret store such as [Azure Key Vault](https://docs.microsoft.com/en-us/azure/key-vault/key-vault-overview), .NET Core [Secret Manager](https://docs.microsoft.com/en-us/aspnet/core/security/app-secrets), or [HashiCorp Vault](https://www.vaultproject.io/).
Every server should be hardened to protect against attackers. This may include: running [CIS Hardened Images](https://www.cisecurity.org/cis-hardened-images/microsoft/), [unattended upgrades](https://docs.microsoft.com/en-us/azure/automation/automation-tutorial-update-management) to automatically install critical security patches, [firewall software](https://en.wikipedia.org/wiki/Firewall_(computing)), [anti-virus software](https://en.wikipedia.org/wiki/Antivirus_software), and [file integrity monitoring software](https://en.wikipedia.org/wiki/File_integrity_monitoring).
Browse through the [Top 10 Application Security Risks](https://www.owasp.org/index.php/Top_10-2017_Top_10) list from the [Open Web Application Security Project (OWASP)](https://www.owasp.org/index.php/Main_Page) and check your app for vulnerabilities such as injection attacks, CSRF, and XSS.
Review your environment against the latest [CIS Microsoft Azure Foundations Benchmark](https://www.cisecurity.org/benchmark/azure/) document to check that the security recommendations from the [Center for Internet Security](https://www.cisecurity.org/) have been applied, hardening the environment against potential exploits.
Manage permissions for Active Directory users using [Active Directory Groups](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-groups-create-azure-portal). Follow the [Principle of Least Privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege), assigning the minimum permissions possible to each Active Directory Group and User.
Give your Active Directory Groups access to Azure resources by assigning [Roles (RBAC)](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview).
Set a [password policy](https://docs.microsoft.com/en-us/azure/active-directory-domain-services/password-policy) that requires a long password for all users and require every user to enable [Multi-Factor Authentication (MFA)](https://docs.microsoft.com/en-us/azure/active-directory/authentication/concept-mfa-howitworks).
Configure [audit logs](https://docs.microsoft.com/en-us/azure/security/fundamentals/log-audit) of all changes happening in your Azure subscription. I recommend [Azure Security Center](https://docs.microsoft.com/en-us/azure/security-center/security-center-intro) to help manage this. Azure Policy will recommend 365 days of retention by default. Also refer to the OWASP [Logging Cheatsheet](https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html).
Configure [Azure Policy](https://docs.microsoft.com/en-us/azure/governance/policy/overview) to monitor for compliance across your Azure resources and enforce different rules and effects to meet your company requirements. Azure provides many example policy definitions to get you started. In addition to these, you can use the [Azure policy initiative definitions](https://docs.microsoft.com/en-us/azure/governance/policy/overview#initiative-definition), which group policies together to monitor and enforce them for a common goal.
An example Initiative definition is **Audit VMs with insecure password security settings**. This initiative includes policies for password complexity, password re-use policy, password age policy, amongst others, all to achieve the goal of ensuring password security settings are correct.
Policies can also be deployed using [Azure Blueprints](https://docs.microsoft.com/en-us/azure/governance/blueprints/overview). This includes blueprint samples for [ISO 27001](https://docs.microsoft.com/en-us/azure/governance/blueprints/samples/iso27001/) and [CIS Microsoft Azure Foundations Benchmark](https://docs.microsoft.com/en-us/azure/governance/blueprints/samples/cis-azure-1.1.0/). Deploying your infrastructure and policies together using blueprints means you can deploy your solution in a way that meets your compliance needs using a trusted, repeatable process.
Metrics around what your application is doing, such as QPS, latency, and throughput. Useful tools: [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) which is part of [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview).
Metrics around what your hardware is doing, such as CPU, memory, and disk usage. Useful tools: [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) which is part of [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview).
Record events and stream data from all services. Slice and dice it using tools such as [Kafka](https://docs.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-introduction), [Honeycomb](https://www.honeycomb.io/), and of course [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/monitor-web-app-availability) which is part of [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview).
To prevent log files from taking up too much disk space, configure log rotation on every server. To be able to view and search all log data from a central location (i.e., a web UI), set up log aggregation using tools such as [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-sources-custom-logs), [Filebeat](https://www.elastic.co/products/beats/filebeat), [Logstash](https://www.elastic.co/products/logstash) etc.
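Log rotation on servers is often handled by an OS-level tool, but it can also be done inside the app. The sketch below uses Python's standard `logging.handlers.RotatingFileHandler` as one possible approach; the function name `build_rotating_logger` and the default sizes are assumptions chosen for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(path: str, max_bytes: int = 10_000_000,
                          backups: int = 5) -> logging.Logger:
    """Create a logger whose file is rotated once it reaches max_bytes.

    At most `backups` old files (path.1, path.2, ...) are kept, so disk
    usage stays bounded at roughly max_bytes * (backups + 1).
    """
    logger = logging.getLogger(f"app.{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                      backupCount=backups)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

A log shipper can then tail the rotated files and forward them to whichever aggregation tool you choose.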
Configure alerts when critical metrics cross pre-defined thresholds, such as CPU usage getting too high or available disk space getting too low. Most of the metrics and log tools listed earlier in this section support alerting. Set up an on-call rotation using tools such as [PagerDuty](https://www.pagerduty.com/), [Opsgenie](https://www.opsgenie.com/) and [VictorOps](https://victorops.com/).
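To make the idea of threshold-based alerting concrete, here is a minimal sketch of the check an alerting rule performs. The `Threshold` class, the `breached` function, and the metric names in the usage note are hypothetical; real tools such as Azure Monitor express these rules in their own configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    alert_when_above: bool  # True: alert if value > limit; False: if value < limit


def breached(thresholds, metrics):
    """Return the names of metrics that have crossed their threshold."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            continue  # metric not reported; a real system might alert on this too
        too_high = t.alert_when_above and value > t.limit
        too_low = not t.alert_when_above and value < t.limit
        if too_high or too_low:
            alerts.append(t.metric)
    return alerts
```

For example, with a rule for CPU above 90% and a rule for free disk below 5 GB, a sample of `{"cpu_percent": 95, "disk_free_gb": 20}` would trigger only the CPU alert.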
You can shut down VM instances when you're not using them, such as in your pre-prod environments at night and on weekends. You could even create an [Azure Automation](https://docs.microsoft.com/en-us/azure/automation/automation-solution-vm-management) solution that does this on a regular schedule.
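The scheduling decision behind such an automation can be sketched as a small function. The name `should_be_running` and the default working hours (08:00 to 19:00, Monday to Friday) are assumptions for illustration; the actual start and stop calls would be made by whatever automation you use.

```python
from datetime import datetime


def should_be_running(now: datetime, start_hour: int = 8,
                      stop_hour: int = 19) -> bool:
    """Return True if a pre-prod VM should be up at `now` (local time).

    VMs run on weekdays from start_hour (inclusive) to stop_hour
    (exclusive) and are shut down at night and on weekends.
    """
    is_weekday = now.weekday() < 5  # Monday=0 ... Sunday=6
    return is_weekday and start_hour <= now.hour < stop_hour
```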
Use [Scale Sets](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview) to increase the number of VM instances when load is high and then to decrease it again, and thereby save money, when load is low.
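The scaling behaviour described here can be pictured as a simple threshold rule. The sketch below is a simplified model, not how Scale Sets are actually configured; the function name and the CPU thresholds (70% and 30%) and instance bounds are assumptions for illustration.

```python
def desired_instance_count(current: int, avg_cpu_percent: float,
                           min_instances: int = 2, max_instances: int = 10,
                           scale_out_above: float = 70.0,
                           scale_in_below: float = 30.0) -> int:
    """Apply one step of a simple threshold-based autoscale rule."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1  # load is high: add an instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1  # load is low: remove an instance to save money
    else:
        target = current
    # Never go below the minimum or above the maximum.
    return max(min_instances, min(max_instances, target))
```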
If you deploy everything as a VM image directly on your VM instances, then you will typically run exactly one type of app per VM instance. If you use a Docker orchestration tool (e.g., [AKS](https://docs.microsoft.com/en-us/azure/aks/intro-kubernetes)), you can give it a cluster of VM instances to manage, and it will deploy Docker containers across the cluster as efficiently as possible, potentially running multiple apps on the same instances when resources are available.
For all short (5 min or less) background jobs, cron jobs, ETL jobs, event processing jobs, and other glue code, use [Azure Functions](https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview). You not only have no servers to manage, but Azure Functions [Consumption Plan pricing](https://azure.microsoft.com/en-ca/pricing/details/functions/) is incredibly cheap, with the first 1 million executions and 400,000 GB-seconds per month being completely free! After that, it's just £0.150 per million executions and £0.000012 for every GB-second.
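To see what those numbers mean in practice, the following sketch estimates a monthly bill from the figures quoted above. The function name and the example workload are hypothetical, and the calculation ignores any rounding or minimum charges the real billing may apply, so treat it as a rough estimate and check the pricing page for current rates.

```python
FREE_EXECUTIONS = 1_000_000           # free executions per month (quoted above)
FREE_GB_SECONDS = 400_000             # free GB-seconds per month (quoted above)
PRICE_PER_MILLION_EXECUTIONS = 0.150  # GBP, figure quoted above
PRICE_PER_GB_SECOND = 0.000012        # GBP, figure quoted above


def monthly_cost_gbp(executions: int, avg_duration_s: float,
                     memory_gb: float) -> float:
    """Estimate the monthly cost after subtracting the free grant."""
    gb_seconds = executions * avg_duration_s * memory_gb
    billable_executions = max(0, executions - FREE_EXECUTIONS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    cost = (billable_executions / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS
            + billable_gb_seconds * PRICE_PER_GB_SECOND)
    return round(cost, 2)


if __name__ == "__main__":
    # 3 million executions of 1 second each at 0.5 GB:
    # 2M billable executions -> 0.30; 1.1M billable GB-seconds -> 13.20
    print(monthly_cost_gbp(3_000_000, 1.0, 0.5))
```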
If you have a lot of data in Azure Blob Storage, make sure to take advantage of [Azure Blob Lifecycle Management](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal) to save money. You can configure the Azure Blob to move files older than a certain age either to cheaper [storage tiers](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers) or to delete those files entirely.
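A lifecycle rule is expressed as a JSON policy document. The sketch below builds one in Python; the field names follow the rule layout in the linked lifecycle management documentation as I understand it, so verify them against the current schema before use. The function name, rule name, and the day counts in the usage note are assumptions.

```python
import json


def lifecycle_policy(prefix: str, cool_after_days: int,
                     archive_after_days: int, delete_after_days: int) -> dict:
    """Build a policy that tiers blobs down as they age, then deletes them."""
    if not cool_after_days < archive_after_days < delete_after_days:
        raise ValueError("expected cool < archive < delete (in days)")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "age-out-old-blobs",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Only block blobs whose path starts with `prefix`.
                    "filters": {"blobTypes": ["blockBlob"],
                                "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days},
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days},
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_policy("logs/", 30, 90, 365), indent=2))
```

The resulting JSON can be saved to a file and applied to the storage account through the portal or your deployment tooling.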
Use [Azure Advisor](https://docs.microsoft.com/en-us/azure/advisor/advisor-cost-recommendations) to identify unused or underutilised Azure resources, such as old VM instances that no one is using any more.
Learn to use tools such as [Azure Advisor](https://docs.microsoft.com/en-us/azure/advisor/advisor-cost-recommendations) and [Cloudyn](https://docs.microsoft.com/en-us/azure/cost-management/overview) to understand where you're spending money. If you find something you can't explain, reach out to Azure Support, and they will help you track it down.
Create [alerts](https://docs.microsoft.com/en-us/azure/cost-management/cost-mgt-alerts-monitor-usage-spending) to notify you when your Azure bill crosses important thresholds. Make sure to have several levels of alerts: e.g., at the very least, one when the bill is a little high, one when it's really high, and one when it is approaching bankruptcy levels.
Turn on [Daily Cap](https://docs.microsoft.com/en-us/azure/azure-monitor/platform/manage-cost-storage) to limit the volume of data that the Log Analytics workspace will ingest per day.
Tag [resources](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/tag-resources) to assist with inventory, billing and logical organization. Recommended tags include department, environment, costCenter, productName, solutionName. Use Azure Policy to enforce tag standards.
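Before enforcing tags with Azure Policy, it can help to audit existing resources. The sketch below checks a resource's tags against the recommended set listed above; the function name `missing_tags` and the sample tag values are hypothetical, and fetching the tags from Azure is left out.

```python
REQUIRED_TAGS = ("department", "environment", "costCenter",
                 "productName", "solutionName")


def missing_tags(resource_tags: dict) -> list:
    """Return required tag names that are absent or blank, in a stable order."""
    return [tag for tag in REQUIRED_TAGS
            if not str(resource_tags.get(tag, "")).strip()]


if __name__ == "__main__":
    tags = {"department": "finance", "environment": "prod"}
    print(missing_tags(tags))
```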