

Databricks Premium Workspace Terraform module

Terraform module for managing Databricks Premium workspace resources.

Usage

Requires a Workspace with the "Premium" SKU.

This module deploys resources only into a Databricks Workspace with the Premium SKU.

The example below shows how to provision the module with different options and covers the following features:

  1. Workspace admins assignment, custom Workspace group creation, group assignments, and group entitlements
  2. Cluster creation (e.g., for Unity Catalog and shared autoscaling)
  3. Workspace IP Access list creation
  4. ADLS Gen2 mount
  5. Secret Scope creation with permissions assigned to custom groups
  6. SQL Endpoint creation and configuration
  7. Cluster policy creation
  8. Azure Key Vault-backed secret scope creation
  9. Assignment of an already existing Unity Catalog Metastore

# Prerequisite resources

# Databricks Workspace with Premium SKU
data "azurerm_databricks_workspace" "example" {
  name                = "example-workspace"
  resource_group_name = "example-rg"
}

# Databricks Provider configuration
provider "databricks" {
  alias                       = "main"
  host                        = data.azurerm_databricks_workspace.example.workspace_url
  azure_workspace_resource_id = data.azurerm_databricks_workspace.example.id
}

# Key Vault where Service Principal's secrets are stored. Used for mounting Storage Container
data "azurerm_key_vault" "example" {
  name                = "example-key-vault"
  resource_group_name = "example-rg"
}

# Storage Account with the container that is mounted in the example below
data "azurerm_storage_account" "example" {
  name                = "examplestorage" # hypothetical storage account name
  resource_group_name = "example-rg"
}

# Service Principal credentials used for the ADLS mount, stored in the Key Vault above
data "azurerm_key_vault_secret" "sp_client_id" {
  name         = "sp-client-id" # secret that stores the Service Principal App ID
  key_vault_id = data.azurerm_key_vault.example.id
}

data "azurerm_key_vault_secret" "sp_key" {
  name         = "sp-key" # secret that stores the Service Principal secret key
  key_vault_id = data.azurerm_key_vault.example.id
}

data "azurerm_key_vault_secret" "tenant_id" {
  name         = "infra-arm-tenant-id" # secret that stores the tenant id value
  key_vault_id = data.azurerm_key_vault.example.id
}

# Example usage of the module for Runtime Premium resources.
module "databricks_runtime_premium" {
  source = "data-platform-hq/databricks-runtime-premium/databricks"

  # Optional suffix appended to the names of created resources
  suffix = "example"

  # Parameters of the Service Principal used for the ADLS mount,
  # read from the prerequisite Key Vault secrets above
  mount_enabled                     = true
  mount_service_principal_client_id = data.azurerm_key_vault_secret.sp_client_id.value
  mount_service_principal_secret    = data.azurerm_key_vault_secret.sp_key.value
  mount_service_principal_tenant_id = data.azurerm_key_vault_secret.tenant_id.value

  # 1.1 Workspace admins 
  workspace_admins = {
    user = ["user1@example.com"]
    service_principal = ["example-app-id"]
  }

  # 1.2 Custom Workspace group with member assignments.
  # In addition, provides an ability to assign group entitlements.
  iam_workspace_groups = {
    DEVELOPERS = {
      user = ["user1@example.com"]
      entitlements = [
        "allow_instance_pool_create",
        "allow_cluster_create",
        "databricks_sql_access"
      ]
    }
  }

  # 2. Databricks clusters configuration, with permissions granted to a custom group on each cluster.
  clusters = [{
    cluster_name       = "Unity Catalog"
    data_security_mode = "USER_ISOLATION"
    availability       = "ON_DEMAND_AZURE"
    spot_bid_max_price = 1
    permissions        = [{ group_name = "DEVELOPERS", permission_level = "CAN_RESTART" }]
  },
  {
    cluster_name       = "shared autoscaling"
    data_security_mode = "NONE"
    availability       = "SPOT_AZURE"
    spot_bid_max_price = -1
    permissions        = [{group_name = "DEVELOPERS", permission_level = "CAN_MANAGE"}]
  }]

  # 3. Workspace can be accessed only from these IP addresses:
  ip_rules = {
    "ip_range_1" = "10.128.0.0/16",
    "ip_range_2" = "10.33.0.0/16",
  }
  
  # 4. ADLS Gen2 Mount
  mountpoints = {
    example = {
      storage_account_name = data.azurerm_storage_account.example.name
      container_name       = "example-container"
    }
  }

  # 5. Create Secret Scope and assign permissions to custom groups.
  # Only custom workspace group names are allowed in 'acl'. If 'acl' is left empty,
  # only Workspace admins can access these keys.
  secret_scope = [{
    scope_name = "extra-scope"
    acl        = [{ principal = "DEVELOPERS", permission = "READ" }]
    secrets    = [{ key = "secret-name", string_value = "secret-value" }]
  }]

  # 6. SQL Warehouse Endpoint
  sql_endpoint = [{
    name                      = "default"
    enable_serverless_compute = true
    permissions               = [{ group_name = "DEVELOPERS", permission_level = "CAN_USE" }]
  }]

  # 7. Databricks cluster policies
  custom_cluster_policies = [{
    name    = "custom_policy_1"
    can_use = ["DEVELOPERS"] # custom workspace group names that are allowed to use this policy
    definition = {
      "autoscale.max_workers" : {
        "type" : "range",
        "maxValue" : 3,
        "defaultValue" : 2
      }
    }
  }]

  # 8. Azure Key Vault-backed secret scope
  key_vault_secret_scope = [{
    name         = "external"
    key_vault_id = data.azurerm_key_vault.example.id
    dns_name     = data.azurerm_key_vault.example.vault_uri
    tenant_id    = data.azurerm_key_vault.example.tenant_id
  }]
    
  providers = {
    databricks = databricks.main
  }
}

# 9. Assign an already existing Unity Catalog Metastore
module "metastore_assignment" {
  source  = "data-platform-hq/metastore-assignment/databricks"
  version = "1.0.0"

  workspace_id = data.azurerm_databricks_workspace.example.workspace_id
  metastore_id = "<uuid-of-metastore>"

  providers = {
    databricks = databricks.main
  }
}

Requirements

Name        Version
terraform   >= 1.0.0
azurerm     >= 4.0.1
databricks  >= 1.30.0

Providers

Name        Version
azurerm     >= 4.0.1
databricks  >= 1.30.0
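
A minimal sketch of pinning these versions in the configuration that calls the module; the provider sources are the standard registry addresses:

terraform {
  required_version = ">= 1.0.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.0.1"
    }
    databricks = {
      source  = "databricks/databricks"
      version = ">= 1.30.0"
    }
  }
}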

Modules

No modules.

Resources

Name Type
azurerm_key_vault_access_policy.databricks resource
databricks_cluster.cluster resource
databricks_cluster_policy.overrides resource
databricks_cluster_policy.this resource
databricks_entitlements.this resource
databricks_group.this resource
databricks_group_member.admin resource
databricks_group_member.this resource
databricks_ip_access_list.this resource
databricks_mount.adls resource
databricks_permissions.clusters resource
databricks_permissions.sql_endpoint resource
databricks_permissions.this resource
databricks_secret.main resource
databricks_secret.this resource
databricks_secret_acl.external resource
databricks_secret_acl.this resource
databricks_secret_scope.external resource
databricks_secret_scope.main resource
databricks_secret_scope.this resource
databricks_service_principal.this resource
databricks_sql_endpoint.this resource
databricks_system_schema.this resource
databricks_token.pat resource
databricks_user.this resource
databricks_workspace_conf.this resource
databricks_group.account_groups data source
databricks_group.admin data source

Inputs

Each input below is listed with its description, type, default value, and whether it is required.
clusters
  Description: Set of objects with parameters to configure Databricks clusters and assign permissions on them to certain custom groups (see the sketch after this list).
  Type:
    set(object({
      cluster_name                 = string
      spark_version                = optional(string, "13.3.x-scala2.12")
      spark_conf                   = optional(map(any), {})
      cluster_conf_passthrought    = optional(bool, false)
      spark_env_vars               = optional(map(any), {})
      data_security_mode           = optional(string, "USER_ISOLATION")
      node_type_id                 = optional(string, "Standard_D3_v2")
      autotermination_minutes      = optional(number, 30)
      min_workers                  = optional(number, 1)
      max_workers                  = optional(number, 2)
      availability                 = optional(string, "ON_DEMAND_AZURE")
      first_on_demand              = optional(number, 0)
      spot_bid_max_price           = optional(number, 1)
      cluster_log_conf_destination = optional(string, null)
      init_scripts_workspace       = optional(set(string), [])
      init_scripts_volumes         = optional(set(string), [])
      init_scripts_dbfs            = optional(set(string), [])
      init_scripts_abfss           = optional(set(string), [])
      single_user_name             = optional(string, null)
      single_node_enable           = optional(bool, false)
      custom_tags                  = optional(map(string), {})
      permissions = optional(set(object({
        group_name       = string
        permission_level = string
      })), [])
      pypi_library_repository = optional(set(string), [])
      maven_library_repository = optional(set(object({
        coordinates = string
        exclusions  = set(string)
      })), [])
    }))
  Default: []
  Required: no
create_databricks_access_policy_to_key_vault
  Description: Boolean flag to enable creation of a Key Vault Access Policy for the Databricks Global Service Principal.
  Type: bool
  Default: true
  Required: no
custom_cluster_policies
  Description: Provides an ability to create custom cluster policies, assign them to clusters, and grant CAN_USE permissions on them to certain custom groups.
    name - name of the custom cluster policy to create;
    can_use - list of strings, where values are custom group names; these groups have to be created with Terraform;
    definition - JSON document expressed in Databricks Policy Definition Language. No need to call 'jsonencode()' on it when providing a value.
  Type:
    list(object({
      name       = string
      can_use    = list(string)
      definition = any
    }))
  Default:
    [
      {
        "can_use": null,
        "definition": null,
        "name": null
      }
    ]
  Required: no
default_cluster_policies_override
  Description: Provides an ability to override a default cluster policy (see the sketch after this list).
    name - name of the cluster policy to override;
    family_id - family id of the corresponding policy;
    definition - JSON document expressed in Databricks Policy Definition Language. No need to call 'jsonencode()' on it when providing a value.
  Type:
    list(object({
      name       = string
      family_id  = string
      definition = any
    }))
  Default:
    [
      {
        "definition": null,
        "family_id": null,
        "name": null
      }
    ]
  Required: no
global_databricks_sp_object_id
  Description: Global 'AzureDatabricks' Service Principal object id. Used to create a Key Vault Access Policy for the Secret Scope.
  Type: string
  Default: "9b38785a-6e08-4087-a0c4-20634343f21f"
  Required: no
iam_account_groups
  Description: List of objects with a group name and entitlements for this group (see the sketch after this list).
  Type:
    list(object({
      group_name   = optional(string)
      entitlements = optional(list(string))
    }))
  Default: []
  Required: no
iam_workspace_groups
  Description: Used to create workspace groups. Map of group names and their parameters, such as the users and service principals added to the group. It is also possible to configure group entitlements.
  Type:
    map(object({
      user              = optional(list(string))
      service_principal = optional(list(string))
      entitlements      = optional(list(string))
    }))
  Default: {}
  Required: no
ip_rules
  Description: Map of IP address ranges permitted to access the Databricks Workspace.
  Type: map(string)
  Default: {}
  Required: no
key_vault_secret_scope
  Description: List of objects with the Azure Key Vault parameters required for creation of Azure-backed Databricks Secret Scopes.
  Type:
    list(object({
      name         = string
      key_vault_id = string
      dns_name     = string
      tenant_id    = string
    }))
  Default: []
  Required: no
mount_adls_passthrough
  Description: Boolean flag to use mount options for credentials passthrough. Should be used together with mount_cluster_name; the specified cluster must have the option cluster_conf_passthrought == true (see the sketch after this list).
  Type: bool
  Default: false
  Required: no
mount_cluster_name
  Description: Name of the cluster that will be used during storage mounting. If mount_adls_passthrough == true, the cluster must also have the option cluster_conf_passthrought == true.
  Type: string
  Default: null
  Required: no
mount_enabled
  Description: Boolean flag that determines whether a mount point for the storage account filesystem is created.
  Type: bool
  Default: false
  Required: no
mount_service_principal_client_id
  Description: Application (client) id of the Service Principal used to perform storage account mounting.
  Type: string
  Default: null
  Required: no
mount_service_principal_secret
  Description: Service Principal secret used to perform storage account mounting.
  Type: string
  Default: null
  Required: no
mount_service_principal_tenant_id
  Description: Service Principal tenant id used to perform storage account mounting.
  Type: string
  Default: null
  Required: no
mountpoints
  Description: Mount points for Databricks; a map of storage account name and container name per mount.
  Type:
    map(object({
      storage_account_name = string
      container_name       = string
    }))
  Default: {}
  Required: no
pat_token_lifetime_seconds
  Description: The lifetime of the token, in seconds. If no lifetime is specified, the token remains valid indefinitely.
  Type: number
  Default: 315569520
  Required: no
secret_scope
  Description: Provides an ability to create custom Secret Scopes, store secrets in them, and assign ACLs for access management.
    scope_name - name of the Secret Scope to create;
    acl - list of objects, where 'principal' is a custom group name (the group is created in this 'Premium' module) and 'permission' is one of "READ", "WRITE", "MANAGE";
    secrets - list of objects, where 'key' is the created key name and 'string_value' is its value.
  Type:
    list(object({
      scope_name = string
      acl = optional(list(object({
        principal  = string
        permission = string
      })))
      secrets = optional(list(object({
        key          = string
        string_value = string
      })))
    }))
  Default:
    [
      {
        "acl": null,
        "scope_name": null,
        "secrets": null
      }
    ]
  Required: no
sql_endpoint
  Description: Set of objects with parameters to configure SQL Endpoints and assign permissions on them to certain custom groups.
  Type:
    set(object({
      name                      = string
      cluster_size              = optional(string, "2X-Small")
      min_num_clusters          = optional(number, 0)
      max_num_clusters          = optional(number, 1)
      auto_stop_mins            = optional(string, "30")
      enable_photon             = optional(bool, false)
      enable_serverless_compute = optional(bool, false)
      spot_instance_policy      = optional(string, "COST_OPTIMIZED")
      warehouse_type            = optional(string, "PRO")
      permissions = optional(set(object({
        group_name       = string
        permission_level = string
      })), [])
    }))
  Default: []
  Required: no
suffix
  Description: Optional suffix that would be added to the end of resource names.
  Type: string
  Default: ""
  Required: no
system_schemas
  Description: Set of strings with all possible System Schema names.
  Type: set(string)
  Default:
    [
      "access",
      "billing",
      "compute",
      "marketplace",
      "storage"
    ]
  Required: no
system_schemas_enabled
  Description: System Schemas only work with an assigned Unity Catalog Metastore. Boolean flag to enable this feature (see the sketch after this list).
  Type: bool
  Default: false
  Required: no
user_object_ids
  Description: Map of AD usernames and corresponding object IDs.
  Type: map(string)
  Default: {}
  Required: no
workspace_admins
  Description: Provide users or service principals to grant them Admin permissions in the Workspace.
  Type:
    object({
      user              = list(string)
      service_principal = list(string)
    })
  Default:
    {
      "service_principal": null,
      "user": null
    }
  Required: no
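
A sketch of a fuller clusters entry exercising several of the optional fields above; the cluster name, tag, and init-script path are illustrative, not module defaults:

clusters = [{
  cluster_name            = "ml-cluster" # illustrative name
  spark_version           = "13.3.x-scala2.12"
  node_type_id            = "Standard_D3_v2"
  autotermination_minutes = 60
  min_workers             = 1
  max_workers             = 4
  custom_tags             = { "team" = "data" } # example tag
  init_scripts_workspace  = ["/Shared/init/setup.sh"] # hypothetical workspace path
  permissions = [{
    group_name       = "DEVELOPERS"
    permission_level = "CAN_RESTART"
  }]
}]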
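
A hedged sketch for default_cluster_policies_override, assuming "personal-vm" is the policy family id of the built-in Personal Compute policy:

default_cluster_policies_override = [{
  name      = "Personal Compute"
  family_id = "personal-vm" # assumed family id of the built-in Personal Compute policy
  definition = {
    "autotermination_minutes" : {
      "type" : "fixed",
      "value" : 30
    }
  }
}]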
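
A minimal sketch for iam_account_groups, assuming the group already exists at the Databricks account level (the module reads account groups through the databricks_group.account_groups data source listed under Resources); the group name is illustrative:

iam_account_groups = [{
  group_name   = "ACCOUNT_DEVELOPERS" # hypothetical pre-existing account-level group
  entitlements = ["databricks_sql_access"]
}]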
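
A sketch of the credentials-passthrough mount inputs working together; the cluster name, storage account, and container are illustrative:

clusters = [{
  cluster_name              = "passthrough-cluster" # illustrative name
  cluster_conf_passthrought = true # required for passthrough mounts
}]

mount_enabled          = true
mount_adls_passthrough = true
mount_cluster_name     = "passthrough-cluster" # must match the cluster above
mountpoints = {
  raw = {
    storage_account_name = "examplestorage" # hypothetical storage account
    container_name       = "raw"
  }
}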
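
A minimal sketch enabling a subset of the system schemas; it assumes the workspace is already assigned to a Unity Catalog Metastore, as in step 9 of the usage example:

system_schemas_enabled = true
system_schemas         = ["access", "billing"]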

Outputs

Name                         Description
clusters                     Provides name and unique identifier for the clusters
sql_endpoint_data_source_id  ID of the data source for this endpoint
sql_endpoint_jdbc_url        JDBC connection string of SQL Endpoint
token                        Databricks Personal Authorization Token
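
A minimal sketch of surfacing these outputs from the configuration that calls the module; the output names and the sensitive flag are this example's choices:

output "databricks_clusters" {
  description = "Name and unique identifier of the created clusters"
  value       = module.databricks_runtime_premium.clusters
}

output "databricks_token" {
  description = "Databricks Personal Authorization Token"
  value       = module.databricks_runtime_premium.token
  sensitive   = true
}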

License

Apache 2 Licensed. For more information, please see LICENSE