feat: databricks workspace module #1

Merged · 1 commit · Nov 1, 2024
76 changes: 74 additions & 2 deletions README.md
@@ -1,10 +1,82 @@
# Azure <> Terraform module
Terraform module for creation Azure <>
# AWS Databricks Workspace Terraform module
Terraform module for creating an AWS Databricks Workspace

## Usage
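
A minimal invocation might look like the sketch below. The module source, account and network identifiers, and the service-principal credentials are placeholders; only the required inputs from the tables below are set.

```hcl
# Account-level (MWS) Databricks provider used by the databricks_mws_* resources in this module.
provider "databricks" {
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = var.databricks_client_id     # service-principal OAuth credentials (assumed)
  client_secret = var.databricks_client_secret
}

module "databricks_workspace" {
  source = "github.com/<org>/terraform-aws-databricks-workspace" # hypothetical source reference

  account_id = var.databricks_account_id
  region     = "us-east-1"
  label      = "example"

  vpc_id             = "vpc-0123456789abcdef0"
  subnet_ids         = ["subnet-0aaaaaaaaaaaaaa1", "subnet-0bbbbbbbbbbbbbb2"]
  security_group_ids = ["sg-0123456789abcdef0"]

  tags = {
    Environment = "dev"
  }
}
```

The resulting `workspace_url` output can then be wired into a workspace-level `databricks` provider, as shown in the sketch after the Outputs table.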

<!-- BEGIN_TF_DOCS -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.8 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 5.0 |
| <a name="requirement_databricks"></a> [databricks](#requirement\_databricks) | >= 1.55 |
| <a name="requirement_time"></a> [time](#requirement\_time) | ~> 0.11 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 5.0 |
| <a name="provider_databricks"></a> [databricks](#provider\_databricks) | >= 1.55 |
| <a name="provider_time"></a> [time](#provider\_time) | ~> 0.11 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_iam_cross_account_workspace_policy"></a> [iam\_cross\_account\_workspace\_policy](#module\_iam\_cross\_account\_workspace\_policy) | terraform-aws-modules/iam/aws//modules/iam-policy | 5.41.0 |
| <a name="module_iam_cross_account_workspace_role"></a> [iam\_cross\_account\_workspace\_role](#module\_iam\_cross\_account\_workspace\_role) | terraform-aws-modules/iam/aws//modules/iam-assumable-role | 5.41.0 |
| <a name="module_privatelink_vpce"></a> [privatelink\_vpce](#module\_privatelink\_vpce) | ./modules/privatelink/ | n/a |
| <a name="module_storage_configuration_dbfs_bucket"></a> [storage\_configuration\_dbfs\_bucket](#module\_storage\_configuration\_dbfs\_bucket) | terraform-aws-modules/s3-bucket/aws | 4.1.2 |

## Resources

| Name | Type |
|------|------|
| [aws_s3_bucket_policy.databricks_aws_bucket_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_policy) | resource |
| [databricks_mws_credentials.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_credentials) | resource |
| [databricks_mws_networks.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_networks) | resource |
| [databricks_mws_private_access_settings.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_private_access_settings) | resource |
| [databricks_mws_storage_configurations.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_storage_configurations) | resource |
| [databricks_mws_workspaces.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_workspaces) | resource |
| [time_sleep.wait_30_seconds](https://registry.terraform.io/providers/hashicorp/time/latest/docs/resources/sleep) | resource |
| [databricks_aws_assume_role_policy.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/aws_assume_role_policy) | data source |
| [databricks_aws_bucket_policy.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/aws_bucket_policy) | data source |
| [databricks_aws_crossaccount_policy.this](https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/aws_crossaccount_policy) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_account_id"></a> [account\_id](#input\_account\_id) | Databricks Account ID | `string` | n/a | yes |
| <a name="input_iam_cross_account_workspace_role_config"></a> [iam\_cross\_account\_workspace\_role\_config](#input\_iam\_cross\_account\_workspace\_role\_config) | Configuration object for setting the IAM cross-account role for the Databricks workspace | <pre>object({<br/> role_name = optional(string, null)<br/> policy_name = optional(string, null)<br/> permission_boundary_arn = optional(string, null)<br/> role_description = optional(string, "Databricks IAM Role to launch clusters in your AWS account, you must create a cross-account IAM role that gives access to Databricks.")<br/> })</pre> | `{}` | no |
| <a name="input_iam_cross_account_workspace_role_enabled"></a> [iam\_cross\_account\_workspace\_role\_enabled](#input\_iam\_cross\_account\_workspace\_role\_enabled) | A boolean flag to determine if the cross-account IAM role for Databricks workspace access should be created | `bool` | `true` | no |
| <a name="input_label"></a> [label](#input\_label) | A customizable string used as a prefix for naming Databricks resources | `string` | n/a | yes |
| <a name="input_private_access_settings_config"></a> [private\_access\_settings\_config](#input\_private\_access\_settings\_config) | Configuration for private access settings | <pre>object({<br/> name = optional(string, null)<br/> allowed_vpc_endpoint_ids = optional(list(string), [])<br/> public_access_enabled = optional(bool, true)<br/> })</pre> | `{}` | no |
| <a name="input_private_access_settings_enabled"></a> [private\_access\_settings\_enabled](#input\_private\_access\_settings\_enabled) | Indicates whether private access settings should be enabled for the Databricks workspace. Set to true to activate these settings | `bool` | `true` | no |
| <a name="input_privatelink_dedicated_vpce_config"></a> [privatelink\_dedicated\_vpce\_config](#input\_privatelink\_dedicated\_vpce\_config) | Configuration object for AWS PrivateLink dedicated VPC Endpoints (VPCe) | <pre>object({<br/> rest_vpc_endpoint_name = optional(string, null)<br/> relay_vpc_endpoint_name = optional(string, null)<br/> rest_aws_vpc_endpoint_id = optional(string, null)<br/> relay_aws_vpc_endpoint_id = optional(string, null)<br/> })</pre> | `{}` | no |
| <a name="input_privatelink_dedicated_vpce_enabled"></a> [privatelink\_dedicated\_vpce\_enabled](#input\_privatelink\_dedicated\_vpce\_enabled) | Boolean flag to enable or disable the creation of dedicated AWS VPC Endpoints (VPCe) for Databricks PrivateLink | `bool` | `false` | no |
| <a name="input_privatelink_enabled"></a> [privatelink\_enabled](#input\_privatelink\_enabled) | Boolean flag to enabled registration of Privatelink VPC Endpoints (REST API and SCC Relay) in target Databricks Network Config | `bool` | `false` | no |
| <a name="input_privatelink_relay_vpce_id"></a> [privatelink\_relay\_vpce\_id](#input\_privatelink\_relay\_vpce\_id) | AWS VPC Endpoint ID used for Databricks SCC Relay when PrivateLink is enabled | `string` | `null` | no |
| <a name="input_privatelink_rest_vpce_id"></a> [privatelink\_rest\_vpce\_id](#input\_privatelink\_rest\_vpce\_id) | AWS VPC Endpoint ID used for Databricks REST API if PrivateLink is enabled | `string` | `null` | no |
| <a name="input_region"></a> [region](#input\_region) | AWS region | `string` | n/a | yes |
| <a name="input_security_group_ids"></a> [security\_group\_ids](#input\_security\_group\_ids) | Set of AWS security group IDs for Databricks Account network configuration | `set(string)` | n/a | yes |
| <a name="input_storage_dbfs_config"></a> [storage\_dbfs\_config](#input\_storage\_dbfs\_config) | Configuration for the Databricks File System (DBFS) storage | <pre>object({<br/> bucket_name = optional(string)<br/> })</pre> | `{}` | no |
| <a name="input_storage_dbfs_enabled"></a> [storage\_dbfs\_enabled](#input\_storage\_dbfs\_enabled) | Flag to enable or disable the use of DBFS (Databricks File System) storage in the Databricks workspace | `bool` | `true` | no |
| <a name="input_subnet_ids"></a> [subnet\_ids](#input\_subnet\_ids) | Set of AWS subnet IDs for Databricks Account network configuration | `set(string)` | n/a | yes |
| <a name="input_tags"></a> [tags](#input\_tags) | Assigned tags to AWS services | `map(string)` | `{}` | no |
| <a name="input_vpc_id"></a> [vpc\_id](#input\_vpc\_id) | AWS VPC ID | `string` | n/a | yes |
| <a name="input_workspace_creator_token_enabled"></a> [workspace\_creator\_token\_enabled](#input\_workspace\_creator\_token\_enabled) | Indicates whether to enable the creation of a token for workspace creators in Databricks | `bool` | `false` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_iam_role"></a> [iam\_role](#output\_iam\_role) | The IAM role created for cross-account access to the Databricks workspace |
| <a name="output_storage"></a> [storage](#output\_storage) | The storage configuration for the DBFS bucket associated with the workspace |
| <a name="output_workspace"></a> [workspace](#output\_workspace) | The Databricks workspace resource that has been created |
| <a name="output_workspace_url"></a> [workspace\_url](#output\_workspace\_url) | The URL for accessing the Databricks workspace |
<!-- END_TF_DOCS -->
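
For example, the `workspace_url` output can feed a workspace-level provider configuration (a sketch; the module name and authentication wiring are assumptions):

```hcl
provider "databricks" {
  alias = "workspace"
  host  = module.databricks_workspace.workspace_url
  # Authentication (e.g. OAuth client credentials or a personal access token) is assumed
  # to be supplied via environment variables or additional provider arguments.
}
```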

## License
160 changes: 160 additions & 0 deletions main.tf
@@ -0,0 +1,160 @@
################################################################################
# Databricks Workspace
################################################################################
resource "databricks_mws_workspaces" "this" {
account_id = var.account_id
aws_region = var.region
workspace_name = var.label
credentials_id = databricks_mws_credentials.this.credentials_id
storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
network_id = databricks_mws_networks.this.network_id
private_access_settings_id = try(databricks_mws_private_access_settings.this[0].private_access_settings_id, null)

dynamic "token" {
for_each = var.workspace_creator_token_enabled ? [1] : []
content {
comment = "Workspace creator token managed by Terraform"
}
}

lifecycle {
replace_triggered_by = [databricks_mws_credentials.this]
}

}

resource "databricks_mws_private_access_settings" "this" {
count = var.private_access_settings_enabled ? 1 : 0

private_access_settings_name = coalesce(var.private_access_settings_config.name, var.label)
region = var.region
public_access_enabled = var.private_access_settings_config.public_access_enabled
allowed_vpc_endpoint_ids = coalesce(var.private_access_settings_config.allowed_vpc_endpoint_ids, [var.privatelink_rest_vpce_id])
private_access_level = "ENDPOINT"
}

################################################################################
# Network
################################################################################
resource "databricks_mws_networks" "this" {
account_id = var.account_id
network_name = var.label
security_group_ids = var.security_group_ids
subnet_ids = var.subnet_ids
vpc_id = var.vpc_id

dynamic "vpc_endpoints" {
for_each = var.privatelink_enabled ? [1] : []
content {
dataplane_relay = [coalesce(try(module.privatelink_vpce.relay_vpce_id, null), var.privatelink_relay_vpce_id)]
rest_api = [coalesce(try(module.privatelink_vpce.rest_vpce_id, null), var.privatelink_rest_vpce_id)]
}
}
}

################################################################################
# Privatelink dedicated VPC Endpoints (REST/Relay)
################################################################################
module "privatelink_vpce" {
count = var.privatelink_dedicated_vpce_enabled ? 1 : 0
source = "./modules/privatelink/"

account_id = var.account_id
region = var.region
relay_vpc_endpoint_name = var.privatelink_dedicated_vpce_config.relay_vpc_endpoint_name
relay_aws_vpc_endpoint_id = var.privatelink_dedicated_vpce_config.relay_aws_vpc_endpoint_id
rest_vpc_endpoint_name = var.privatelink_dedicated_vpce_config.rest_vpc_endpoint_name
rest_aws_vpc_endpoint_id = var.privatelink_dedicated_vpce_config.rest_aws_vpc_endpoint_id
}

################################################################################
# IAM
################################################################################
data "databricks_aws_assume_role_policy" "this" {
external_id = var.account_id
}

data "databricks_aws_crossaccount_policy" "this" {}

module "iam_cross_account_workspace_policy" {
source = "terraform-aws-modules/iam/aws//modules/iam-policy"
version = "5.41.0"

name = coalesce(var.iam_cross_account_workspace_role_config.policy_name, "${var.label}-dbx-crossaccount-policy")
policy = data.databricks_aws_crossaccount_policy.this.json
}

module "iam_cross_account_workspace_role" {
count = var.iam_cross_account_workspace_role_enabled ? 1 : 0
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
version = "5.41.0"

role_name = coalesce(var.iam_cross_account_workspace_role_config.role_name, "${var.label}-dbx-cross-account")
create_role = var.iam_cross_account_workspace_role_enabled
create_custom_role_trust_policy = true
custom_role_trust_policy = data.databricks_aws_assume_role_policy.this.json
role_permissions_boundary_arn = var.iam_cross_account_workspace_role_config.permission_boundary_arn
role_description = var.iam_cross_account_workspace_role_config.role_description
custom_role_policy_arns = [module.iam_cross_account_workspace_policy.arn]
tags = var.tags
}

# Wait up to 30 seconds after role creation so that Databricks can successfully reference the role
resource "time_sleep" "wait_30_seconds" {
depends_on = [module.iam_cross_account_workspace_role]

create_duration = "30s"
}

resource "databricks_mws_credentials" "this" {
account_id = var.account_id
credentials_name = "${var.label}-credentials"
role_arn = module.iam_cross_account_workspace_role[0].iam_role_arn

depends_on = [time_sleep.wait_30_seconds]
}

################################################################################
# Storage Configuration
################################################################################
data "databricks_aws_bucket_policy" "this" {
bucket = module.storage_configuration_dbfs_bucket[0].s3_bucket_id
}

module "storage_configuration_dbfs_bucket" {

# KICS scan warnings reported on this block (GitHub Actions annotations on line 124):
#   [HIGH]   S3 Bucket Allows Public Policy
#   [HIGH]   S3 Bucket Without Restriction Of Public Bucket
#   [MEDIUM] S3 Bucket Allows Public ACL
#   [MEDIUM] S3 Bucket Logging Disabled - server access logging should be enabled so that changes are logged and trackable
count = var.storage_dbfs_enabled ? 1 : 0
source = "terraform-aws-modules/s3-bucket/aws"
version = "4.1.2"

bucket_prefix = coalesce(var.storage_dbfs_config.bucket_name, "${var.label}-dbfs-")
acl = "private"

force_destroy = true

control_object_ownership = true
object_ownership = "BucketOwnerPreferred"

server_side_encryption_configuration = {
rule = {
apply_server_side_encryption_by_default = {
sse_algorithm = "AES256"
}
}
}

versioning = {

# KICS scan warnings reported here (GitHub Actions annotations on line 145):
#   [HIGH]   S3 Bucket Without Enabled MFA Delete - MFA delete cannot be enabled through
#            Terraform; it requires an MFA device and enabling versioning plus MFA delete
#            via the AWS CLI (put-bucket-versioning), and cannot be combined with
#            lifecycle configurations
#   [MEDIUM] S3 Bucket Without Versioning - versioning should be enabled
status = "Disabled"
}

}

resource "aws_s3_bucket_policy" "databricks_aws_bucket_policy" {
bucket = module.storage_configuration_dbfs_bucket[0].s3_bucket_id
policy = data.databricks_aws_bucket_policy.this.json
}

resource "databricks_mws_storage_configurations" "this" {
account_id = var.account_id
storage_configuration_name = var.label
bucket_name = module.storage_configuration_dbfs_bucket[0].s3_bucket_id
}
13 changes: 13 additions & 0 deletions modules/privatelink/main.tf
@@ -0,0 +1,13 @@
resource "databricks_mws_vpc_endpoint" "rest" {
account_id = var.account_id
aws_vpc_endpoint_id = var.rest_aws_vpc_endpoint_id
vpc_endpoint_name = var.rest_vpc_endpoint_name
region = var.region
}

resource "databricks_mws_vpc_endpoint" "relay" {
account_id = var.account_id
aws_vpc_endpoint_id = var.relay_aws_vpc_endpoint_id
vpc_endpoint_name = var.relay_vpc_endpoint_name
region = var.region
}
9 changes: 9 additions & 0 deletions modules/privatelink/outputs.tf
@@ -0,0 +1,9 @@
output "rest_vpce_id" {
value = databricks_mws_vpc_endpoint.rest.vpc_endpoint_id
description = "The ID of the AWS VPC endpoint associated with the Databricks REST API"
}

output "relay_vpce_id" {
value = databricks_mws_vpc_endpoint.relay.vpc_endpoint_id
description = "The ID of the AWS VPC endpoint associated with the Databricks Relay service"
}
28 changes: 28 additions & 0 deletions modules/privatelink/variables.tf
@@ -0,0 +1,28 @@
variable "region" {
type = string
description = "AWS region"
}

variable "rest_vpc_endpoint_name" {
type = string
description = "The name to assign to the AWS VPC endpoint for the Databricks REST API"
}
variable "rest_aws_vpc_endpoint_id" {
type = string
description = "The AWS VPC endpoint ID for the Databricks REST API"
}

variable "relay_vpc_endpoint_name" {
type = string
description = "The name to assign to the AWS VPC endpoint for the Databricks Relay service"
}

variable "relay_aws_vpc_endpoint_id" {
type = string
description = "The AWS VPC endpoint ID for the Databricks Relay service"
}

variable "account_id" {
type = string
description = "Databricks Account ID"
}
10 changes: 10 additions & 0 deletions modules/privatelink/versions.tf
@@ -0,0 +1,10 @@
terraform {
required_version = ">= 1.0"

required_providers {
databricks = {
source = "databricks/databricks"
version = ">= 1.55"
}
}
}
19 changes: 19 additions & 0 deletions outputs.tf
@@ -0,0 +1,19 @@
output "workspace" {
value = databricks_mws_workspaces.this
description = "The Databricks workspace resource that has been created"
}

output "storage" {
value = try(module.storage_configuration_dbfs_bucket[0], null)
description = "The storage configuration for the DBFS bucket associated with the workspace"
}

output "iam_role" {
value = try(module.iam_cross_account_workspace_role[0], null)
description = "The IAM role created for cross-account access to the Databricks workspace"
}

output "workspace_url" {
value = databricks_mws_workspaces.this.workspace_url
description = "The URL for accessing the Databricks workspace"
}