The journey of provisioning Azure Databricks with Terraform often begins like any other Azure resource deployment: creating a resource group, a storage account, and a blob container. But soon, things start to diverge. Databricks brings unique challenges and nuances, particularly when it comes to permissions, access control, and the inherent regionality of the service. In this guide, I’ll walk through my experience creating an Azure Databricks environment using Terraform, including the pitfalls, decisions, and lessons learned along the way.

Setting the Stage: The Basics

As with any Terraform project, the first step is laying the groundwork. This involves setting up a resource group, a storage account, and a container within the storage account. These foundational components are straightforward, and their configuration in Terraform should feel familiar to anyone who has worked with Azure resources.
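
For reference, here’s a minimal sketch of what those foundational resources might look like. The names, variables, and settings (such as the replication type) are illustrative assumptions rather than the exact configuration from my project:

resource "azurerm_resource_group" "main" {
  name     = "rg-${var.application_name}-${var.environment_name}-${var.location}"
  location = var.location
}

resource "azurerm_storage_account" "main" {
  # Storage account names must be globally unique, lowercase, and free of hyphens.
  name                     = "st${var.application_name}${var.environment_name}"
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "main" {
  name                  = "data"
  storage_account_name  = azurerm_storage_account.main.name
  container_access_type = "private"
}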

However, things get interesting when introducing the azurerm_databricks_access_connector resource. This was my first time encountering this resource: it gives Databricks a managed identity it can use to authenticate to other Azure services, and it is the key to granting Databricks access to the Azure Storage Account.

Configuring the Databricks Access Connector

Here’s the configuration I used for the access connector:

resource "azurerm_databricks_access_connector" "main" {
  name                = "adbc-${var.application_name}-${var.environment_name}-${var.location}"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  identity {
    type = "SystemAssigned"
  }
}

The access connector is configured with a system-assigned managed identity, which allows it to authenticate securely within Azure. Although I typically prefer using user-assigned managed identities for greater control and reusability, I opted for system-assigned in this case to simplify the setup.
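
For comparison, the user-assigned variant would look roughly like the sketch below. This is an illustrative alternative with assumed names, not what I actually deployed: the identity is created as its own resource and then attached to the connector, so its lifecycle is independent of the connector and it can be reused elsewhere.

resource "azurerm_user_assigned_identity" "databricks" {
  name                = "id-${var.application_name}-${var.environment_name}-${var.location}"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
}

resource "azurerm_databricks_access_connector" "user_assigned" {
  name                = "adbc-${var.application_name}-${var.environment_name}-${var.location}-ua"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.databricks.id]
  }
}

For the rest of this guide, I stick with the system-assigned identity shown earlier.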

The next step involves granting this managed identity access to the storage account. Azure uses role-based access control (RBAC) for this purpose, and the azurerm_role_assignment resource comes into play here:

resource "azurerm_role_assignment" "databricks_connector" {
  scope                = azurerm_storage_account.main.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.main.identity[0].principal_id
}

This role assignment gives the access connector’s managed identity permission to read and write blob data in the storage account. Without it, the identity has no rights on the storage account, and Databricks cannot access the container.

Creating the Azure Databricks Workspace

Once the access connector is configured and permissions are in place, it’s time to provision the main event: the Azure Databricks Workspace. This is the heart of any Databricks solution and the resource that enables analytics, data processing, and machine learning workloads.

Here’s the configuration for the workspace:

resource "azurerm_databricks_workspace" "main" {
  name                        = "adbw-${var.application_name}-${var.environment_name}-${var.location}"
  resource_group_name         = azurerm_resource_group.main.name
  location                    = azurerm_resource_group.main.location
  sku                         = "premium"
  managed_resource_group_name = "${azurerm_resource_group.main.name}-databricks"
}

This setup creates a workspace with the “premium” SKU, which unlocks advanced features such as fine-grained role-based access controls and audit logging. Note the managed_resource_group_name attribute: Azure Databricks requires a separate, automatically managed resource group for its internal components. You can specify the name of this resource group, but Azure manages its contents.

Regional Considerations and Next Steps

One critical limitation of Azure Databricks is its regional nature. Each workspace operates within a single Azure region, meaning that achieving regional resiliency requires additional work. This involves setting up multiple workspaces in different regions and implementing a mechanism to synchronize or replicate data between them. In future parts of this series, I’ll explore these advanced configurations, including region-resilient Databricks solutions.
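
To make the workspace side of that concrete, here is a rough, hedged sketch of stamping out one workspace per region with for_each. The regions variable and naming convention are assumptions, and the sketch deliberately ignores the harder problem of replicating data between the workspaces:

variable "regions" {
  type    = set(string)
  default = ["eastus", "westus2"]
}

resource "azurerm_databricks_workspace" "regional" {
  # One workspace per region; each needs its own managed resource group name.
  for_each                    = var.regions
  name                        = "adbw-${var.application_name}-${var.environment_name}-${each.value}"
  resource_group_name         = azurerm_resource_group.main.name
  location                    = each.value
  sku                         = "premium"
  managed_resource_group_name = "${azurerm_resource_group.main.name}-databricks-${each.value}"
}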

Conclusion

Provisioning Azure Databricks with Terraform is an exercise in both simplicity and complexity. The basic setup of a resource group, storage account, and workspace feels familiar, but introducing the Databricks access connector and managing permissions adds layers of intricacy. This first step lays the groundwork for a functional Databricks environment, but it’s just the beginning. Next, we’ll look at how to provision resources within this Azure Databricks workspace using the Databricks Terraform provider. Stay tuned for Part 2, where we’ll explore these topics in more detail.