Managing Azure Data Explorer Using Terraform - Part 1: Set Up Your Cluster
Azure Data Explorer (ADX), often referred to as Kusto, is one of the most popular and powerful services within Microsoft. Built for speed and scale, ADX handles large volumes of structured, semi-structured, and unstructured data, which makes it a go-to solution for log and telemetry analysis. Many internal solutions at Microsoft rely on Kusto clusters for these purposes, but as with any database, managing schema can be challenging. For Kusto, schema management extends beyond tables to other schema elements, such as functions, and to permissions, so it calls for a deliberate approach.
In this article, we will explore how to provision and manage a Kusto cluster and its associated resources using Terraform. By the end, you’ll understand how to define clusters, databases, and permissions programmatically while aligning with best practices for Azure.
Provisioning the Kusto Cluster
To start using ADX, the first step is to provision a cluster. Kusto clusters are built to handle significant data ingestion and processing, making them ideal for enterprise-scale workloads. However, they are not inherently multi-region, so an active-active solution requires either a custom synchronization mechanism or writing the same data to separate clusters in each region. A cross-region replication feature like the one in Azure Cosmos DB would simplify this, but it’s not available yet.
Here’s an example of how to define a Kusto cluster using Terraform:
resource "azurerm_kusto_cluster" "main" {
  name                = local.adx_name
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  sku {
    name     = var.adx_vm_size
    capacity = var.adx_capacity
  }
  double_encryption_enabled   = true
  streaming_ingestion_enabled = true
  engine                      = "V3"
  auto_stop_enabled           = false
}
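This resource block sets up the cluster with its essential configuration: the SKU (VM size) and capacity (instance count), double encryption, and streaming ingestion. Setting auto_stop_enabled to false keeps the cluster from being stopped automatically when it sits idle. Once the cluster is provisioned, you can create databases and configure schema elements.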
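The cluster definition refers to a resource group, a local value, and two input variables that are not shown. A minimal sketch of those supporting declarations might look like the following. The application_name, environment_name, and location variables, the naming scheme behind local.adx_name, and the resource group name are illustrative assumptions rather than part of the original configuration.

```hcl
# Hypothetical inputs used only to build names in this sketch.
variable "application_name" {
  type        = string
  description = "Short workload name used when building resource names."
}

variable "environment_name" {
  type        = string
  description = "Environment label, for example dev or prod."
}

variable "location" {
  type        = string
  description = "Azure region for the resource group and the cluster."
}

# Inputs referenced by the cluster's sku block.
variable "adx_vm_size" {
  type        = string
  description = "SKU name for the Kusto cluster, passed to sku.name."
}

variable "adx_capacity" {
  type        = number
  description = "Number of instances in the cluster, passed to sku.capacity."
}

locals {
  # Assumed naming scheme. It avoids separators and forces lowercase on the
  # assumption that cluster names accept only lowercase letters and digits;
  # check the provider documentation for the exact naming rules.
  adx_name = lower("adx${var.application_name}${var.environment_name}")
}

resource "azurerm_resource_group" "main" {
  name     = "rg-${var.application_name}-${var.environment_name}"
  location = var.location
}
```

The variables are left without defaults so that the SKU and capacity are chosen per environment, for example through a .tfvars file.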
Setting Up a Kusto Database
A Kusto cluster can host multiple databases, each containing schema elements like tables and functions. The following Terraform resource demonstrates how to define a database within a Kusto cluster:
resource "azurerm_kusto_database" "central_database" {
  name                = "synthetics"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name
  hot_cache_period    = "P31D"
}
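This resource specifies the database name, its parent cluster, and its caching configuration. The hot_cache_period value is an ISO 8601 duration, so "P31D" keeps the most recent 31 days of data in the hot cache for fast queries. If you’re familiar with SQL Server or similar database systems, this structure should feel intuitive.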
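The hot cache setting only controls what stays in fast storage. The azurerm_kusto_database resource also accepts soft_delete_period, another ISO 8601 duration, which sets how long data remains available for query before it is removed. As a sketch, a second database with a short cache window and a longer retention window could look like this. The database name "archive" and both durations are hypothetical values chosen for illustration.

```hcl
# Hypothetical second database: one week in hot cache, one year of retention.
resource "azurerm_kusto_database" "archive" {
  name                = "archive"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name

  hot_cache_period   = "P7D"   # data kept in cache for fast queries
  soft_delete_period = "P365D" # data kept available for query
}
```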
Managing Permissions for Data Plane Access
Unlike Azure services such as Key Vault and Storage, which use Azure RBAC role assignments for data plane access, ADX relies on its own authorization model. Permissions are granted to Entra ID (formerly Azure AD) identities: users, groups, service principals, or managed identities. In Terraform, the azurerm_kusto_cluster_principal_assignment resource manages these permissions at the cluster level.
Authorizing Terraform to Manage the Cluster
To allow Terraform to manage the cluster’s data plane, you need to assign the necessary permissions. Here’s how to authorize Terraform with the AllDatabasesAdmin role:
resource "azurerm_kusto_cluster_principal_assignment" "terraform_user" {
  name                = "terraform-user"
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  principal_id        = data.azurerm_client_config.current.client_id
  principal_type      = "App"
  role                = "AllDatabasesAdmin"
}
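The assignment above reads the tenant and client IDs from data.azurerm_client_config.current. If that data source is not already declared elsewhere in the configuration, it needs a declaration along these lines:

```hcl
# Exposes the tenant_id, client_id, and object_id of the identity that the
# azurerm provider is currently authenticated as.
data "azurerm_client_config" "current" {}
```

Note that principal_type = "App" assumes Terraform runs as a service principal or managed identity, as is typical in a pipeline. If a person runs Terraform under their own account, the assignment would likely need principal_type = "User" together with that user's object ID instead; treat that variation as an assumption to verify for your setup.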
Adding Permissions for External Services
For use cases like integrating Azure Managed Grafana (AMG) to visualize telemetry data stored in ADX, you need to assign the appropriate permissions to Grafana’s managed identity. As a read-only workload, Grafana only requires the AllDatabasesViewer role:
resource "azurerm_kusto_cluster_principal_assignment" "grafana_viewer" {
  name                = "grafana-viewer"
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  principal_id        = azurerm_dashboard_grafana.main.identity[0].principal_id
  principal_type      = "App"
  role                = "AllDatabasesViewer"
}
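If Grafana only needs to read a single database, a narrower alternative to the cluster-wide role is a database-scoped assignment using azurerm_kusto_database_principal_assignment. The sketch below is an illustration rather than part of the original setup. It assumes the "synthetics" database defined earlier and a Grafana instance with a system-assigned identity, and the resource and assignment names are hypothetical.

```hcl
# Alternative to the cluster-wide viewer role: read access to one database.
resource "azurerm_kusto_database_principal_assignment" "grafana_database_viewer" {
  name                = "grafana-database-viewer"
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name
  database_name       = azurerm_kusto_database.central_database.name

  tenant_id      = data.azurerm_client_config.current.tenant_id
  principal_id   = azurerm_dashboard_grafana.main.identity[0].principal_id
  principal_type = "App"
  role           = "Viewer" # database-level counterpart of AllDatabasesViewer
}
```

The trade-off is that every new database Grafana should see needs its own assignment, whereas the cluster-level AllDatabasesViewer role covers new databases automatically.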
Using Entra ID Groups for Human Access
For human users, it’s best practice to use Entra ID groups instead of assigning permissions directly to individuals. Groups make it easier to manage access at scale and provide a clean separation of concerns. For example:
resource "azurerm_kusto_cluster_principal_assignment" "admin_group" {
  name                = "admin-group"
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  principal_id        = var.adx_admin_group_id
  principal_type      = "Group"
  role                = "AllDatabasesAdmin"
}

resource "azurerm_kusto_cluster_principal_assignment" "reader_group" {
  name                = "reader-group"
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  principal_id        = var.adx_reader_group_id
  principal_type      = "Group"
  role                = "AllDatabasesViewer"
}
These assignments ensure administrators and readers have the appropriate levels of access without tying permissions to specific individuals.
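Both group assignments take the group's object ID from an input variable. One way those variables could be declared is sketched below. The descriptions and the GUID validation rule are assumptions added for illustration; the validation simply rejects values that are not shaped like an object ID, such as a group display name pasted in by mistake.

```hcl
variable "adx_admin_group_id" {
  type        = string
  description = "Object ID of the Entra ID group granted AllDatabasesAdmin."

  validation {
    # Accepts only the 8-4-4-4-12 hexadecimal GUID format.
    condition     = can(regex("^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$", var.adx_admin_group_id))
    error_message = "adx_admin_group_id must be a GUID (the group's object ID)."
  }
}

variable "adx_reader_group_id" {
  type        = string
  description = "Object ID of the Entra ID group granted AllDatabasesViewer."

  validation {
    condition     = can(regex("^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$", var.adx_reader_group_id))
    error_message = "adx_reader_group_id must be a GUID (the group's object ID)."
  }
}
```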
Conclusion
Managing Azure Data Explorer clusters and their schemas with Terraform allows you to automate and scale your operations while adhering to best practices for security and maintainability. In this first part of the series, we covered how to provision a Kusto cluster, create a database, and manage permissions for Terraform, external services, and human users. Future installments will delve deeper into managing schema elements like tables, functions, and data ingestion pipelines. Stay tuned for more insights and practical examples as we continue to explore ADX with Terraform.