Managing Azure Data Explorer using Terraform - Part 4: Setup Azure Monitor Ingestion
In this installment of the series, we’ll explore how to manage database schemas for Azure Monitor telemetry in Azure Data Explorer (ADX) using Terraform. Azure Monitor emits two kinds of telemetry, logs and metrics, each with its own schema and ingestion path; for good query performance, the raw data has to be transformed into structured, type-safe tables. ADX’s update policies let us perform that transformation automatically at ingestion time. This article walks you through creating the tables, defining the update policies, and writing the KQL functions that process Azure Monitor logs and metrics.
Understanding Azure Monitor Telemetry
Azure Monitor collects two types of telemetry: logs and metrics. Logs contain detailed information about operations, while metrics provide numerical data points for system monitoring. Both, however, arrive as raw JSON blobs, typically streamed to ADX through an Event Hub configured in a resource’s diagnostic settings. Without transformation, this data remains unstructured, making queries cumbersome and performance suboptimal. ADX addresses this challenge with its update policy feature, which automatically transforms raw records into structured, type-safe tables.
An update policy specifies three components:
- A source table, where raw JSON data is ingested.
- A destination table, where the transformed data is stored.
- A function, which acts as a transformation query to process raw data and map it into the structured schema of the destination table.
This approach ensures that logs and metrics are queryable in a meaningful way, enhancing both usability and performance.
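Before diving into the Terraform resources, it helps to see what an update policy looks like at the KQL level. Under the hood, the resources below boil down to a control command roughly like this one (using the table and function names we create later in this article):
.alter table DiagnosticMetrics policy update
@'[{"IsEnabled": true, "Source": "DiagnosticRawRecords", "Query": "DiagnosticMetricsExpand()", "IsTransactional": true}]'
The Source, Query, and destination table correspond exactly to the three components listed above.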
Setting Up the Schema for Metrics
Creating the Metrics Table
We start by defining the DiagnosticMetrics table in Terraform. This table will hold structured data derived from raw metrics ingested into ADX. Here is the schema definition:
resource "adx_table" "diagnostic_metrics" {
name = "DiagnosticMetrics"
database_name = var.database_name
column {
name = "Timestamp"
type = "datetime"
}
column {
name = "ResourceId"
type = "string"
}
column {
name = "MetricName"
type = "string"
}
column {
name = "Count"
type = "int"
}
column {
name = "Total"
type = "real"
}
column {
name = "Minimum"
type = "real"
}
column {
name = "Maximum"
type = "real"
}
column {
name = "Average"
type = "real"
}
column {
name = "TimeGrain"
type = "string"
}
}
This schema is designed to capture essential metrics properties, ensuring that the data is structured and ready for querying.
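Once metrics start flowing in, the typed columns make queries concise. For example, a quick sketch that charts average ingestion latency per five-minute bucket (the metric name here is illustrative; use whatever metrics your resources emit):
DiagnosticMetrics
| where Timestamp > ago(1h)
| where MetricName == "IngestionLatencyInSeconds"
| summarize avg(Average) by bin(Timestamp, 5m)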
Defining the Expand Function for Metrics
The raw JSON metrics data needs to be transformed before insertion into the DiagnosticMetrics table. To achieve this, we define a KQL function in a separate .kql file and reference it in Terraform:
{
    DiagnosticRawRecords
    | mv-expand events = Records
    | where isnotempty(events.metricName)
    | project
        Timestamp = todatetime(events['time']),
        ResourceId = tostring(events.resourceId),
        MetricName = tostring(events.metricName),
        Count = toint(events['count']),
        Total = todouble(events.total),
        Minimum = todouble(events.minimum),
        Maximum = todouble(events.maximum),
        Average = todouble(events.average),
        TimeGrain = tostring(events.timeGrain)
}
In Terraform, we load this file with the built-in file function when creating the function resource:
resource "adx_function" "diagnostic_metrics_expand" {
database_name = var.database_name
name = "DiagnosticMetricsExpand"
body = file("${path.module}/files/diagnostic_metrics_expand.kql")
folder = local.update_policy_functions
}
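Note that local.update_policy_functions isn’t shown in these snippets; it is simply a string local used as the folder name under which these functions are grouped in the ADX UI, something along these lines:
locals {
  # The folder name is arbitrary; any label works.
  update_policy_functions = "UpdatePolicyFunctions"
}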
Creating the Update Policy
Finally, we define an update policy that ties the raw metrics table, the transformation function, and the destination table together:
resource "adx_table_update_policy" "diagnostic_metrics" {
database_name = var.database_name
table_name = adx_table.diagnostic_metrics.name
query = adx_function.diagnostic_metrics_expand.name
source_table = adx_table.diagnostic_raw_records.name
enabled = true
transactional = true
}
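After a terraform apply, you can verify the wiring directly in ADX:
.show table DiagnosticMetrics policy update
The returned policy JSON should list DiagnosticRawRecords as the source and the expand function as the query.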
Setting Up the Schema for Logs
Creating the Logs Table
For logs, we define a DiagnosticLogs table with the following schema:
resource "adx_table" "diagnostic_logs" {
name = "DiagnosticLogs"
database_name = var.database_name
column {
name = "Timestamp"
type = "datetime"
}
column {
name = "ResourceId"
type = "string"
}
column {
name = "OperationName"
type = "string"
}
column {
name = "Result"
type = "string"
}
column {
name = "OperationId"
type = "string"
}
column {
name = "Database"
type = "string"
}
column {
name = "Table"
type = "string"
}
column {
name = "IngestionSourceId"
type = "string"
}
column {
name = "IngestionSourcePath"
type = "string"
}
column {
name = "RootActivityId"
type = "string"
}
column {
name = "ErrorCode"
type = "string"
}
column {
name = "FailureStatus"
type = "string"
}
column {
name = "Details"
type = "string"
}
}
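As with metrics, the typed schema makes operational queries straightforward. For instance, a minimal sketch for surfacing failed operations by error code:
DiagnosticLogs
| where Timestamp > ago(1d)
| where isnotempty(ErrorCode)
| summarize Failures = count() by OperationName, ErrorCode
| order by Failures desc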
Defining the Expand Function for Logs
Similar to metrics, raw log data is transformed using a KQL function stored in a .kql file:
{
    DiagnosticRawRecords
    | mv-expand events = Records
    | where isnotempty(events.operationName)
    | project
        Timestamp = todatetime(events['time']),
        ResourceId = tostring(events.resourceId),
        OperationName = tostring(events.operationName),
        Result = tostring(events.resultType),
        OperationId = tostring(events.properties.OperationId),
        Database = tostring(events.properties.Database),
        Table = tostring(events.properties.Table),
        IngestionSourceId = tostring(events.properties.IngestionSourceId),
        IngestionSourcePath = tostring(events.properties.IngestionSourcePath),
        RootActivityId = tostring(events.properties.RootActivityId),
        ErrorCode = tostring(events.properties.ErrorCode),
        FailureStatus = tostring(events.properties.FailureStatus),
        Details = tostring(events.properties.Details)
}
We integrate this function into Terraform:
resource "adx_function" "diagnostic_logs_expand" {
database_name = var.database_name
name = "DiagnosticLogsExpand"
body = file("${path.module}/files/diagnostic_logs_expand.kql")
folder = local.update_policy_functions
}
Creating the Update Policy
The update policy for logs ensures that raw log records are transformed and stored in the DiagnosticLogs table:
resource "adx_table_update_policy" "diagnostic_logs" {
database_name = var.database_name
table_name = adx_table.diagnostic_logs.name
query = adx_function.diagnostic_logs_expand.name
source_table = adx_table.diagnostic_raw_records.name
enabled = true
transactional = true
}
Configuring Raw Records Table and Retention Policy
Raw Records Table Mapping
To handle the ingestion of raw JSON data, we define a DiagnosticRawRecords table with a single column of type dynamic. This column, named Records, stores raw JSON blobs:
resource "adx_table" "diagnostic_raw_records" {
name = "DiagnosticRawRecords"
database_name = var.database_name
column {
name = "Records"
type = "dynamic"
}
}
To map the raw JSON data into this table, we define a JSON mapping resource:
resource "adx_table_mapping" "diagnostic_raw_records" {
name = "DiagnosticRawRecordsMapping"
database_name = var.database_name
table_name = adx_table.diagnostic_raw_records.name
kind = "json"
mapping {
column = "Records"
path = "$.records"
}
}
This mapping ensures that incoming JSON records are correctly mapped into the Records column, enabling seamless ingestion and subsequent transformation through the update policies.
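For context, the payload Azure Monitor delivers is an envelope with a top-level records array, which is exactly what the $.records path targets. An abbreviated example (fields and values are illustrative):
{
  "records": [
    {
      "time": "2024-01-01T00:00:00Z",
      "resourceId": "/subscriptions/.../clusters/mycluster",
      "metricName": "IngestionUtilization",
      "average": 1.5,
      "timeGrain": "PT1M"
    }
  ]
}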
Retention Policy for Raw Records
Because the raw JSON data is only used as an intermediate step in the transformation process, we set a retention policy that deletes these records immediately after ingestion. This minimizes storage costs and ensures that only the transformed, structured data remains accessible:
resource "adx_table_retention_policy" "diagnostic_logs_raw" {
  database_name      = var.database_name
  table_name         = adx_table.diagnostic_raw_records.name
  soft_delete_period = "0d"
  recoverability     = false
}
With soft_delete_period set to 0d, raw records are dropped as soon as they have been ingested and transformed; because the update policies are transactional, the transformation completes before the raw data becomes eligible for deletion. This is efficient and aligns with the goal of preserving only structured, queryable data in the destination tables.
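For the curious, the equivalent KQL control command looks roughly like this:
.alter table DiagnosticRawRecords policy retention
@'{"SoftDeletePeriod": "0.00:00:00", "Recoverability": "Disabled"}'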
Simplifying KQL Management
Embedding KQL directly in Terraform code quickly becomes hard to maintain. By storing KQL queries in separate .kql files, you can edit and optimize them independently of your Terraform configuration. The separation mirrors a technique commonly used when automating Grafana, where dashboard definitions live in their own JSON files; KQL’s compact syntax and minimal templating needs make the approach even cleaner here.
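A module layout along these lines keeps the pieces easy to find (illustrative; the .tf file names are arbitrary, only the .kql paths must match the file() calls above):
adx-module/
├── tables.tf
├── functions.tf
├── policies.tf
└── files/
    ├── diagnostic_metrics_expand.kql
    └── diagnostic_logs_expand.kql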
Conclusion
Using Terraform to manage database schemas for Azure Monitor telemetry in ADX is a powerful way to turn raw JSON data into structured, queryable tables. By leveraging update policies and externalizing KQL functions, you keep your infrastructure-as-code both performant and maintainable, and you gain a solid foundation for telemetry analysis in Azure Data Explorer.