Should the Azure Terraform Provider “azurerm” Support Multi-Subscription Deployments?

Each Terraform provider is, in its own right, an independent piece of software. Sure, it has contracts that allow it to be integrated with the Terraform core executable, but it lives in its own GitHub repository, is authored and maintained by different people, and is free to make its own design choices about how it will work. This flexibility in the hands of each Terraform provider author is what allows Terraform to automate just about anything, from a plethora of public cloud hyperscalers, to networking devices such as firewalls and switches, to dropping Diamond Blocks on a Minecraft server and ordering a pizza from Domino’s.

Even when narrowing our focus to how this creative license has played out within a single category of Terraform providers, arguably the most important one, the public cloud hyperscalers (AWS, Azure, and GCP), we see significant diversity. Much of this diversity comes, of course, from differences in the cloud services themselves: how things are named, how things are structured, how services tie together. But the biggest difference between these three providers, and the first difference the Terraformers of the world will encounter, is the provider block.

On AWS the provider is scoped by two constraints: an AWS account and an AWS region. One is organizational and one is geographical. Since the provider authenticates with AWS through an IAM identity that is inherently scoped to an AWS account, the account you are targeting is implied by the identity you choose to use.

The AWS provider unapologetically states “The Region must be set”. Somehow, somewhere, you are going to specify what AWS region this “aws” Terraform provider block is going to be scoped to.

This means that AWS Terraformers need to use multiple provider blocks to do multi-region deployments. This can be achieved by declaring multiple “aws” provider blocks, each with its own “alias” attribute and its own region.

provider "aws" {
  region = "us-east-1"
  alias  = "primary"
}

provider "aws" {
  region = "us-west-1"
  alias  = "secondary"
}

Multi-region deployments are quite common within a single team’s sphere of responsibility, so this scenario is a reality for many teams operating large solutions on AWS. Because of this design decision, AWS Terraformers need to deal with multiple provider blocks and provider aliasing, which adds complexity to their Terraform configurations and reusable modules.
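To make that complexity concrete, here is a minimal sketch of how the aliased “aws” providers get consumed. The bucket name and the local module path are placeholders I made up for illustration; the point is only that every resource or module that should land outside the default region has to name its provider explicitly.

```hcl
# Resource pinned to the "secondary" alias (us-west-1).
resource "aws_s3_bucket" "replica" {
  provider = aws.secondary
  bucket   = "example-replica-bucket" # placeholder name
}

# A reusable module has to be handed the aliased provider explicitly.
module "network_west" {
  source = "./modules/network" # hypothetical local module

  providers = {
    aws = aws.secondary
  }
}
```

Multiply that by every region you deploy to and you can see why region-scoped providers leak into module design.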

As we all know, hindsight is 20/20. So when Microsoft started developing its provider, I suspect the authors, who very likely heard about and even lived the pain of the region-scoped AWS provider, wanted to avoid this in the Azure provider. Therefore, when using the Microsoft Azure Terraform provider (i.e., “azurerm”) you only have one constraint: the Azure Subscription.

This means that Azure’s Terraform provider consciously chose to scope only by organizational means — not by geographical. As a result, provisioning multi-region deployments with the Azure Terraform provider requires only a single “azurerm” provider block. One block. No aliasing. No fuss.

This has a ripple effect through all of the “azurerm” provider’s resources. Since Azure does not scope its provider to an Azure region the way AWS does, almost every top-level resource within Azure has an attribute specifying what region it’s in. Any Azure Terraformer is sure to encounter the location attribute.
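As a rough illustration, a multi-region Azure deployment can look something like the sketch below: one provider block, with the region pushed down onto each resource through location. The subscription ID and resource group names are placeholders.

```hcl
provider "azurerm" {
  features {}
  subscription_id = "00000000-0000-0000-0000-000000000000" # placeholder
}

# Same provider, two regions: the region lives on the resource.
resource "azurerm_resource_group" "east" {
  name     = "rg-app-eastus"
  location = "eastus"
}

resource "azurerm_resource_group" "west" {
  name     = "rg-app-westus"
  location = "westus"
}
```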

So when would an Azure Terraformer need to use multiple provider blocks? When they need to provision across multiple Azure Subscriptions! Why would anyone ever want to do this? Well, when they want to deploy something that spans an organizational boundary. Azure Subscriptions, like AWS Accounts, are used as a security boundary and for financial and policy segmentation as well. Large enterprises often have “uber-teams” that have a cross-cutting responsibility across teams (and across subscriptions). Hence, they may need to provision things across the subscriptions of all the teams.

provider "azurerm" {
  features {}
  subscription_id = "00000000-0000-0000-0000-000000000000"
  alias           = "team1"
}

provider "azurerm" {
  features {}
  subscription_id = "00000000-0000-0000-0000-000000000001"
  alias           = "team2"
}

In the above code you can see this happening with two Subscriptions, each owned by a different team. Not too bad, eh? But what if I work at a really big company and wanna deploy the same thing to a large number of my subscriptions, say 100 of them. 100 Alias providers? What about 1000 subscriptions? 1000 alias providers? Yikes.

You can see where this will become unmanageable. A good way to tell if a solution will scale is to multiply the number of times you do it by a very large number. The alias provider works great on a small scale but becomes extremely cumbersome after…let’s say 3ish… 🤓
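Here is a sketch of what that repetition tends to look like in practice, assuming two “azurerm” aliases named team1 and team2 and a hypothetical local module called baseline. As far as I understand Terraform at the time of writing, provider references on a module are static, so you cannot simply loop over aliases; each subscription gets its own hand-written block.

```hcl
# One module block per subscription alias; "./modules/baseline" is hypothetical.
module "baseline_team1" {
  source = "./modules/baseline"

  providers = {
    azurerm = azurerm.team1
  }
}

module "baseline_team2" {
  source = "./modules/baseline"

  providers = {
    azurerm = azurerm.team2
  }
}

# ...and so on, once per subscription.
```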

On Google Cloud, they took things to the next level. Like the other two providers we looked at, Google Cloud has both organizational and geographical boundaries.

Google’s organizational boundary is called a “Project”. A Project is a little bit like an Azure Subscription and a little bit like an Azure Resource Group. On one hand, a Google Project is very much a logical boundary around resources provisioned to Google Cloud — making it very much like an Azure Resource Group. On the other hand, it is also the primary security boundary and place where financial and policy-based segmentation occur — making it very much like an Azure Subscription.

Google offers not one, but two geographical boundaries: a region and a zone, allowing you to scope all the way down to a single zone if you want to!

OK, AWS and Azure Terraformers, this is where it’s gonna get weird. Hold onto your butts. The really different thing about Google’s Terraform provider is that all three of these boundaries are completely optional! Yes. Optional. You can set up a “google” provider without any scope whatsoever. This means you can provision across regions, you can provision across projects, you can provision across projects and across regions at the same time!!! As General McAuliffe would say, “Nuts!”.

Just as Azure’s design decision to scope only to the Subscription had the ripple effect of placing all those required location attributes on every resource, Google’s design decision had the ripple effect of adding (completely optional, mind you) attributes for project, region and zone.

Don’t specify project on your resources? No problem! But you better have scoped the provider you’re using with a default Project. Don’t specify region on your resources? No problem! But you better have scoped the provider you’re using with a default Region. You get the idea. Truly liberating. Or is it?
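A small sketch of that behavior, using placeholder project IDs and a static IP address resource as the example: the first resource leans on the provider defaults, the second overrides both project and region without needing a second provider block.

```hcl
provider "google" {
  project = "my-default-project" # placeholder project ID
  region  = "us-central1"
}

# Inherits project and region from the provider defaults.
resource "google_compute_address" "default_scope" {
  name = "addr-default"
}

# Overrides both on the resource itself.
resource "google_compute_address" "other_scope" {
  name    = "addr-other"
  project = "my-other-project" # placeholder project ID
  region  = "europe-west1"
}
```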

So should Azure follow Google Cloud’s example and make even the organizational boundary optional? Doing so would certainly remove the difficulty of provisioning across Azure Subscriptions.

Who would want to do this, you might ask? It’s very niche, admittedly, but there is a realm in large enterprises where teams manage cross-cutting infrastructure across the whole organization. Today they have a few choices (let’s use the number 100 for the subscription count to keep things simple).

1. 100 aliased providers in one root module with one pipeline, run once to update
2. 100 root modules, each with its own pipeline, run 100 times to update
3. An ARM template provisioned by Azure Policy to a Management Group (MG), with one pipeline, run once to update

Two of these options suck and the other doesn’t use Terraform. (Well, I guess they could at least provision the Azure Policy with Terraform.) So now we’re back to managing ARM templates again.

If we could just add a subscription attribute to any resource in the azurerm provider (like we can add a project in GCP, where a Project plays a role similar to a Subscription), then option 1 becomes trivial. We simply no longer have the huge PITA of coding around 100 bloody provider blocks and 100 provider aliases.
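To be explicit about what I am imagining, here is a purely hypothetical sketch. The subscription_id argument on the resource does not exist in the azurerm provider today, so this will not validate; the variable name and resource group name are also made up for illustration.

```hcl
# HYPOTHETICAL: azurerm resources do NOT accept a subscription_id argument today.
# This only illustrates the idea and will not pass validation.
variable "subscription_ids" {
  type = set(string)
}

resource "azurerm_resource_group" "shared" {
  for_each = var.subscription_ids

  subscription_id = each.value # hypothetical attribute, not in the real provider
  name            = "rg-shared"
  location        = "eastus"
}
```

One resource block, one for_each, and the 100 subscriptions become a list of strings instead of 100 provider blocks.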

To be clear, this would be an absolute anti-pattern in most cases. An argument could be made (Mitchell made a good case for this philosophy during his recent interview on the IaC podcast) that the constraints that exist today in the azurerm provider (scoped to one subscription, want more? go add a provider alias) are a good thing because they make this anti-pattern difficult to achieve. The 99% of Azure Terraformers won’t even try it, because when they start down this path it will feel…icky…cumbersome…and they will sensibly abandon it and rethink the problem.

This is probably why the provider was designed this way in the first place and why it probably won’t change…and maybe that’s a good thing…and did I just talk myself out of my original idea?