How to Build & Scale an Infrastructure-as-Code Organization: Keep It Simple, Stupid
I start any Terraform project with a very simple, standard Terraform folder structure, which, to some, might appear rather Spartan. I don’t use fancy tools or abstractions. I don’t strive for technical elegance. I focus on meat and potatoes: readability, maintainability, and high cohesion in my solutions. I’m a firm believer that simplicity is key when it comes to Infrastructure-as-Code (IaC). The moment you introduce unnecessary complexity, you risk the entire system crumbling under its own weight, so I avoid it at all costs. If you’re hoping to operate at scale in any meaningful way and avoid the trap of relying on a handful of highly paid consultants (the modern-day shamans who hold a monopoly on the will of the ‘infrastructure gods,’ interpreting the sacred code in ways only they can understand), heed my words: you’ve been warned.
Most of my career has been spent working in the enterprise, specifically with mid- to large-cap companies, often in the Fortune 500, 100, or even 50. These companies are typically vast, with human capital spread across the globe and often relying on system integration vendors — like the ones I worked with throughout my career before joining Microsoft. In my experience leading large, geographically distributed teams — usually with an 80/20 split between offshore (primarily in India) and onshore (in the U.S.) — simplicity was crucial to our success. Don’t get me wrong, we developed sophisticated software and advanced release processes to achieve minimal to no downtime, but we always followed the age-old adage: ‘Keep it Simple, Stupid.’
№1: That “DevOps” Guy
When you are the only one who knows something, it’s easy to become the go-to person, the guru, the SME. As one of those high-powered consultants, I often fell into this position with customers because of my technical acumen and depth of experience, and I would eventually slide into becoming the bottleneck. Every. Single. Time. I always tried to find ways to work myself out of it. You have to work at it. A lot. If you want to leave behind a sustainable organization powered by Infrastructure-as-Code, you have to remove yourself as the bottleneck. This can’t be done alone. You need a team. You need an organization that can operate in any one person’s absence. We don’t need one Superman; we need a legion. As much as you need to actively empower others, you also need to actively avoid enabling their dependence on you. If you let them, they will keep coming back to you to solve their problems, but that doesn’t get you (or them) out of this unsustainable situation.
There is definitely a cottage industry of these experts, and they are truly masters of their craft. Their expertise is invaluable, and we absolutely need people like them. But to help organizations succeed at scale, we also need to elevate their roles, empowering them to drive greater impact across the organization. Some of this responsibility lies with them — to pull their heads out of the weeds and rethink the critical path. Sometimes, the critical path isn’t about solving the hard problem; sometimes, it’s about removing yourself as a blocker. It’s about empowering others. However, some of the responsibility lies with all of us. It’s all too easy to pigeonhole someone as ‘the DevOps guy’ or ‘the Terraform guru.’ This is a double-edged sword — on one hand, we’re elevating them, putting them on a pedestal, but on the other, we’re dooming them to long hours and middle-of-the-night crises. At the same time, we are shedding any real responsibility we have of picking up the torch and helping carry the load. As the old adage goes, many shoulders make light work.
№2: The Revolving Door
Most senior technical leadership, like myself, were based in the U.S., while our developers, team leads, and project managers operated from 9.5 time zones away. Working with offshore teams in hot tech labor markets presents its own challenges, particularly turnover. In India, it’s not uncommon for engineers to ‘jump ship’ for what may seem like a marginal salary increase — sometimes as low as 5%. Compare this to the U.S., where most people wouldn’t make a move for less than a 20–30% raise. There’s also the extended 90-day notice period — unthinkable in the United States but quite common in the subcontinent — which, I believe, further contributes to team instability. This constant churn of skilled labor can create a revolving door of engineers, making it extremely difficult to cultivate niche skillsets or maintain bespoke architectures.
You need to accept the reality that there’s a permanent pressure on your labor: people will leave, and they will take their knowledge with them. You can never pay them enough to keep them forever. To build a sustainable team, you must develop a virtuous cycle — growing talent from within, from juniors to seniors, seniors to leads, and beyond. It requires disciplined investment to cultivate this cycle, or you’ll always be dependent on lone wolf heroes.
№3: Tool Chain Fragmentation
Here’s where the problem really sets in: too many tools. An overabundance of tools leads to a fragmented workforce, where few people truly understand the end-to-end workflow. Each new tool adds either a layer of abstraction or yet another cog of complexity to an ever-expanding Rube Goldberg machine they call a “system”. This creates a tremendous amount of conceptual overhead for newcomers: people who are not new to the industry or the technical domain, just new to the team. It also opens a chasm between developers and the actual problems they’re solving.
Just like the proverbial frog that lets itself be boiled alive as the temperature gradually rises, if you’re not careful, the simplicity you started with slowly disappears, and your team gets bogged down managing tools (and the glue between them) instead of building infrastructure. Newcomers, thrown into this boiling pot, often try to ‘fix’ the problem but are rarely empowered to make the complete change that is needed, so they leave things worse by introducing well-intentioned yet half-baked solutions.
The key to avoiding this is, again, to follow the ‘Keep it Simple, Stupid’ mantra. Use fewer tools, not more. Don’t reach for two tools when one will do. Avoid unnecessary abstractions and complexity. If you must add layers of abstraction, be thoughtful and cautious.
Always ask: what is this tool really solving? Does it provide real value, or is it just another layer that distances my team from the solution?
№4: Avoid Gold Plating
Infrastructure-as-Code is not binary. Your solution doesn’t have to be a zero or one — you can start small and evolve it over time. Use tools that allow for incremental growth, and avoid those that encourage your teams to ‘one-shot’ their way to victory.
Identify common, opinionated patterns instead of trying to solve every possible problem. Too often, I see Terraform modules that are essentially Swiss army knives — packed with every feature and attribute imaginable. While they follow some good module design principles by avoiding unnecessary abstraction, they go too far by attempting to solve all the world’s problems.
This makes the module’s behavior almost incomprehensible when configurations are changed. It can also result in an interface so complex that it grows into its own meta-language, adding a steep learning curve on top of the already convoluted toolchain that your developers are dealing with. Instead of simplifying their work, you’re making it more difficult. Please stop.
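To make the contrast with the Swiss army knife concrete, here is a minimal sketch of what an opinionated module might look like. Everything in it is an assumption for illustration: the module idea (a small storage module for application teams), the variable names, and the baked-in choices are hypothetical and not taken from any real codebase. It uses the `azurerm_storage_account` resource from the AzureRM provider purely as a familiar example.

```hcl
# variables.tf (hypothetical module): only the inputs callers must decide.

variable "name" {
  type        = string
  description = "Globally unique storage account name."
}

variable "resource_group_name" {
  type        = string
  description = "Resource group to deploy into."
}

variable "location" {
  type        = string
  description = "Azure region, for example \"eastus\"."
}

# main.tf (hypothetical module): the opinions live here, not in the interface.

resource "azurerm_storage_account" "this" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location

  # Decisions made once for everyone instead of exposed as inputs.
  # These particular values are illustrative assumptions.
  account_tier             = "Standard"
  account_replication_type = "ZRS"
  min_tls_version          = "TLS1_2"
}

output "id" {
  description = "Resource ID of the storage account."
  value       = azurerm_storage_account.this.id
}
```

A caller supplies three values and gets one well-understood pattern. If another team genuinely needs a different pattern, one reasonable option is a second small module rather than another flag on this one, so the interface never grows into its own meta-language.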
№5: Focus On Day 2
Another trap is treating Infrastructure-as-Code as simply a more expedient form of ClickOps — the proverbial ‘Cloud Shotgun’ approach, with the slogan: ‘Just shoot it up there and let me click the rest of my days.’ It’s like a bad country-western song, where everything goes wrong — the truck breaks down, the dog runs away, and the girlfriend leaves — and, ironically, this approach probably ends just as well for organizations hoping to achieve the benefits of Infrastructure-as-Code.
I get it, it’s fun when things are shiny and new. That is exactly why it’s important to spend more time thinking about how you are going to operate the environment after it’s up in the cloud. It’s like going furniture shopping and only thinking about what couch defines you as a human being, never once considering what it will look like in your living room, whether it will be comfortable for Netflix and Chill, or what happens when the in-laws spill that Kirkland Merlot all over the cushions. Everybody loves a little retail therapy, but this couch you gotta live with.
This means investing in the training and discipline necessary for your teams to move beyond the basics and truly implement a software development lifecycle around their infrastructure-as-code. Yes, this means some OG infrastructure folks might need to set up a GitHub account (gasp!) and learn a few git commands: just enough to be dangerous (I promise). I jest, of course. There’s more to it than that, but upskilling is definitely required.
The importance of this cultural shift cannot be overstated. There’s a world of difference between swivel-chairing something into the vSphere console and submitting a pull request, then going through the trials and tribulations of a thorough code review process.
Building an organization and culture that can scale and, more importantly, survive beyond Day 2 means embracing simplicity. A culture that heaps accolades on ‘Ubermensch Champions,’ combined with the challenges of cross-site collaboration, high turnover rates, toolchain complexity, and a focus on expediency over sustainable discipline, is a recipe for disaster. Your Infrastructure-as-Code solutions will become fragile, and your teams will be overwhelmed by the labyrinth of complexity. Start with a solid foundation. Simplify wherever you can, and ensure that what you build can be understood and maintained by far more than just ‘the chosen few.’