
When working with AWS at organisational scale, top-level governance and security controls - such as delegated administrators and service control policies (SCPs) - are only half the problem. Every account within that organisation must be consistently bootstrapped, too. For example, administrative services like Security Hub simply won't work without establishing AWS Config recorders at the individual account/region level. The solution to this problem is often to leverage AWS Control Tower Account Factory, or CloudFormation StackSets. However, not every team uses, or is comfortable with, CloudFormation (which both of the above use). Many teams use Terraform instead, and may be reluctant to introduce a new technology into their workflow purely for account bootstrapping.

That said, Terraform has always been infamously awkward for multi-account/region deployments, due to the static nature of provider blocks. Thus, many teams have resorted to using tools like Account Factory for Terraform (AFT) to gear Control Tower/Account Factory towards their tech stack - but once again, under the hood lies CloudFormation. So how do teams who are already using Terraform for infrastructure management bootstrap their accounts in a consistent manner, without altering their workflow?

Enter OpenTofu, a community-driven fork of Terraform, which, as of version 1.9.0, solves the hard-coded provider problem by allowing provider configurations to be instantiated from a variablised for_each loop. In this article, we will explore how this works by deploying an OpenTofu module declaring baseline infrastructure across an organisation.

Prerequisites

To achieve something similar yourself, you would need the following:

  • Intermediate knowledge of Terraform and AWS.
  • An AWS Organisation with at least 3 accounts, including an account from which you have permission to assume an administrative role in the other accounts. For demo purposes, you can use your organisation's management account and assume the OrganizationAccountAccessRole, but this is not recommended in a live environment.
  • The AWS CLI installed and configured with a profile that grants permission to do the above.
  • OpenTofu >=1.9.0 installed. You may wish to use tofuenv.

The Problem

Let's say you want to deploy some infrastructure to four regions using Terraform, and that one of them is a "home region" that acts as the hub for all other regions - e.g., for centralised data storage. Your providers.tf file might look something like this:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.97.0"
    }
  }
}

provider "aws" {
  alias    = "home-region"
  region   = var.home_region
}

provider "aws" {
  alias    = "eu-west-1"
  region   = "eu-west-1"
}

provider "aws" {
  alias    = "us-east-1"
  region   = "us-east-1"
}

provider "aws" {
  alias    = "us-east-2"
  region   = "us-east-2"
}

That's not so bad - it gets a bit more cumbersome if you want to add default_tags or other configuration to each block, but it's seemingly not awful to manage. For illustration, here's roughly what one of those blocks might grow into (the tag names and values here are hypothetical):
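
provider "aws" {
  alias  = "us-east-1"
  region = "us-east-1"

  # Repeated in every provider block, for every region...
  default_tags {
    tags = {
      Environment = "production"
      ManagedBy   = "terraform"
    }
  }
}

However, if you wanted to deploy the same module to each of these regions, you'd need to declare your module 4 times, too: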

module "home_region_baseline" {
  source = "<your-module>"
  providers = {
    aws = aws.home-region
  }
  home_region = var.home_region
}

module "eu_west_2_baseline" {
  source   = "<your-module>"
  providers = {
    aws = aws.eu-west-2
  }
  home_region = var.home_region
  existing_bucket_config = {
    s3_bucket_name = module.home_region_baseline.bucket.id
    s3_kms_key_arn = module.home_region_baseline.bucket_key.arn
  }
}

module "us_east_1_baseline" {
  source   = "<your-module>"
  providers = {
    aws = aws.us-east-1
  }
  home_region = var.home_region
  existing_bucket_config = {
    s3_bucket_name = module.home_region_baseline.bucket.id
    s3_kms_key_arn = module.home_region_baseline.bucket_key.arn
  }
}

module "us_east_2_baseline" {
  source   = "<your-module>"
  providers = {
    aws = aws.us-east-2
  }
  home_region = var.home_region
  existing_bucket_config = {
    s3_bucket_name = module.home_region_baseline.bucket.id
    s3_kms_key_arn = module.home_region_baseline.bucket_key.arn
  }
}

The above is relatively simple - the home_region_baseline module instantiates an S3 bucket and KMS key, which are used by the modules provisioned in the other regions. But if these modules were any more complex, or you wanted to scale out to more regions, you're looking at a lot of configuration, a lot of repetition, and a lot of room for error. That's because, in Terraform, you can't use a for_each loop to create provider configurations - and, by extension, you can't loop over modules that each need a different provider.

The Solution

OpenTofu 1.9.0 introduced the ability to add for_each meta-arguments to provider blocks, in the same way as you might do for provisioning a dynamic set of resources. For example:

provider "aws" {
  alias    = "by_region"
  for_each = var.regions
  region   = each.value.region
}
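
The shape of these variables isn't shown in the original snippets; here's a minimal sketch of how they might be declared, assuming plain sets of region names (resource_regions appears in the module declarations further down):

variable "home_region" {
  description = "The hub region that the other regions depend on."
  type        = string
}

# Drives the provider for_each.
variable "regions" {
  description = "Every region to create a provider instance for."
  type        = set(string)
}

# Drives the module for_each - kept separate from var.regions (see the Caveats section).
variable "resource_regions" {
  description = "Every region to deploy the baseline module to."
  type        = set(string)
}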

The only rule is that the value used in your for_each expression must be known at plan time. This means:

  • You can use variables passed into your module.
  • You can use locals declared in the module.
  • You cannot use data sources.

So your modules can now look like this:

module "home_region_baseline" {
  source = "<your-module>"
  providers = {
    aws = aws.by_region[var.home_region]
  }
  home_region = var.home_region
}

module "non_home_region_baseline" {
  source   = "<your-module>"
  # Every region besides the home region
  for_each = setsubtract(var.resource_regions, [var.home_region])
  providers = {
    aws = aws.by_region[each.key]
  }
  home_region = var.home_region
  existing_bucket_config = {
    s3_bucket_name = module.home_region_baseline.bucket.id
    s3_kms_key_arn = module.home_region_baseline.bucket_key.arn
  }
}

No matter how many regions you add to your regions/resource_regions (we'll discuss why these are two different variables in the Caveats section), you'll never have to rewrite either your provider or module configurations. There's both a home_region_baseline and a non_home_region_baseline in the above because the latter depends on the former's outputs - something you couldn't do if they shared the same module declaration.

Caveats

At the time of writing, this feature is still actively being developed, but it's worth discussing some of its current limitations.

Variable redundancy

You may have noticed that the variable used in the for_each for the provider block is different from the variable used in the for_each for the module declarations. That might seem odd, given that they're based on the same list of regions. The reason is that a provider instance must still exist in the configuration at the point its resources are destroyed: if a single variable drove both, removing a region would remove the provider instance and the resources that depend on it in the same plan, and the destroy would fail. Keeping two variables lets you stage the removal, as sketched below - shrink the resource variable and apply first, then shrink the provider variable. If you choose to use a single variable instead, you will have to use the -target flag if you want to destroy your resources later on. You may prefer this - just know that it doesn't lend itself to a CI/CD setting quite as nicely.
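
For example, to retire us-east-2 from the earlier setup (treating eu-west-2 as the home region purely for illustration), the .tfvars could evolve in two steps:

# Step 1: remove the region from the module variable only, then apply.
# The provider instance for us-east-2 still exists, so its resources can be destroyed.
regions          = ["eu-west-2", "eu-west-1", "us-east-1", "us-east-2"]
resource_regions = ["eu-west-2", "eu-west-1", "us-east-1"]

# Step 2: once that apply succeeds, drop the now-unused provider instance too.
regions          = ["eu-west-2", "eu-west-1", "us-east-1"]
resource_regions = ["eu-west-2", "eu-west-1", "us-east-1"]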

Blast radius

With great power comes great responsibility. You could technically use this feature at whatever scale you like (see Taking it further), but you have to weigh that against the risk you're willing to accept. For a greenfield AWS organisation (hence the "account baseline" example), or a large but non-critical service architecture, this could be a game-changer. Clever use of time_sleep or local-exec-based health checks might offer a way to stagger your resource deployments as a layer of safety/break-glass - but at that point, many would argue that responsibility should sit with your CI/CD system, not your IaC tool.

Complexity

Abstract and dynamic code is innately more complex than repeated code. That's generally not a valid argument against DRYing up your code (plus, tofu plan is your friend), but it's certainly something to consider when blast radius is already a concern.

Taking it further

Above, we took a use case that applied to a single account with multiple regions - one of which acted as a "centre of operations" - but we could be more ambitious. Let's say we have a few accounts, each of which we can access via the OrganizationAccountAccessRole, and we'd like to apply this infrastructure across a few regions in each. Rather than just passing in a list of regions as a variable, we can pass a map of objects:

variable "assume_roles_to_use" {
  type = map(object({
    regions      = set(string)
    home_region  = string
    role_name    = optional(string, "OrganizationAccountAccessRole")
    external_id  = optional(string, null)
    session_name = optional(string, "terraform-apply_account-baseline")
    {extra configs for each account here}
  }))
}

variable "enabled_assume_roles" {
  type = map(object({
    regions      = set(string)
    home_region  = string
    role_name    = optional(string, "OrganizationAccountAccessRole")
    external_id  = optional(string, null)
    session_name = optional(string, "terraform-apply_account-baseline")
  }))
}

This works because, just as with a for_each placed on a resource or module, any collection that can be keyed - a map, or a set of strings - can be used.

Now we can flatten the variables passed in to produce a matrix of every account/region combination that has been requested:

locals {
  # Flatten the map of regions per account into a matrix of account/region pairs,
  # keyed by "<account_number>-<region>".
  enabled_account_regions = {
    for account_region in flatten([
      for k, v in var.enabled_assume_roles : [
        for region in v.regions : merge(v, {
          account_number = k
          region         = region
        })
      ]
    ]) : format("%s-%s", account_region.account_number, account_region.region) => account_region
  }

  account_regions_to_use = {
    for account_region in flatten([
      for k, v in var.assume_roles_to_use : [
        for region in v.regions : merge(v, {
          account_number = k
          region         = region
        })
      ]
    ]) : format("%s-%s", account_region.account_number, account_region.region) => account_region
  }
}
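
The module declarations below reference an aws.by_account_region provider alias that isn't shown in the original snippets. Here's a sketch of what it might look like, assuming it's driven by the flattened local above and that the role ARN is built from the account number and role name:

provider "aws" {
  alias    = "by_account_region"
  for_each = local.enabled_account_regions
  region   = each.value.region

  # Assume the per-account role defined in the variable (defaults to OrganizationAccountAccessRole).
  assume_role {
    role_arn     = format("arn:aws:iam::%s:role/%s", each.value.account_number, each.value.role_name)
    external_id  = each.value.external_id
    session_name = each.value.session_name
  }
}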

...And finally, we can declare the modules themselves like this:

module "home_region_baseline" {
  source = "../../_modules/orchestration/terraform-aws-fj-account-baseline"
  for_each = var.assume_roles_to_use
  providers = {
    # Essentially filtering down to only the marked home region for each account.
    aws = aws.by_account_region[format("%s-%s", each.key, each.value.home_region)]
  }
  home_region = each.value.home_region
}

module "non_home_region_baseline" {
  source   = "../../_modules/orchestration/terraform-aws-fj-account-baseline"

  # Kind of like a two-dimensional setsubtract: every account/region pair EXCEPT the region
  # in each account that has been marked as the home region (whereas `home_region_baseline`
  # does the opposite).
  for_each = {
    for key, account_region in local.account_regions_to_use :
    key => account_region if account_region.region != account_region.home_region
  }
  providers = {
    aws = aws.by_account_region[each.key]
  }
  home_region = each.value.home_region
  existing_delivery_config = {
    # The `home_region_baseline` instances are keyed by the account number, so we're using that to get its attributes.
    s3_bucket_name = module.home_region_baseline[each.value.account_number].config_delivery_bucket.s3_bucket_id
    s3_kms_key_arn = module.home_region_baseline[each.value.account_number].config_delivery_bucket_key.key_arn
  }
}
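
To make the input shape concrete, a hypothetical .tfvars might look like this (account numbers, role name, and regions are made up; the two maps are normally kept identical, diverging only while retiring an account or region, per the "Variable redundancy" caveat):

# Drives the provider instances.
enabled_assume_roles = {
  "111111111111" = {
    regions     = ["eu-west-2", "us-east-1"]
    home_region = "eu-west-2"
  }
  "222222222222" = {
    regions     = ["eu-west-2", "eu-west-1", "us-east-1"]
    home_region = "eu-west-2"
    role_name   = "CustomAdminRole"
  }
}

# Drives the module instances - shrink this one (and apply) before removing an entry above.
assume_roles_to_use = {
  "111111111111" = {
    regions     = ["eu-west-2", "us-east-1"]
    home_region = "eu-west-2"
  }
  "222222222222" = {
    regions     = ["eu-west-2", "eu-west-1", "us-east-1"]
    home_region = "eu-west-2"
    role_name   = "CustomAdminRole"
  }
}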

This is where it gets a lot more complicated, but it could work for any number of differently-configured accounts, each with its own set of regions, its own home region, and so on. Perhaps a little too extreme for most use cases, though.




For several years, this has been one of Terraform's most infamous pain points, and it's not a stretch to say this could be a game-changer for those looking to use it at scale. Just be aware of the tradeoffs - just because a bulldozer is powerful doesn't mean you should always use one.