Multi-Cloud Terraform Patterns in 2026
Module design, state isolation, provider-agnostic abstractions, CI/CD pipelines, policy-as-code testing, and the real cost of running infrastructure across AWS, GCP, and Azure.
Multi-cloud Terraform manages infrastructure across AWS, GCP, and Azure from a single codebase
Why Multi-Cloud
Multi-cloud is no longer a theoretical architecture pattern. By 2026, 89% of enterprises run workloads across two or more cloud providers, according to Flexera's State of the Cloud report. The drivers are concrete: regulatory compliance that mandates data residency, best-of-breed service selection (BigQuery for analytics, AWS for compute, Azure for Active Directory integration), M&A inheritance, and genuine disaster recovery across provider boundaries.
The honest truth: multi-cloud adds complexity. You are managing multiple identity systems, networking models, billing structures, and API surfaces. The question is not whether multi-cloud is harder. It is. The question is whether your business requirements justify that complexity.
When multi-cloud is justified, Infrastructure as Code becomes non-negotiable. Manual provisioning across one cloud is painful. Across three, it is impossible. Terraform (and its fork OpenTofu) emerged as the dominant tool for this problem because of one key design decision: provider plugins. Every cloud gets a first-class provider, and your workflow stays the same.
The real reasons teams go multi-cloud
- Regulatory compliance - EU data residency laws may require a European provider alongside your primary US cloud
- Best-of-breed services - GCP BigQuery for analytics, AWS Lambda for serverless, Azure AD for identity
- M&A inheritance - Acquiring a company that runs on a different cloud than yours
- Disaster recovery - True provider-level redundancy (not just multi-region)
- Negotiation leverage - Credible multi-cloud capability strengthens your position in pricing negotiations with each vendor
- Talent availability - Hiring from a broader pool of cloud engineers
Terraform vs OpenTofu - BSL vs MPL
In August 2023, HashiCorp changed Terraform's license from MPL 2.0 (Mozilla Public License) to BSL 1.1 (Business Source License). The BSL restricts anyone from offering a competing commercial product based on Terraform's code. In response, the Linux Foundation forked Terraform as OpenTofu under the original MPL 2.0 license.
By May 2026, the fork has matured significantly. OpenTofu 1.9 runs in production at thousands of organizations. The two projects have diverged in meaningful ways, though they still share the same HCL syntax and most provider plugins.
| Feature | Terraform 1.10 (BSL 1.1) | OpenTofu 1.9 (MPL 2.0) |
|---|---|---|
| License | BSL 1.1 - restricts competitive use | MPL 2.0 - fully open source |
| State encryption | Not built-in (use backend encryption) | Native client-side state encryption |
| Provider registry | registry.terraform.io | registry.opentofu.org (mirrors + originals) |
| Early variable evaluation | Partial (1.10) | Full support since 1.8 |
| for_each on providers | Not supported | Supported since 1.9 |
| Cloud integration | Terraform Cloud / HCP Terraform | Any backend (Spacelift, env0, Scalr) |
| Community governance | HashiCorp (IBM) controlled | Linux Foundation, open governance |
| Migration effort | N/A | Minimal - rename binary, update CI |
The practical impact: if you are building a product that competes with HashiCorp's offerings, you must use OpenTofu. If you are an end-user provisioning your own infrastructure, both work. OpenTofu's native state encryption and for_each on providers are genuine technical advantages for multi-cloud work.
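As a concrete illustration of the latter, provider iteration lets one configuration stamp out the same stack per region or account without hand-copied provider aliases. A minimal sketch of the OpenTofu 1.9 syntax (the variable and module path are illustrative):
variable "aws_regions" {
  type    = set(string)
  default = ["us-west-2", "eu-central-1"]
}
# One aliased AWS provider configuration per region (OpenTofu 1.9+ only)
provider "aws" {
  alias    = "by_region"
  for_each = var.aws_regions
  region   = each.value
}
# Instantiate the same module once per region, each bound to its own provider
module "vpc" {
  source   = "./modules/network/aws"
  for_each = var.aws_regions
  providers = {
    aws = aws.by_region[each.key]
  }
}
On Terraform, the same result requires one hand-written aliased provider block per region.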
Migrating is straightforward: replace the terraform binary with tofu, update your CI scripts, and point provider sources at the OpenTofu registry. State files are compatible. Most teams complete the migration in a single sprint.
# Install OpenTofu (Linux/macOS)
curl -fsSL https://get.opentofu.org/install-opentofu.sh | sh
# Verify installation
tofu version
# OpenTofu v1.9.0
# Initialize existing Terraform project with OpenTofu
cd my-infrastructure/
tofu init
tofu plan # Same workflow, same HCL, same state
Multi-Cloud Module Patterns
The fundamental challenge of multi-cloud Terraform is abstraction without over-abstraction. You need modules that work across providers without hiding the provider-specific features that make each cloud valuable. There are three proven patterns in 2026.
Pattern 1: Provider-specific modules with a shared interface
This is the most common and most practical pattern. You create separate modules for each cloud provider, but they expose the same output interface. A wrapper module selects the right provider module based on a variable.
# modules/compute/aws/main.tf
variable "instance_name" { type = string }
variable "instance_size" { type = string }
variable "subnet_id" { type = string }
resource "aws_instance" "this" {
ami = data.aws_ami.ubuntu.id
instance_type = var.instance_size
subnet_id = var.subnet_id
tags = {
Name = var.instance_name
}
}
output "instance_id" { value = aws_instance.this.id }
output "private_ip" { value = aws_instance.this.private_ip }
output "public_ip" { value = aws_instance.this.public_ip }
# modules/compute/gcp/main.tf
variable "instance_name" { type = string }
variable "instance_size" { type = string }
variable "subnet_id" { type = string }
resource "google_compute_instance" "this" {
name = var.instance_name
machine_type = var.instance_size
zone = "us-central1-a"
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-2404-lts"
}
}
network_interface {
subnetwork = var.subnet_id
access_config {}
}
}
output "instance_id" { value = google_compute_instance.this.id }
output "private_ip" { value = google_compute_instance.this.network_interface[0].network_ip }
output "public_ip" { value = google_compute_instance.this.network_interface[0].access_config[0].nat_ip }
# modules/compute/main.tf - Wrapper module
variable "cloud_provider" {
type = string
default = "aws"
validation {
condition = contains(["aws", "gcp", "azure"], var.cloud_provider)
error_message = "cloud_provider must be aws, gcp, or azure."
}
}
variable "instance_name" { type = string }
variable "instance_size" { type = string }
variable "subnet_id" { type = string }
module "aws" {
source = "./aws"
count = var.cloud_provider == "aws" ? 1 : 0
instance_name = var.instance_name
instance_size = var.instance_size
subnet_id = var.subnet_id
}
module "gcp" {
source = "./gcp"
count = var.cloud_provider == "gcp" ? 1 : 0
instance_name = var.instance_name
instance_size = var.instance_size
subnet_id = var.subnet_id
}
output "instance_id" {
value = var.cloud_provider == "aws" ? module.aws[0].instance_id : module.gcp[0].instance_id
}
output "private_ip" {
value = var.cloud_provider == "aws" ? module.aws[0].private_ip : module.gcp[0].private_ip
}
Pattern 2: Terragrunt DRY configuration
Terragrunt wraps Terraform/OpenTofu to eliminate repetition across environments and clouds. It is especially useful for multi-cloud because it manages backend configuration, provider versions, and variable inheritance across dozens of root modules.
# terragrunt.hcl (root)
locals {
account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
cloud = local.account_vars.locals.cloud_provider
}
generate "provider" {
path = "provider.tf"
if_exists = "overwrite_terragrunt"
contents = local.cloud == "aws" ? templatefile("aws-provider.tftpl", {
region = local.region_vars.locals.region
}) : templatefile("gcp-provider.tftpl", {
project = local.account_vars.locals.project_id
region = local.region_vars.locals.region
})
}
remote_state {
backend = local.cloud == "aws" ? "s3" : "gcs"
config = local.cloud == "aws" ? {
bucket = "myorg-terraform-state"
key = "${path_relative_to_include()}/terraform.tfstate"
region = "us-west-2"
encrypt = true
dynamodb_table = "terraform-locks"
} : {
bucket = "myorg-terraform-state"
prefix = path_relative_to_include()
}
}
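Each leaf directory then needs only a few lines, because the backend and provider blocks above are generated for it. A sketch of a child terragrunt.hcl (paths and module source are illustrative):
# environments/production/aws/networking/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "../../../../modules//networking/aws"
}
inputs = {
  environment = "production"
  vpc_cidr    = "10.0.0.0/16"
}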
Pattern 3: Monorepo with workspaces per cloud
For smaller teams, a single repository with directory-based separation works well. Each cloud gets its own directory, its own state, and its own CI pipeline. Shared modules live in a modules/ directory.
infrastructure/
modules/
networking/ # Shared interface, provider-specific implementations
compute/
database/
monitoring/
environments/
production/
aws/
main.tf # Uses modules/compute/aws
backend.tf # S3 backend
gcp/
main.tf # Uses modules/compute/gcp
backend.tf # GCS backend
azure/
main.tf # Uses modules/compute/azure
backend.tf # Azure Blob backend
staging/
aws/
gcp/
State Management
State is the hardest problem in multi-cloud Terraform. Every terraform apply writes a state file that maps your HCL configuration to real cloud resources. In a multi-cloud setup, you need a deliberate strategy for where state lives, how it is locked, and how it is encrypted.
Rule 1: One state file per cloud per environment
Never combine AWS and GCP resources in the same state file. When a GCP API timeout leaves a stuck state lock, you do not want your AWS resources blocked too. Blast radius isolation is the primary design principle.
# aws/production/backend.tf
terraform {
backend "s3" {
bucket = "myorg-tfstate-prod"
key = "aws/production/terraform.tfstate"
region = "us-west-2"
encrypt = true
dynamodb_table = "terraform-locks"
kms_key_id = "alias/terraform-state"
}
}
# gcp/production/backend.tf
terraform {
backend "gcs" {
bucket = "myorg-tfstate-prod"
prefix = "gcp/production"
}
}
# azure/production/backend.tf
terraform {
backend "azurerm" {
resource_group_name = "tfstate-rg"
storage_account_name = "myorgtfstateprod"
container_name = "tfstate"
key = "azure/production/terraform.tfstate"
}
}
Rule 2: Cross-cloud references via data sources
When your GCP workload needs to reference an AWS resource (like a VPN gateway IP), use terraform_remote_state data sources or, better yet, output the values to a shared parameter store.
# In GCP config, reference AWS state outputs
data "terraform_remote_state" "aws_network" {
backend = "s3"
config = {
bucket = "myorg-tfstate-prod"
key = "aws/production/networking/terraform.tfstate"
region = "us-west-2"
}
}
# Use the AWS VPN gateway IP in GCP
resource "google_compute_vpn_tunnel" "to_aws" {
name = "gcp-to-aws-tunnel"
peer_ip = data.terraform_remote_state.aws_network.outputs.vpn_gateway_ip
shared_secret = var.vpn_shared_secret
target_vpn_gateway = google_compute_vpn_gateway.this.id
}
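The remote-state approach couples the GCP configuration to the AWS backend and grants it read access to the entire state file. The parameter-store alternative exposes only the values you choose to publish. A sketch, assuming the AWS root module writes the value to SSM and the GCP root module has a read-only aws provider configured for the lookup:
# In the AWS config: publish the value to SSM Parameter Store
resource "aws_ssm_parameter" "vpn_gateway_ip" {
  name  = "/multi-cloud/production/vpn_gateway_ip"
  type  = "String"
  value = aws_vpn_connection.to_gcp.tunnel1_address # or whichever output you need to share
}
# In the GCP config: read only that parameter, not the whole state file,
# then reference data.aws_ssm_parameter.vpn_gateway_ip.value in place of the
# remote-state output above
data "aws_ssm_parameter" "vpn_gateway_ip" {
  name = "/multi-cloud/production/vpn_gateway_ip"
}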
Rule 3: State encryption at rest and in transit
Every backend listed above encrypts state at rest by default (S3 with SSE, GCS with Google-managed keys, Azure Blob with SSE). But state files contain secrets: database passwords, API keys, private IPs. OpenTofu's native client-side encryption adds a second layer.
# OpenTofu native state encryption (tofu 1.7+)
terraform {
encryption {
key_provider "aws_kms" "state_key" {
kms_key_id = "alias/tofu-state-encryption"
region = "us-west-2"
}
method "aes_gcm" "encrypt" {
keys = key_provider.aws_kms.state_key
}
state {
method = method.aes_gcm.encrypt
enforced = true
}
plan {
method = method.aes_gcm.encrypt
enforced = true
}
}
}
Rule 4: State locking
Two engineers running terraform apply simultaneously will corrupt your state. S3 uses DynamoDB for locking. GCS has built-in locking. Azure Blob uses blob leases. Never disable locking in production.
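The S3 backend's lock table does not create itself; it is usually bootstrapped once alongside the state bucket. A sketch of that table, matching the backend config above (the LockID string attribute is what the S3 backend expects):
# One-time bootstrap: DynamoDB table used by the S3 backend for state locks
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}
If a run dies while holding the lock, tofu force-unlock <LOCK_ID> releases it; treat that as a break-glass operation, not a routine step.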
Provider-Agnostic Patterns with Crossplane
Crossplane takes a fundamentally different approach to multi-cloud infrastructure. Instead of writing HCL and running terraform apply, you define infrastructure as Kubernetes custom resources. Crossplane's controllers continuously reconcile the desired state against the actual state in each cloud provider.
This is a paradigm shift. Terraform is declarative in syntax but imperative in operation: you declare what you want, then run a command to make it happen. Crossplane is fully declarative: you submit a YAML manifest, and the controller loop handles creation, updates, and drift correction automatically.
# Crossplane Composition - abstract "Database" that works on any cloud
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
name: database-aws
labels:
provider: aws
spec:
compositeTypeRef:
apiVersion: platform.myorg.io/v1alpha1
kind: Database
resources:
- name: rds-instance
base:
apiVersion: rds.aws.upbound.io/v1beta1
kind: Instance
spec:
forProvider:
engine: postgres
engineVersion: "16"
instanceClass: db.r6g.large
allocatedStorage: 100
publiclyAccessible: false
providerConfigRef:
name: aws-production
patches:
- fromFieldPath: spec.parameters.size
toFieldPath: spec.forProvider.instanceClass
transforms:
- type: map
map:
small: db.r6g.medium
medium: db.r6g.large
large: db.r6g.xlarge
# Crossplane Claim - deploy a database without knowing which cloud
apiVersion: platform.myorg.io/v1alpha1
kind: Database
metadata:
name: orders-db
namespace: production
spec:
parameters:
size: medium
engine: postgres
version: "16"
compositionSelector:
matchLabels:
provider: aws # Change to "gcp" or "azure" to switch clouds
When to use Crossplane vs Terraform
| Scenario | Best tool | Why |
|---|---|---|
| Day-zero infrastructure provisioning | Terraform/OpenTofu | Better plan/apply workflow, mature ecosystem |
| Continuous drift correction | Crossplane | Controller loop detects and fixes drift automatically |
| Developer self-service platform | Crossplane | Developers submit claims via kubectl, no HCL knowledge needed |
| Complex dependency graphs | Terraform/OpenTofu | Better dependency resolution and planning |
| GitOps-native workflow | Crossplane + ArgoCD | Infrastructure reconciled from Git like application code |
| Existing HCL codebase | Terraform/OpenTofu | No rewrite needed, incremental adoption possible |
Many production teams use both. Terraform provisions the foundational infrastructure (VPCs, Kubernetes clusters, IAM). Crossplane manages the workload-level resources inside those clusters (databases, caches, queues) with continuous reconciliation. This layered approach gives you the best of both worlds. For a deeper comparison of IaC tools, see our Infrastructure as Code guide.
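In practice the hand-off point is the cluster itself: Terraform creates the EKS/GKE/AKS clusters, then bootstraps Crossplane into one of them so the controller loop can take over workload resources. A sketch using the helm provider, assuming it is already pointed at the EKS cluster's kubeconfig (the chart version pin is illustrative):
# Terraform bootstraps Crossplane into the cluster it just created; everything
# below this layer is then reconciled by Crossplane rather than Terraform
resource "helm_release" "crossplane" {
  name             = "crossplane"
  namespace        = "crossplane-system"
  create_namespace = true
  repository       = "https://charts.crossplane.io/stable"
  chart            = "crossplane"
  version          = "1.18.0" # illustrative pin
  depends_on       = [module.eks]
}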
Real Examples - AWS + GCP + Azure
Theory is useful. Working code is better. Here is a complete multi-cloud deployment that provisions a VPC/VNet on each provider, peers them via VPN, and deploys a managed Kubernetes cluster on each. This is the pattern used by organizations that need workloads distributed across all three major clouds.
Multi-cloud networking foundation
# providers.tf - Pin every provider version
terraform {
required_version = ">= 1.9.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.80"
}
google = {
source = "hashicorp/google"
version = "~> 6.10"
}
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.10"
}
}
}
provider "aws" {
region = "us-west-2"
default_tags {
tags = {
Environment = var.environment
ManagedBy = "terraform"
Project = "multi-cloud-platform"
}
}
}
provider "google" {
project = var.gcp_project_id
region = "us-central1"
}
provider "azurerm" {
features {}
subscription_id = var.azure_subscription_id
}
# aws-network.tf
module "aws_vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.16.0"
name = "multi-cloud-vpc"
cidr = "10.0.0.0/16"
azs = ["us-west-2a", "us-west-2b", "us-west-2c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = var.environment != "production"
enable_vpn_gateway = true
enable_dns_hostnames = true
tags = { Cloud = "aws" }
}
# AWS EKS cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.31.0"
cluster_name = "multi-cloud-eks"
cluster_version = "1.31"
vpc_id = module.aws_vpc.vpc_id
subnet_ids = module.aws_vpc.private_subnets
eks_managed_node_groups = {
default = {
instance_types = ["m7g.large"]
min_size = 2
max_size = 10
desired_size = 3
}
}
}
# gcp-network.tf
resource "google_compute_network" "main" {
name = "multi-cloud-vpc"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "private" {
name = "private-subnet"
ip_cidr_range = "10.1.0.0/16"
region = "us-central1"
network = google_compute_network.main.id
secondary_ip_range {
range_name = "pods"
ip_cidr_range = "10.2.0.0/16"
}
secondary_ip_range {
range_name = "services"
ip_cidr_range = "10.3.0.0/20"
}
}
# GKE Autopilot cluster
resource "google_container_cluster" "main" {
name = "multi-cloud-gke"
location = "us-central1"
enable_autopilot = true
network = google_compute_network.main.id
subnetwork = google_compute_subnetwork.private.id
ip_allocation_policy {
cluster_secondary_range_name = "pods"
services_secondary_range_name = "services"
}
release_channel {
channel = "REGULAR"
}
}
# azure-network.tf
resource "azurerm_resource_group" "main" {
name = "multi-cloud-rg"
location = "East US 2"
}
resource "azurerm_virtual_network" "main" {
name = "multi-cloud-vnet"
address_space = ["10.4.0.0/16"]
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
}
resource "azurerm_subnet" "aks" {
name = "aks-subnet"
resource_group_name = azurerm_resource_group.main.name
virtual_network_name = azurerm_virtual_network.main.name
address_prefixes = ["10.4.1.0/24"]
}
# AKS cluster
resource "azurerm_kubernetes_cluster" "main" {
name = "multi-cloud-aks"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
dns_prefix = "multicloud"
kubernetes_version = "1.31"
default_node_pool {
name = "default"
node_count = 3
vm_size = "Standard_D4s_v5"
vnet_subnet_id = azurerm_subnet.aks.id
}
identity {
type = "SystemAssigned"
}
network_profile {
network_plugin = "azure"
network_policy = "calico"
}
}
Cross-cloud VPN connectivity
# aws-to-gcp-vpn.tf
resource "aws_vpn_connection" "to_gcp" {
vpn_gateway_id = module.aws_vpc.vgw_id
customer_gateway_id = aws_customer_gateway.gcp.id
type = "ipsec.1"
static_routes_only = true
tags = { Name = "aws-to-gcp-vpn" }
}
resource "aws_customer_gateway" "gcp" {
bgp_asn = 65000
ip_address = google_compute_address.vpn_static_ip.address
type = "ipsec.1"
tags = { Name = "gcp-gateway" }
}
resource "google_compute_address" "vpn_static_ip" {
name = "vpn-static-ip"
}
resource "google_compute_vpn_gateway" "to_aws" {
name = "gcp-to-aws-gateway"
network = google_compute_network.main.id
}
resource "google_compute_vpn_tunnel" "to_aws" {
name = "gcp-to-aws-tunnel"
peer_ip = aws_vpn_connection.to_gcp.tunnel1_address
shared_secret = aws_vpn_connection.to_gcp.tunnel1_preshared_key
target_vpn_gateway = google_compute_vpn_gateway.to_aws.id
local_traffic_selector = ["10.1.0.0/16"]
remote_traffic_selector = ["10.0.0.0/16"]
}
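With static_routes_only = true, the tunnel comes up but carries no traffic until both sides know which prefixes to send through it, and a classic GCP VPN gateway also needs forwarding rules for IPsec. A sketch of those missing pieces, using the CIDRs from the examples above (AWS subnet route tables toward the VGW are still needed and omitted here):
# AWS side: send GCP-bound traffic (10.1.0.0/16) through the VPN connection
resource "aws_vpn_connection_route" "to_gcp" {
  destination_cidr_block = "10.1.0.0/16"
  vpn_connection_id      = aws_vpn_connection.to_gcp.id
}
# GCP side: classic VPN gateways need forwarding rules for IPsec traffic
resource "google_compute_forwarding_rule" "esp" {
  name        = "vpn-esp"
  ip_protocol = "ESP"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.to_aws.id
}
resource "google_compute_forwarding_rule" "udp500" {
  name        = "vpn-udp500"
  ip_protocol = "UDP"
  port_range  = "500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.to_aws.id
}
resource "google_compute_forwarding_rule" "udp4500" {
  name        = "vpn-udp4500"
  ip_protocol = "UDP"
  port_range  = "4500"
  ip_address  = google_compute_address.vpn_static_ip.address
  target      = google_compute_vpn_gateway.to_aws.id
}
# GCP side: route AWS-bound traffic (10.0.0.0/16) into the tunnel
resource "google_compute_route" "to_aws" {
  name                = "route-to-aws"
  network             = google_compute_network.main.name
  dest_range          = "10.0.0.0/16"
  priority            = 1000
  next_hop_vpn_tunnel = google_compute_vpn_tunnel.to_aws.id
}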
Testing IaC - Terratest, Checkov, and OPA
Infrastructure code that is not tested is infrastructure code that will break in production. The IaC testing ecosystem has matured significantly by 2026. A proper testing pyramid for Terraform includes four layers: static analysis, policy checks, integration tests, and cost regression tests.
Layer 1: Static analysis with tflint and terraform validate
# terraform validate catches syntax and type errors
tofu validate
# tflint catches provider-specific issues
tflint --init
tflint --recursive
# Example .tflint.hcl
plugin "aws" {
enabled = true
version = "0.33.0"
source = "github.com/terraform-linters/tflint-ruleset-aws"
}
plugin "google" {
enabled = true
version = "0.30.0"
source = "github.com/terraform-linters/tflint-ruleset-google"
}
rule "terraform_naming_convention" {
enabled = true
format = "snake_case"
}
Layer 2: Policy-as-code with Checkov and OPA
Checkov scans Terraform plans and HCL files against 1,000+ built-in security policies. OPA (Open Policy Agent) with Rego lets you write custom policies. Use both: Checkov for baseline cloud security, OPA for organization-specific rules.
# Checkov scan against Terraform plan
tofu plan -out=tfplan
tofu show -json tfplan > tfplan.json
checkov -f tfplan.json \
--framework terraform_plan \
--check CKV_AWS_18,CKV_AWS_19,CKV_AWS_145 \
--compact
# Checkov custom policy example
# policies/require_encryption.py
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck
from checkov.common.models.enums import CheckResult, CheckCategories
class S3BucketEncryption(BaseResourceCheck):
    def __init__(self):
        name = "Ensure S3 bucket has KMS encryption"
        id = "CUSTOM_AWS_001"
        supported_resources = ["aws_s3_bucket"]
        categories = [CheckCategories.ENCRYPTION]
        super().__init__(name=name, id=id,
                         categories=categories,
                         supported_resources=supported_resources)

    def scan_resource_conf(self, conf):
        # Check for server-side encryption configuration on the bucket
        if "server_side_encryption_configuration" in conf:
            return CheckResult.PASSED
        return CheckResult.FAILED

# Checkov only loads custom Python checks that are instantiated at module level
check = S3BucketEncryption()
# OPA Rego policy - enforce tagging standards across all clouds
# policies/tagging.rego
package terraform.tagging
import rego.v1
required_tags := {"Environment", "ManagedBy", "Project", "CostCenter"}
deny contains msg if {
resource := input.planned_values.root_module.resources[_]
tags := object.get(resource.values, "tags", {})
missing := required_tags - {key | tags[key]}
count(missing) > 0
msg := sprintf(
"%s '%s' is missing required tags: %v",
[resource.type, resource.name, missing]
)
}
deny contains msg if {
resource := input.planned_values.root_module.resources[_]
resource.values.tags.Environment
not resource.values.tags.Environment in {"production", "staging", "development"}
msg := sprintf(
"%s '%s' has invalid Environment tag: %s",
[resource.type, resource.name, resource.values.tags.Environment]
)
}
# Run OPA against Terraform plan
tofu show -json tfplan > tfplan.json
opa eval \
--data policies/ \
--input tfplan.json \
"data.terraform.tagging.deny" \
--format pretty
Layer 3: Integration tests with Terratest
Terratest deploys real infrastructure, validates it, and tears it down. It is the gold standard for IaC integration testing. The tradeoff: it is slow (minutes per test) and costs real money. Run it in CI on pull requests to shared modules, not on every commit.
// test/multi_cloud_test.go
package test
import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)
func TestMultiCloudNetworking(t *testing.T) {
t.Parallel()
terraformOptions := terraform.WithDefaultRetryableErrors(t,
&terraform.Options{
TerraformDir: "../environments/test/aws",
Vars: map[string]interface{}{
"environment": "test",
"instance_size": "t3.micro",
},
},
)
// Deploy infrastructure
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
// Validate AWS VPC was created
vpcId := terraform.Output(t, terraformOptions, "vpc_id")
vpc := aws.GetVpcById(t, vpcId, "us-west-2")
assert.Equal(t, "10.0.0.0/16", vpc.CidrBlock)
// Validate subnets
subnets := aws.GetSubnetsForVpc(t, vpcId, "us-west-2")
assert.Equal(t, 6, len(subnets))
// Validate EKS cluster is active
clusterName := terraform.Output(t, terraformOptions, "cluster_name")
cluster := aws.GetEksCluster(t, "us-west-2", clusterName)
assert.Equal(t, "ACTIVE", cluster.Status)
}
Layer 4: Cost regression with Infracost
Infracost estimates the monthly cost of your Terraform changes before they are applied. In a multi-cloud setup, this prevents surprise bills when someone changes an instance type or adds a resource in a cloud you are less familiar with.
# Generate cost breakdown for all clouds
infracost breakdown --path environments/production/aws --format json > aws-costs.json
infracost breakdown --path environments/production/gcp --format json > gcp-costs.json
infracost breakdown --path environments/production/azure --format json > azure-costs.json
# Compare costs against the main branch (in CI)
infracost diff --path . --compare-to infracost-base.json --format github-comment
| Testing layer | Tool | Speed | Cost | When to run |
|---|---|---|---|---|
| Static analysis | tflint, terraform validate | Seconds | Free | Every commit |
| Policy checks | Checkov, OPA | Seconds | Free | Every commit |
| Integration tests | Terratest | 5-20 minutes | Real cloud costs | PR to shared modules |
| Cost regression | Infracost | 30 seconds | Free tier available | Every PR |
CI/CD Pipelines - GitHub Actions, Atlantis, and Spacelift
Running terraform apply from a laptop is fine for learning. In production multi-cloud environments, every change must flow through a CI/CD pipeline with plan review, policy checks, cost estimation, and approval gates. Three tools dominate this space in 2026.
GitHub Actions - the DIY approach
GitHub Actions gives you full control over the pipeline. You write the workflow, you manage the credentials, you handle the concurrency. This is the most flexible option but requires the most maintenance.
# .github/workflows/terraform-multicloud.yml
name: Multi-Cloud Terraform
on:
pull_request:
paths: ["environments/**", "modules/**"]
push:
branches: [main]
paths: ["environments/**", "modules/**"]
permissions:
id-token: write
contents: read
pull-requests: write
env:
TF_VERSION: "1.9.5"
TOFU_VERSION: "1.9.0"
jobs:
detect-changes:
runs-on: ubuntu-latest
outputs:
aws: ${{ steps.filter.outputs.aws }}
gcp: ${{ steps.filter.outputs.gcp }}
azure: ${{ steps.filter.outputs.azure }}
steps:
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
aws:
- 'environments/**/aws/**'
- 'modules/**'
gcp:
- 'environments/**/gcp/**'
- 'modules/**'
azure:
- 'environments/**/azure/**'
- 'modules/**'
plan-aws:
needs: detect-changes
if: needs.detect-changes.outputs.aws == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: opentofu/setup-opentofu@v1
with:
tofu_version: ${{ env.TOFU_VERSION }}
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
aws-region: us-west-2
- name: Tofu Plan
working-directory: environments/production/aws
run: |
tofu init -input=false
tofu plan -input=false -out=tfplan
tofu show -no-color tfplan > plan.txt
- name: Checkov Scan
uses: bridgecrewio/checkov-action@v12
with:
directory: environments/production/aws
framework: terraform
- name: Infracost
uses: infracost/actions/setup@v3
with:
api-key: ${{ secrets.INFRACOST_API_KEY }}
- run: |
infracost breakdown --path environments/production/aws \
--format json --out-file /tmp/infracost.json
infracost comment github \
--path /tmp/infracost.json \
--repo ${{ github.repository }} \
--pull-request ${{ github.event.pull_request.number }} \
--github-token ${{ secrets.GITHUB_TOKEN }}
plan-gcp:
needs: detect-changes
if: needs.detect-changes.outputs.gcp == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: opentofu/setup-opentofu@v1
with:
tofu_version: ${{ env.TOFU_VERSION }}
- uses: google-github-actions/auth@v2
with:
workload_identity_provider: ${{ secrets.GCP_WIF_PROVIDER }}
service_account: terraform-ci@myproject.iam.gserviceaccount.com
- name: Tofu Plan
working-directory: environments/production/gcp
run: |
tofu init -input=false
tofu plan -input=false -out=tfplan
apply:
needs: [plan-aws, plan-gcp]
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
environment: production
steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: ${{ env.TOFU_VERSION }}
      # The apply job needs its own cloud credentials; plan-time credentials do not carry over
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
          aws-region: us-west-2
      - name: Apply AWS
        working-directory: environments/production/aws
        run: |
          tofu init -input=false
          tofu apply -input=false -auto-approve
      # A matching GCP apply step (google-github-actions/auth + tofu apply) follows the same pattern
Atlantis - pull request automation
Atlantis runs Terraform plan and apply directly from pull request comments. It is self-hosted, free, and integrates with GitHub, GitLab, and Bitbucket. For multi-cloud, you configure separate projects per cloud directory.
# atlantis.yaml
version: 3
projects:
- name: aws-production
dir: environments/production/aws
workspace: default
terraform_version: v1.9.5
autoplan:
when_modified: ["**/*.tf", "../../modules/**"]
enabled: true
apply_requirements: [approved, mergeable]
workflow: aws
- name: gcp-production
dir: environments/production/gcp
workspace: default
terraform_version: v1.9.5
autoplan:
when_modified: ["**/*.tf", "../../modules/**"]
enabled: true
apply_requirements: [approved, mergeable]
workflow: gcp
workflows:
aws:
plan:
steps:
- run: checkov -d . --framework terraform --compact
- init
- plan
apply:
steps:
- apply
gcp:
plan:
steps:
- run: checkov -d . --framework terraform --compact
- init
- plan
apply:
steps:
- apply
Spacelift - managed Terraform platform
Spacelift is a managed platform that handles state, credentials, policy enforcement, and drift detection. It supports Terraform, OpenTofu, Pulumi, CloudFormation, and Ansible. For multi-cloud teams, Spacelift's stack dependencies and policy-as-code features reduce operational overhead significantly.
Spacelift pricing starts at $0.014 per managed resource per hour (about $10/resource/month). For a 500-resource multi-cloud deployment, that is roughly $5,000/month. Compare that against the engineering time to maintain a custom GitHub Actions pipeline with equivalent features.
| Feature | GitHub Actions | Atlantis | Spacelift |
|---|---|---|---|
| Hosting | GitHub-managed | Self-hosted | SaaS or self-hosted |
| Cost | Free tier + minutes | Free (open source) | $0.014/resource/hour |
| Policy engine | DIY (Checkov/OPA) | DIY (custom workflows) | Built-in OPA + Rego |
| Drift detection | DIY (scheduled plans) | Not built-in | Built-in, scheduled |
| State management | DIY (S3/GCS/Blob) | DIY | Built-in or BYO |
| Multi-cloud credentials | OIDC per provider | Environment variables | Cloud integrations |
| Stack dependencies | Manual job ordering | Not supported | Built-in DAG |
| Setup effort | High | Medium | Low |
Cost Management with Infracost
Multi-cloud cost management is harder than single-cloud because each provider has different pricing models, discount structures, and billing APIs. Infracost solves the Terraform-specific piece: estimating costs before deployment and catching cost regressions in pull requests.
Infracost configuration for multi-cloud
# infracost.yml - multi-cloud project config
version: 0.1
projects:
- path: environments/production/aws
name: aws-production
terraform_var_files:
- terraform.tfvars
usage_file: infracost-usage-aws.yml
- path: environments/production/gcp
name: gcp-production
terraform_var_files:
- terraform.tfvars
usage_file: infracost-usage-gcp.yml
- path: environments/production/azure
name: azure-production
terraform_var_files:
- terraform.tfvars
usage_file: infracost-usage-azure.yml
# infracost-usage-aws.yml - usage estimates for accurate pricing
version: 0.1
resource_usage:
aws_instance.web:
operating_system: linux
monthly_hrs: 730 # 24/7
aws_lambda_function.api:
monthly_requests: 10000000
request_duration_ms: 200
aws_s3_bucket.data:
standard:
storage_gb: 500
monthly_tier_1_requests: 1000000
monthly_tier_2_requests: 5000000
aws_nat_gateway.main:
monthly_data_processed_gb: 100
Cost policies in CI
Infracost integrates with OPA to enforce cost policies. You can block PRs that exceed a monthly cost threshold or require additional approval for expensive changes.
# policies/cost-policy.rego
package infracost
import rego.v1
deny contains msg if {
maxDiff := 500.0
msg := sprintf(
"Monthly cost increase of $%v exceeds $%v threshold. Requires platform team approval.",
[input.diffTotalMonthlyCost, maxDiff]
)
to_number(input.diffTotalMonthlyCost) > maxDiff
}
deny contains msg if {
r := input.projects[_].breakdown.resources[_]
r.monthlyCost
to_number(r.monthlyCost) > 2000
msg := sprintf(
"Resource %s costs $%v/month. Resources over $2,000/month require architecture review.",
[r.name, r.monthlyCost]
)
}
Multi-cloud cost comparison dashboard
Run Infracost across all three clouds and compare equivalent workloads. This data is invaluable for deciding where to place new workloads and for cost optimization reviews.
# Generate combined cost report
infracost breakdown --path . --format html --out-file cost-report.html
# Example output (abbreviated):
# Project Monthly Cost
# aws-production $12,450
# EKS cluster $2,190
# EC2 instances $4,320
# RDS PostgreSQL $1,890
# NAT Gateway $1,050
# S3 + CloudFront $3,000
#
# gcp-production $9,870
# GKE Autopilot $1,650
# Compute Engine $3,210
# Cloud SQL $1,740
# Cloud NAT $780
# GCS + CDN $2,490
#
# azure-production $11,200
# AKS $1,980
# Virtual Machines $3,890
# Azure Database $1,650
# NAT Gateway $890
# Blob + CDN $2,790
#
# TOTAL $33,520/month
Common Pitfalls and Anti-Patterns
After working with dozens of multi-cloud Terraform codebases, these are the mistakes that cause the most pain. Every one of them has cost real teams real money and real downtime.
1. The "universal abstraction" trap
Teams try to build a single module that abstracts away all cloud differences. The result is a module with 200 variables, 50 conditionals, and support for none of the provider-specific features that make each cloud valuable. You end up with the lowest common denominator of all three clouds.
The classic example is a modules/database module that tries to create RDS, Cloud SQL, and Azure Database with the same interface. Each has different replication models, backup strategies, and performance tuning options. Abstract the interface, not the implementation.
2. Shared state across clouds
Putting AWS and GCP resources in the same state file means a GCP API outage blocks your AWS deployments. It also means a single terraform destroy can wipe infrastructure across multiple clouds. Always isolate state per cloud per environment.
3. Inconsistent provider version pinning
# BAD - unpinned providers will break your builds
terraform {
required_providers {
aws = { source = "hashicorp/aws" }
google = { source = "hashicorp/google" }
}
}
# GOOD - pinned with pessimistic constraint
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.80" # Allows 5.80.x but not 5.81.0
}
google = {
source = "hashicorp/google"
version = "~> 6.10"
}
azurerm = {
source = "hashicorp/azurerm"
version = "~> 4.10"
}
}
}
4. No CIDR planning
Teams assign overlapping IP ranges to different clouds, then discover they cannot create VPN tunnels or peering connections. Plan your entire IP address space across all clouds before writing a single line of HCL. Document it in a shared spreadsheet or, better yet, in a Terraform locals block.
# cidr-plan.tf - document your IP allocation
locals {
cidr_plan = {
aws = {
vpc_cidr = "10.0.0.0/16"
pod_cidr = "10.10.0.0/16"
service_cidr = "10.20.0.0/20"
}
gcp = {
vpc_cidr = "10.1.0.0/16"
pod_cidr = "10.11.0.0/16"
service_cidr = "10.21.0.0/20"
}
azure = {
vnet_cidr = "10.4.0.0/16"
pod_cidr = "10.14.0.0/16"
service_cidr = "10.24.0.0/20"
}
}
}
5. Manual credential management
Storing cloud credentials as long-lived secrets in CI is a security risk. Use OIDC federation for all three clouds. GitHub Actions, GitLab CI, and Spacelift all support OIDC tokens that exchange for short-lived cloud credentials with no stored secrets.
# AWS OIDC provider for GitHub Actions
resource "aws_iam_openid_connect_provider" "github" {
url = "https://token.actions.githubusercontent.com"
client_id_list = ["sts.amazonaws.com"]
  # AWS now validates GitHub's OIDC issuer against trusted root CAs, so the
  # thumbprint is required by the API but effectively ignored; a placeholder is fine
  thumbprint_list = ["ffffffffffffffffffffffffffffffffffffffff"]
}
resource "aws_iam_role" "terraform_ci" {
name = "terraform-ci"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.github.arn
}
Action = "sts:AssumeRoleWithWebIdentity"
Condition = {
StringEquals = {
"token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
}
StringLike = {
"token.actions.githubusercontent.com:sub" = "repo:myorg/infrastructure:*"
}
}
}]
})
}
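The GCP equivalent is Workload Identity Federation: a pool plus an OIDC provider for GitHub's token issuer, and a binding that lets the repository impersonate the CI service account. A sketch, assuming the terraform-ci service account from the workflow above exists as google_service_account.terraform_ci (pool and repo names are illustrative):
# Workload Identity Federation: GitHub Actions -> GCP, no stored keys
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-actions"
}
resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-oidc"
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }
  attribute_condition = "assertion.repository == 'myorg/infrastructure'"
  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}
# Allow the repo's workflows to impersonate the terraform-ci service account
resource "google_service_account_iam_member" "github_ci" {
  service_account_id = google_service_account.terraform_ci.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/myorg/infrastructure"
}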
6. Ignoring provider-specific best practices
Each cloud has conventions that Terraform cannot enforce. AWS expects tags on everything. GCP expects labels. Azure expects resource groups. Ignoring these conventions means your resources are invisible to the cloud provider's native cost management, security, and compliance tools.
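Provider-level defaults make this cheap to get right: AWS has default_tags (shown earlier), the google provider has default_labels, and Azure conventions are usually handled with a shared locals map merged into every resource's tags. A sketch extending the blocks already defined in this article:
# Shared metadata applied per provider: tags (AWS), labels (GCP), tags + resource groups (Azure)
locals {
  common_metadata = {
    environment = var.environment
    managed_by  = "terraform"
    project     = "multi-cloud-platform"
  }
}
# Extend the existing google provider block with default labels
provider "google" {
  project        = var.gcp_project_id
  region         = "us-central1"
  default_labels = local.common_metadata
}
# Extend the existing resource group with the same metadata as Azure tags
resource "azurerm_resource_group" "main" {
  name     = "multi-cloud-rg"
  location = "East US 2"
  tags     = local.common_metadata
}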
7. No drift detection
Someone logs into the AWS console and changes a security group. Someone uses gcloud to resize an instance. Your Terraform state is now wrong. Schedule regular terraform plan runs (or use Spacelift/Crossplane drift detection) to catch manual changes before they cause incidents.
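A lightweight option is a scheduled plan that fails loudly when drift exists; -detailed-exitcode makes tofu plan exit 2 when there are pending changes. A sketch as a GitHub Actions cron job, reusing the names and paths from the pipeline above:
# .github/workflows/drift-detection.yml
name: Nightly Drift Detection
on:
  schedule:
    - cron: "0 5 * * *" # once a day, 05:00 UTC
permissions:
  id-token: write
  contents: read
jobs:
  drift-aws:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: "1.9.0"
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
          aws-region: us-west-2
      - name: Plan and fail on drift
        working-directory: environments/production/aws
        run: |
          tofu init -input=false
          # exit code 2 means changes are pending (drift); the job fails and alerts the team
          tofu plan -input=false -detailed-exitcode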
8. Testing only in one cloud
Your CI pipeline runs Terratest against AWS but skips GCP and Azure "to save time." The GCP module has a bug that only surfaces during apply. Test every cloud in CI, even if you run the tests less frequently for secondary clouds.
9. Monolithic root modules
A single root module with 500 resources takes 10 minutes to plan and has a massive blast radius. Break your infrastructure into small, focused root modules: networking, compute, database, monitoring. Each gets its own state file and its own CI pipeline.
10. No runbook for provider outages
Multi-cloud is supposed to provide resilience, but only if you have tested failover procedures. Document what happens when AWS us-east-1 goes down. Can your GCP workloads handle the traffic? Have you tested DNS failover? Terraform provisions the infrastructure, but operational readiness requires runbooks and regular drills.