Terraform state without locking is a bug waiting to happen. Two engineers running apply simultaneously can corrupt state in ways that take hours to untangle. Here’s what I learned after one such incident.

Why state locking matters

Terraform reads state, computes a plan, and writes new state. Without locking, two concurrent runs can:

  • Both read the same initial state
  • Both compute their plans based on it
  • Both write conflicting state — last one wins
  • Now state doesn’t match real infrastructure

The symptoms are weird: resources exist but Terraform wants to create them again. Or state references resources that were already destroyed.

S3 backend with DynamoDB locking

The canonical setup for AWS:

terraform {
  backend "s3" {
    bucket         = "mycompany-tfstate"
    key            = "prod/main.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

DynamoDB table:

resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}

Terraform writes a lock item to this table before any operation that could modify state (plan and apply included) and deletes it when the operation completes.
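
As far as I know, Terraform 1.10 and later can also lock directly in S3 without a DynamoDB table, through the use_lockfile argument. A minimal sketch of the same backend using it, reusing the bucket and key from above. Treat the version requirement as an assumption and check the S3 backend docs for your release before relying on it:

```hcl
terraform {
  backend "s3" {
    bucket  = "mycompany-tfstate"
    key     = "prod/main.tfstate"
    region  = "eu-west-1"
    encrypt = true

    # Assumes Terraform >= 1.10: the lock is held as a lock object
    # stored next to the state file instead of a DynamoDB item.
    use_lockfile = true
  }
}
```

Changing backend settings requires re-running terraform init, so plan the switch for a moment when nobody else is applying.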

Remote state with HTTP backend

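If you are not on AWS, the HTTP backend works against any self-hosted server that implements its REST protocol, including lock and unlock endpoints. (Consul has its own dedicated backend, and the etcd backends were removed in Terraform 1.3.)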

terraform {
  backend "http" {
    address        = "https://tfstate.example.com/state/prod"
    lock_address   = "https://tfstate.example.com/state/prod/lock"
    unlock_address = "https://tfstate.example.com/state/prod/lock"
    lock_method    = "POST"
    unlock_method  = "DELETE"
  }
}

When locks get stuck

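Sometimes an apply gets killed mid-run (network drop, OOM, Ctrl-C during a critical section) and the lock is never released. The next apply fails with:

Error acquiring the state lock: ConditionalCheckFailedException

The error output includes a Lock Info block showing the lock ID and who acquired it. Verify first: coordinate in chat and confirm nobody is actually applying. Only then release the lock:

terraform force-unlock LOCK_ID

Adding -force skips the confirmation prompt, so leave it off when running by hand.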

CI/CD considerations

  • Use separate state files per environment (dev, staging, prod) with separate locks
  • Implement a mutex at the CI level too (GitHub Actions concurrency group)
  • Never run production applies from developer laptops
  • Log state lock acquisition/release in CI logs for audit
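
One way to get a separate state file per environment with the S3 backend is partial backend configuration: leave the settings out of the backend "s3" block and pass one settings file per environment at init time. The sketch below shows what such a file could look like for prod. The file name and directory are made up for illustration. With DynamoDB locking the lock ID is derived from the bucket and key, so different keys lock independently even when they share one table.

```hcl
# envs/prod.s3.tfbackend (hypothetical file name), used as:
#   terraform init -backend-config=envs/prod.s3.tfbackend
# A dev or staging file would differ only in the key.
bucket         = "mycompany-tfstate"
key            = "prod/main.tfstate"
region         = "eu-west-1"
dynamodb_table = "terraform-locks"
encrypt        = true
```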

State backup

Even with locking, state can get corrupted by bugs, bad updates, or operator error. Enable versioning on the S3 bucket:

resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration {
    status = "Enabled"
  }
}

This saved me once when a Terraform upgrade broke state migration.
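
The versioning resource above refers to aws_s3_bucket.tfstate, which is not shown. A minimal sketch of what that bucket definition might look like, assuming AWS provider v4 or later and the bucket name from the backend block. The prevent_destroy guard and the public access block are my additions, not part of the original setup:

```hcl
# Hypothetical definition of the bucket referenced as aws_s3_bucket.tfstate
resource "aws_s3_bucket" "tfstate" {
  bucket = "mycompany-tfstate"

  # Makes Terraform refuse any plan that would destroy the state bucket
  lifecycle {
    prevent_destroy = true
  }
}

# Block every form of public access to the state bucket
resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

With versioning on, rolling back means restoring an earlier object version of the state key.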

Sensitive values

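State contains secrets (passwords, tokens). Keep it encrypted at rest and restrict IAM access to the bucket and lock table. Mark variables and outputs with sensitive = true, but remember this only redacts them from CLI output: the values are still stored in state. For anything critical, consider external secret management (Vault, SSM Parameter Store).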
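
A short sketch of both approaches, with a hypothetical variable name and parameter path. Either way the value can still end up in state, which is why access to the state itself has to stay locked down:

```hcl
# Hypothetical variable: sensitive = true redacts the value from
# plan/apply output, but it is still written to state.
variable "db_password" {
  type      = string
  sensitive = true
}

# Alternative: read the secret from SSM Parameter Store.
# The parameter name is an assumption for illustration. The decrypted
# value is still recorded in state once Terraform reads it.
data "aws_ssm_parameter" "db_password" {
  name            = "/prod/db/password"
  with_decryption = true
}
```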