Terraform state without locking is a bug waiting to happen. Two engineers running apply simultaneously can corrupt state in ways that take hours to untangle. Here’s what I learned after one such incident.
Why state locking matters
Terraform reads state, computes a plan, and writes new state. Without locking, two concurrent runs can:
- Both read the same initial state
- Both compute their plans based on it
- Both write conflicting state — last one wins
- Now state doesn’t match real infrastructure
The symptoms are weird: resources exist but Terraform wants to create them again, or state references resources that were already destroyed.
S3 backend with DynamoDB locking
The canonical setup for AWS:
terraform {
  backend "s3" {
    bucket         = "mycompany-tfstate"
    key            = "prod/main.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
DynamoDB table:
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
Terraform writes a lock item to this table before any operation that could modify state, and deletes it when the operation completes.
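Whoever runs Terraform needs permission on both the bucket and the lock table. A minimal sketch of such a policy, assuming the bucket, key, and table from the config above; the data source name tf_backend is a placeholder of mine, and setups that use workspaces or a KMS key will likely need more actions than shown here:

```hcl
# Hypothetical least-privilege policy for the S3 backend with DynamoDB locking.
data "aws_iam_policy_document" "tf_backend" {
  # List the bucket so Terraform can find the state object.
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::mycompany-tfstate"]
  }

  # Read and write the state object itself.
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::mycompany-tfstate/prod/main.tfstate"]
  }

  # Create, read, and delete the lock item.
  statement {
    actions = [
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = [aws_dynamodb_table.tf_lock.arn]
  }
}
```

Attach the rendered JSON (data.aws_iam_policy_document.tf_backend.json) to the role your CI runner assumes, rather than to individual users.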
Remote state with HTTP backend
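If you are not on AWS, the HTTP backend works with any server that implements Terraform's REST state protocol, including a self-hosted one. (Consul has its own dedicated backend, so it does not need this.)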
terraform {
  backend "http" {
    address        = "https://tfstate.example.com/state/prod"
    lock_address   = "https://tfstate.example.com/state/prod/lock"
    unlock_address = "https://tfstate.example.com/state/prod/lock"
    lock_method    = "POST"
    unlock_method  = "DELETE"
  }
}
When locks get stuck
Sometimes an apply gets killed (network, OOM, ctrl-C during critical section). The lock stays. Next apply fails with:
Error acquiring the state lock: ConditionalCheckFailedException
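The error output includes a Lock Info block with the lock ID, who holds it, and when it was created. Once you are sure the lock is stale, release it:
terraform force-unlock LOCK_ID
Verify first: coordinate in chat, confirm nobody is actually applying, THEN force-unlock. Adding -force skips the confirmation prompt, so leave it off when running by hand.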
CI/CD considerations
- Use separate state files per environment (dev, staging, prod) with separate locks
- Implement a mutex at the CI level too (GitHub Actions concurrency group)
- Never run apply against production from developer laptops
- Log state lock acquisition/release in CI logs for audit
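One way to get a separate state file and lock per environment is partial backend configuration: leave the backend block empty and pass a per-environment settings file at init time. A sketch, assuming the bucket and table from earlier; the env/ file names are hypothetical. Because the S3 backend derives the lock ID from the bucket and key, different keys get different locks even when they share one table.

```hcl
# main.tf: backend settings are supplied at init time.
terraform {
  backend "s3" {}
}

# env/prod.s3.tfbackend (hypothetical file name)
bucket         = "mycompany-tfstate"
key            = "prod/main.tfstate"
region         = "eu-west-1"
dynamodb_table = "terraform-locks"
encrypt        = true

# env/staging.s3.tfbackend (hypothetical file name)
bucket         = "mycompany-tfstate"
key            = "staging/main.tfstate"
region         = "eu-west-1"
dynamodb_table = "terraform-locks"
encrypt        = true
```

The prod job in CI would then run terraform init -backend-config=env/prod.s3.tfbackend before plan or apply.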
State backup
Even with locking, state can get corrupted by bugs, bad updates, or operator error. Enable versioning on the S3 bucket:
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}
This saved me once when a Terraform upgrade broke state migration.
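With versioning on, every apply leaves an old copy of the state behind, and those copies accumulate indefinitely. A possible companion rule that expires noncurrent versions, assuming the aws_s3_bucket.tfstate resource referenced above; the 90-day window and the rule ID are arbitrary choices of mine, so pick a retention that covers how far back you would realistically roll back:

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  # Noncurrent-version rules only make sense once versioning is enabled.
  depends_on = [aws_s3_bucket_versioning.tfstate]

  rule {
    id     = "expire-old-state-versions" # hypothetical rule name
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket.
    filter {}

    # Delete versions that stopped being current more than 90 days ago.
    noncurrent_version_expiration {
      noncurrent_days = 90 # assumed retention window
    }
  }
}
```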
Sensitive values
State contains secrets (passwords, tokens) in plaintext. Keep it encrypted at rest and restrict IAM access to the bucket. Setting sensitive = true on variables only redacts values from CLI output; they are still written to state. For anything critical, consider external secret management (Vault, SSM Parameter Store).
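A rough sketch of the bucket-side hardening, again assuming the aws_s3_bucket.tfstate resource from the versioning example. The db_password variable is hypothetical and only there to show the syntax, and aws:kms with the AWS-managed key is one option among several:

```hcl
# Encrypt every object written to the state bucket by default.
resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Hypothetical variable: sensitive hides the value in plan and apply output,
# but it is still stored in the state file.
variable "db_password" {
  type      = string
  sensitive = true
}
```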