Infrastructure as Code (IaC) transforms how we manage cloud resources. Instead of clicking through consoles or running ad-hoc commands, you define infrastructure declaratively and let tools handle the rest. Terraform has emerged as the leading IaC tool, working across multiple cloud providers with a consistent workflow.
Why Infrastructure as Code?
The Problems It Solves
- Reproducibility: Same config produces same infrastructure
- Version control: Track changes, review, rollback
- Collaboration: Multiple team members can work safely
- Documentation: Config is the documentation
- Disaster recovery: Rebuild infrastructure quickly
Why Terraform?
- Multi-cloud: AWS, Azure, GCP, and hundreds of providers
- Declarative: Describe desired state, not steps
- State management: Tracks what exists
- Plan before apply: Preview changes
- Module ecosystem: Reusable components
Getting Started
Installation
You can install Terraform directly or use a version manager for easier upgrades. The version manager approach is recommended for teams that need to work with multiple Terraform versions across different projects.
```bash
# macOS
brew install terraform

# Windows (Chocolatey)
choco install terraform

# Or use tfenv for version management
brew install tfenv
tfenv install 1.7.0
tfenv use 1.7.0
```
Basic Structure
Every Terraform project starts with a few key files. This example shows the essential structure for deploying an AWS EC2 instance. You will typically organize your configuration across multiple files as your project grows.
```hcl
# main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type

  tags = {
    Name        = "web-server"
    Environment = var.environment
  }
}
```
The required_providers block pins your provider version, preventing unexpected changes when providers release updates. Always specify version constraints for production infrastructure. The ~> 5.0 constraint allows any 5.x release, such as 5.1 or 5.37, but blocks the 6.0 major version.
Variables
Variables make your Terraform code reusable across environments. Define them in a separate file for clarity and easier management.
```hcl
# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}

variable "environment" {
  description = "Environment name"
  type        = string
}
```
Note that environment has no default, making it a required variable. This forces you to explicitly set it, preventing accidental deployments to the wrong environment.
You can set variable values in a .tfvars file for each environment. A file named terraform.tfvars is loaded automatically; other files must be passed explicitly at plan or apply time. Keep these files separate from your main configuration.
```hcl
# terraform.tfvars
aws_region    = "us-west-2"
instance_type = "t3.small"
environment   = "production"
```
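One common pattern is a separate .tfvars file per environment, selected with -var-file. The file names here are illustrative:

```bash
# terraform.tfvars is loaded automatically; other files must be named explicitly.
# staging.tfvars and production.tfvars are example names.
terraform plan -var-file=staging.tfvars
terraform apply -var-file=production.tfvars
```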
Outputs
Outputs expose values from your infrastructure for use by other systems or for display after apply. They form the interface between your Terraform configuration and the outside world.
```hcl
# outputs.tf
output "instance_ip" {
  description = "Public IP of the instance"
  value       = aws_instance.web.public_ip
}

output "instance_id" {
  description = "Instance ID"
  value       = aws_instance.web.id
}
```
You can reference these outputs in CI/CD pipelines or use them as inputs to other Terraform modules. They are also useful for scripting post-deployment steps.
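For example, a deployment script can read outputs with the terraform output command; the ssh step below is illustrative:

```bash
# Print a single output without quotes, suitable for scripting
terraform output -raw instance_ip

# Illustrative post-deployment step using that value
ssh ec2-user@"$(terraform output -raw instance_ip)" 'sudo systemctl status nginx'

# Or dump all outputs as JSON for other tooling
terraform output -json
```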
Core Workflow
Terraform has a simple but powerful workflow. You will run these commands hundreds of times throughout the lifecycle of your infrastructure.
```bash
# Initialize (download providers)
terraform init

# Preview changes
terraform plan

# Apply changes
terraform apply

# Show current state
terraform show

# Destroy resources
terraform destroy
```
Always run terraform plan before apply. Review the plan carefully, especially the resources that will be destroyed or replaced. A resource being replaced usually means downtime, and understanding why Terraform wants to replace rather than update in place is critical.
State Management
Remote State
Never store state locally in production. Remote state enables team collaboration and prevents state file corruption from concurrent operations.
This configuration stores state in S3 with DynamoDB for locking. This is the standard approach for AWS-based teams.
```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
```
The encrypt = true setting ensures your state file is encrypted at rest. This is important because state files can contain sensitive data like database passwords, API keys, and other secrets that Terraform needs to manage your resources.
State Locking
State locking prevents concurrent modifications that could corrupt your state file. You need to create the DynamoDB table before configuring the backend.
```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```
This is a chicken-and-egg problem: you need to create this table with local state first, then migrate to remote state. Many teams create this table manually or with a separate bootstrapping configuration.
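One way to bootstrap, sketched below; the exact sequence depends on your setup. Apply the bucket and table with local state first, then add the backend block and migrate:

```bash
# 1. With no backend block present, create the S3 bucket and DynamoDB
#    table using local state
terraform apply

# 2. Add the backend "s3" block shown above, then migrate the local
#    state file into the remote backend
terraform init -migrate-state
```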
State Commands
Sometimes you need to manipulate state directly. These commands help you recover from mistakes or refactor resources without recreating them.
```bash
# List resources in state
terraform state list

# Show specific resource
terraform state show aws_instance.web

# Move resource (rename)
terraform state mv aws_instance.old aws_instance.new

# Remove from state (keep resource)
terraform state rm aws_instance.web

# Import existing resource
terraform import aws_instance.web i-1234567890abcdef0
```
Use terraform import when you need to bring manually-created resources under Terraform management. This is common when adopting IaC for existing infrastructure or when recovering from resources created outside of Terraform.
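Terraform 1.5 added config-driven import as an alternative to the CLI command: declare an import block and run plan. The instance ID here is a placeholder:

```hcl
# Config-driven import (Terraform 1.5+). Terraform can also generate the
# matching resource configuration for you:
#   terraform plan -generate-config-out=generated.tf
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}
```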
Modules
Creating a Module
Modules let you package and reuse infrastructure patterns. A module is just a directory with Terraform files that accepts inputs and produces outputs.
```text
modules/
└── vpc/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf
```
Here is a module that creates a VPC with public subnets. Notice how it uses variables for all configurable values, making it flexible for different use cases.
```hcl
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = var.name
  }
}

resource "aws_subnet" "public" {
  count             = length(var.public_subnets)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.public_subnets[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.name}-public-${count.index + 1}"
  }
}
```
The count meta-argument creates multiple subnets from a list, making the module flexible for different availability zone configurations. This pattern is common when you need to create similar resources across multiple AZs or regions.
Define the module's inputs in variables.tf. Good modules document their variables thoroughly.
```hcl
# modules/vpc/variables.tf
variable "name" {
  type = string
}

variable "cidr_block" {
  type    = string
  default = "10.0.0.0/16"
}

variable "public_subnets" {
  type = list(string)
}

variable "availability_zones" {
  type = list(string)
}
```
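The module also needs an outputs.tf so callers can consume what it creates. A minimal sketch matching the resources above:

```hcl
# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  # The splat expression collects the IDs of all count-created subnets
  value = aws_subnet.public[*].id
}
```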
Using Modules
Call your module from the root configuration and pass in the required variables. The module source can be a local path, Git repository, or the Terraform Registry.
```hcl
# main.tf
module "vpc" {
  source             = "./modules/vpc"
  name               = "production"
  cidr_block         = "10.0.0.0/16"
  public_subnets     = ["10.0.1.0/24", "10.0.2.0/24"]
  availability_zones = ["us-west-2a", "us-west-2b"]
}

# Use module outputs
resource "aws_instance" "web" {
  subnet_id = module.vpc.public_subnet_ids[0]
  # ...
}
```
Module outputs become accessible as module.<name>.<output>, enabling you to wire modules together into larger architectures.
Public Module Registry
The Terraform Registry contains thousands of community-maintained modules. These can save significant development time and incorporate best practices.
```hcl
# Use community modules
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name            = "my-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-west-2a", "us-west-2b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
}
```
Always pin module versions in production. Without a version constraint, Terraform will use the latest version, which may introduce breaking changes unexpectedly.
Data Sources
Query existing resources. Data sources let you reference infrastructure that was created outside of Terraform or in a different Terraform configuration.
```hcl
# Get current AWS account info
data "aws_caller_identity" "current" {}

# Get latest Amazon Linux AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Use in resources
resource "aws_instance" "web" {
  ami = data.aws_ami.amazon_linux.id
  # ...
}
```
The AMI data source is particularly useful because it automatically resolves to the latest patched image, saving you from hardcoding AMI IDs. Be aware of the flip side: when Amazon publishes a new AMI, the next plan will propose replacing the instance, so review plans carefully after a new release.
Dynamic Blocks
Generate repeated blocks when the number of items is variable. This is common for security group rules where you might have different rules per environment.
```hcl
variable "ingress_rules" {
  type = list(object({
    port        = number
    cidr_blocks = list(string)
  }))
  default = [
    { port = 80, cidr_blocks = ["0.0.0.0/0"] },
    { port = 443, cidr_blocks = ["0.0.0.0/0"] },
    { port = 22, cidr_blocks = ["10.0.0.0/8"] },
  ]
}

resource "aws_security_group" "web" {
  name = "web-sg"

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = ingress.value.cidr_blocks
    }
  }
}
```
Inside the dynamic block, ingress.value gives you access to the current item in the loop. This pattern keeps your code DRY while remaining readable. Use dynamic blocks sparingly though; overuse can make configurations harder to understand.
Environment Management
Workspaces
Workspaces let you manage multiple environments with the same configuration. Each workspace maintains its own state file.
```bash
# Create workspace
terraform workspace new staging

# Switch workspace
terraform workspace select production

# List workspaces
terraform workspace list
```
You can use the workspace name to customize resource configuration based on the environment.
```hcl
# Use workspace in config
locals {
  environment = terraform.workspace
}

resource "aws_instance" "web" {
  instance_type = terraform.workspace == "production" ? "t3.large" : "t3.micro"
  # ...
}
```
Workspaces are simple but have limitations. For significantly different environments, consider the directory-based approach instead, which provides clearer separation of concerns.
Directory-Based Environments
For environments with different configurations, use separate directories. This provides more flexibility and clearer separation.
```text
terraform/
├── modules/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
```
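Each environment directory is a small root module that calls the shared modules and configures its own backend key. A sketch, reusing the bucket and module path from earlier (the CIDR and names here are illustrative):

```hcl
# environments/dev/main.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "dev/terraform.tfstate" # separate state per environment
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

module "vpc" {
  source     = "../../modules/vpc"
  name       = "dev"
  cidr_block = "10.10.0.0/16"
  # ...
}
```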
Best Practices
Project Structure
Organize your Terraform code consistently across projects. This structure works well for most teams and scales as projects grow.
```text
terraform/
├── main.tf       # Main resources
├── variables.tf  # Variable declarations
├── outputs.tf    # Output values
├── versions.tf   # Required versions
├── backend.tf    # State configuration
├── locals.tf     # Local values
└── data.tf       # Data sources
```
Naming Conventions
Use clear, consistent naming for resources and use locals to avoid repetition. Good naming makes your configuration self-documenting.
```hcl
# Consistent naming
resource "aws_instance" "web_primary" {}
resource "aws_instance" "web_secondary" {}

# Use locals for repeated values
locals {
  common_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_instance" "web" {
  tags = merge(local.common_tags, {
    Name = "web-server"
  })
}
```
The ManagedBy = "terraform" tag helps identify resources that should not be modified manually. This prevents drift and confusion when troubleshooting infrastructure.
Security
Never commit secrets to version control. Use environment variables or IAM roles instead. Terraform provides multiple options for secure credential handling.
```hcl
# Never commit secrets

# Use environment variables
provider "aws" {
  # Reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
}

# Or use IAM roles
# Terraform assumes the role automatically on EC2/ECS

# Use variables for sensitive values
variable "database_password" {
  type      = string
  sensitive = true
}
```
The sensitive = true attribute prevents Terraform from showing the value in plan or apply output, reducing the risk of accidental exposure in logs or CI/CD output.
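Sensitive values can be supplied without ever touching version control through the TF_VAR_ environment variable convention. A sketch; the Secrets Manager secret ID is illustrative:

```bash
# Terraform maps TF_VAR_<name> environment variables to input variables
export TF_VAR_database_password="$(aws secretsmanager get-secret-value \
  --secret-id prod/db-password --query SecretString --output text)"

terraform apply
```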
Validation
Add validation rules to catch configuration errors early, before they reach production. This shifts errors left and provides better error messages.
```hcl
variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_type" {
  type = string

  validation {
    condition     = can(regex("^t3\\.", var.instance_type))
    error_message = "Instance type must be from t3 family."
  }
}
```
These validations fail fast during terraform plan, saving you from deploying invalid configurations. Custom error messages help users understand exactly what went wrong.
CI/CD Integration
Automate Terraform in your deployment pipeline. This workflow runs on every PR and only applies changes on merge to main, providing code review for infrastructure changes.
```yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [main]
  pull_request:

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        run: terraform init

      - name: Terraform Format
        run: terraform fmt -check

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -out=tfplan

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve tfplan
```
The terraform fmt -check step enforces consistent formatting across your team. The -out=tfplan flag saves the plan to apply exactly what was reviewed, preventing race conditions where the infrastructure changes between plan and apply.
Common Patterns
Conditional Resources
Create resources only when certain conditions are met. The count meta-argument with a ternary expression is the standard pattern for this.
```hcl
resource "aws_instance" "bastion" {
  count         = var.enable_bastion ? 1 : 0
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
}
```
When count = 0, no resource is created. Reference conditional resources as aws_instance.bastion[0] since they become a list. You can also use one() to convert the list to a single value or null.
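For example, an output can use one() so that it yields the IP when the bastion exists and null when it does not (a sketch):

```hcl
output "bastion_ip" {
  # one() returns the single element of a zero- or one-element list, or null
  value = one(aws_instance.bastion[*].public_ip)
}
```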
For Each
Use for_each when you need to create multiple similar resources from a map. Unlike count, for_each uses meaningful keys instead of numeric indices.
```hcl
variable "instances" {
  type = map(object({
    instance_type = string
    subnet_id     = string
  }))
}

resource "aws_instance" "web" {
  for_each = var.instances

  ami           = data.aws_ami.amazon_linux.id
  instance_type = each.value.instance_type
  subnet_id     = each.value.subnet_id

  tags = {
    Name = each.key
  }
}
```
With for_each, removing an item from the middle of your configuration does not cause other resources to be recreated, unlike count where indices would shift. This makes for_each the preferred approach for production resources.
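A caller supplies the map in tfvars; the keys become both the resource addresses and the Name tags. The names and subnet IDs here are placeholders:

```hcl
# terraform.tfvars
instances = {
  api = {
    instance_type = "t3.small"
    subnet_id     = "subnet-aaaa1111" # placeholder ID
  }
  worker = {
    instance_type = "t3.medium"
    subnet_id     = "subnet-bbbb2222" # placeholder ID
  }
}
```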
Conclusion
Terraform brings software engineering practices to infrastructure management. Start with simple configurations, use modules for reusability, manage state remotely, and integrate into CI/CD. The initial learning curve pays dividends in reproducible, version-controlled infrastructure.