2025-08-242 min read

Terraform Drift: How to Detect and Fix Infrastructure Inconsistency

TerraformIaCAutomation

You've automated your infrastructure with Terraform. Great! But what happens when someone makes a "quick fix" directly in the cloud console? That is Infrastructure Drift, and it's a silent killer of reliability and security.

Why Drift is Dangerous

Untested Changes: Direct changes bypass your CI/CD pipelines and tests.
Security Gaps: A manually opened firewall port might never be closed.
Failed Runs: Terraform might fail or produce unexpected results during the next planned update.

How to Detect Drift Automatically

Scheduled Plans: Run terraform plan on a schedule (e.g., every 2 hours) and alert if there are pending changes.
Cloud-Native Tools: Use AWS Config or Azure Policy to monitor resource changes in real-time.
IaC Platforms: Tools like Terraform Cloud, Spacelift, or Atlantis have built-in drift detection features.

A pragmatic drift workflow

Drift detection is only useful if it ends in a decision. A lightweight workflow:

Triage: is the drift expected (emergency change) or suspicious (unknown actor)?
Decide: reconcile in code vs revert via Terraform.
Document: link the drift event to a ticket and owner.
Close the loop: add a guardrail so the same drift doesn’t happen again (policy, RBAC, template).

For regulated environments, treat drift events like security signals: they should be visible and auditable.

Fixing the Drift

Once detected, you have two choices:

Reconcile: Update your Terraform code to match the manual change.
Revert: Re-run Terraform to overwrite the manual change and restore the intended state.

Preventing drift (better than detecting it)

make Terraform the default path: paved workflows, templates, and clear ownership
use break-glass access: time-bounded, audited, and visible
block high-risk console changes with policies (where possible)

Common anti-patterns

“temporary” console changes with no ticket, no owner, no expiry
shared admin accounts (no accountability)
ignoring drift alerts until the next incident

What to measure

number of drift events per week (should trend down)
mean time to resolve drift (triage → reconcile/revert)
% of changes going through Terraform vs console

Conclusion

Infrastructure as Code is only valuable if it's the only source of truth. Implementing automated drift detection ensures that your documentation (the code) always matches reality, reducing operational risk.

Want to go deeper on this topic?

Contact Demkada