$ cat choices/gitops.md
the call
GitOps is how I think continuous delivery should work: git is the single source of truth, and a controller reconciles the running system to match it. Deploys become merges, rollbacks become reverts, and the audit log writes itself. The split that makes it sing: TF (Terraform) for the building, Git for the lights.
Declare the desired state in git; let a controller reconcile reality to match it. That one move buys a stack of properties you otherwise fight for. The system is auditable (git log is the deploy history), reversible (git revert is the rollback), and self-correcting (drift between git and the cluster gets detected instead of festering). You stop asking “what’s actually running?” and start reading it.
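A minimal sketch of that move, assuming both the desired state and the live state can be flattened into plain dicts of resource name to image reference. The names here (`Drift`, `diff`, `reconcile`) are hypothetical and not Argo CD's API; a real controller watches git and the cluster API rather than taking dicts as arguments.

```python
from typing import NamedTuple, Optional


class Drift(NamedTuple):
    name: str
    desired: Optional[str]  # None means git no longer declares it
    actual: Optional[str]   # None means it is missing from the live system


def diff(desired: dict[str, str], actual: dict[str, str]) -> list[Drift]:
    """Return every resource whose live value differs from what git declares."""
    names = sorted(set(desired) | set(actual))
    return [
        Drift(n, desired.get(n), actual.get(n))
        for n in names
        if desired.get(n) != actual.get(n)
    ]


def reconcile(desired: dict[str, str], actual: dict[str, str]) -> dict[str, str]:
    """Converge a copy of the live state onto the desired state."""
    live = dict(actual)
    for d in diff(desired, actual):
        if d.desired is None:
            del live[d.name]          # remove what git no longer declares
        else:
            live[d.name] = d.desired  # create what is missing, correct what drifted
    return live


# Illustrative data only: one drifted image, one undeclared resource, one missing.
desired = {"api": "api@sha256:aaa", "web": "web@sha256:bbb"}
actual = {"api": "api@sha256:old", "cron": "cron@sha256:ccc"}

drift = diff(desired, actual)
converged = reconcile(desired, actual)
```

The `diff` output is the answer to “what’s actually running?” expressed as a list of disagreements with git; once `reconcile` has run, that list is empty.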
The discipline that makes it work is splitting by rate of change. Slow-moving substrate (clusters, network, IAM, the shape of things) is provisioned with Terraform. Fast-moving payload (which image is live, what’s rolling out right now) is reconciled from git by a tool like Argo CD. The rule I keep coming back to: TF for the building, Git for the lights. If it changes per release, git owns it. If it changes per quarter, Terraform owns it.
GitOps earns its keep when there’s a reconcilable substrate worth reconciling: Kubernetes, real environments, more than one person shipping. Below that line it’s apparatus pretending to be rigor. A single small service needs a CI step that pushes the image, not a control plane. And the declarative model genuinely leaks. Secrets, database migrations, stateful cutovers, and anything you can’t canary (DNS, IAM, network) don’t fit “git is the desired state” cleanly. Cargo-culting the whole machine onto a system that didn’t need it is its own failure.
The model I build toward: the mainline branch is the release candidate. Every commit is proven-integrated by a merge train before it lands, so CI on mainline is theater and main is green by construction. The image is built once and promoted between environments by re-pointing an overlay, never rebuilt, so the bytes in prod are the exact bytes you proved in dev. A reconciler watches git and syncs; the release tag is a marker, not a deploy target; rollback is a revert. Even infrastructure rides the same gate. An infra change has to leave a real environment green before it’s allowed to merge, so dev is the canary for ops. Each layer ships in shadow before it gates anything.
See: choices / argo-cd · choices / terraform
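The build-once, promote-by-pointer idea above can be sketched roughly as follows. This is an illustration under assumptions: each environment's overlay is modeled as a dict holding one image digest, and `promote`, `PromotionError`, and the registry name are hypothetical, not Kustomize or Argo CD APIs. What it tries to show is that promotion copies a pointer and refuses anything that was not verified upstream.

```python
import copy


class PromotionError(Exception):
    """Raised when a promotion would skip verification."""


def promote(overlays: dict, source: str, target: str, verified: set[str]) -> dict:
    """Return new overlays with `target` re-pointed at `source`'s image digest.

    Nothing is rebuilt: the digest is copied as-is, and only if it has
    already been verified in the source environment.
    """
    digest = overlays[source]["image"]
    if digest not in verified:
        raise PromotionError(f"{digest} has not been verified in {source}")
    updated = copy.deepcopy(overlays)  # leave the input untouched
    updated[target]["image"] = digest
    return updated


# Illustrative data only; digests are shortened placeholders.
overlays = {
    "dev": {"image": "registry.example/app@sha256:2222"},
    "prod": {"image": "registry.example/app@sha256:1111"},
}
verified_in_dev = {"registry.example/app@sha256:2222"}

after = promote(overlays, "dev", "prod", verified_in_dev)
```

In a real repository the returned overlay would be a commit, so undoing the promotion is a revert of that commit rather than a second build.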
The deploy path is a product. Make it boring, auditable, and reversible. Most release pain is two things with different rhythms welded into one pipeline; separate them by rate of change and give each the tool that fits its speed. The real win isn’t the tooling, it’s the property it forces: the system can’t get into a state nobody can explain or undo, because the explanation is the git history and the undo is a revert. Verification gates promotion, not the calendar.
the gaps — what it costs even when it’s right
The declarative model leaks. Secrets, migrations, stateful cutovers, and un-canaryable infra don’t fit “git is the desired state.” You end up bolting on sealed secrets, sync hooks, and out-of-band steps: real complexity the happy-path demos skip.
It’s a lot of apparatus before it pays. Reconcilers, canary controllers, a health-signal service, drift handling. Easy to build a gorgeous pipeline for a system that never needed it. The same “earn it” test that applies to Kubernetes applies here.
“Git is truth” only holds with discipline. One hand-run change and the reconciler either fights you or silently reverts your hotfix; the whole team has to go through the front door. And the integration gate only pays off with fast CI. Slow CI turns the train into a queue, so you invest in the tracks first.
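Rough arithmetic for why slow CI turns the train into a queue. The sketch assumes the simplest possible train: strictly serial, one candidate validated at a time, with no batching or speculative parallel runs (real merge trains often do both, which raises the ceiling). The function names and all numbers are illustrative assumptions, not measurements.

```python
def train_capacity(ci_minutes: float, window_minutes: float = 480) -> int:
    """Max merges a strictly serial train can land in one working window."""
    return int(window_minutes // ci_minutes)


def backlog_after(arrivals: int, ci_minutes: float, window_minutes: float = 480) -> int:
    """Changes still waiting when the window closes."""
    return max(0, arrivals - train_capacity(ci_minutes, window_minutes))


# Same team, same 30 changes in an 8-hour day; only CI duration differs.
fast = backlog_after(arrivals=30, ci_minutes=10)  # capacity is 48 merges
slow = backlog_after(arrivals=30, ci_minutes=40)  # capacity is 12 merges
```

Under these assumptions the 10-minute pipeline ends the day with nothing waiting, while the 40-minute pipeline leaves 18 of the 30 changes queued, and that backlog carries into the next day.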