$ cat choices/kubernetes.md

Kubernetes

the call

I reach for Kubernetes when the load is real and the cheapest path to stability is orchestration. I don't stand up a cluster to run three containers that a single box would happily hold.

why

At real load, Kubernetes is the cheapest path to stability. Not the flashiest. The cheapest. Self-healing, rolling deploys, declarative infra, and a scheduler that packs your machines so you’re not paying for idle. Once you’re past the point where a human can babysit every box, the orchestration earns its keep by making the boring failures boring.

when I don’t

Before you have the load, k8s is pure overhead. You inherit a control plane, a networking model, and a vocabulary of objects to maintain, all for an app a single VM would run without breaking a sweat. Standing up a cluster to feel like a real shop is résumé-driven development. If the train isn’t carrying enough weight to need the rails, you’re just maintaining track for fun. Earn it first.

in production

At CommercialTribe, the call was to migrate to GCP and Kubernetes. Not because it was the trendy thing, but because at that load it was genuinely the cheapest path to stability. The payoff showed up where it mattered: releases collapsed from weeks to hours. To keep the manifests from becoming their own swamp, I built psykube, a Crystal CLI that templated the k8s YAML so deploys stayed declarative and repeatable instead of hand-edited. Right tool, and only after the load justified it.— see: works / CommercialTribe

the principle under it

The judgment isn’t “is Kubernetes good.” It’s “have I earned it yet.” The train can only move as fast as the tracks it’s built on, but you don’t lay rail before there’s a train. Match the infrastructure to the actual weight on it. Adopting heavy tooling early feels like progress and is really just debt you took on before the payoff existed.

the gaps — what it costs even when it’s right

The cluster is its own product. Even when k8s is the right call, you now own a control plane, upgrade cycles, and a networking model. That’s real operational surface. Someone has to keep the rails maintained, not just ride them.

YAML sprawl is inevitable. Manifests multiply fast and drift faster. That’s exactly why psykube existed, but tooling to tame the config is itself more to maintain. The complexity doesn’t vanish, it moves.

The abstraction leaks under pressure. When something breaks at 2am, “it’s just containers” isn’t true. You’re debugging schedulers, network policies, and resource limits. The floor of knowledge to run it safely is higher than it looks on the demo.

used at

works / CommercialTribe open-source / psykube