$ cat choices/the-agent-fleet.md
the call
I run agents as a fleet, the volume layer, and keep a human as the judgment layer. Generation is free now; verification is the job. The fleet drafts, opens MRs, and scores its own work; the human decides what's true and what ships. I'll multiply great judgment by a fleet. I won't multiply zero by one.
A fleet flips the constraint. For a long time generation was the bottleneck: typing the code, the docs, the tests. That’s gone. The scarce thing now is verification, and that’s exactly where a human belongs. Point a fleet at the volume and keep a person on the map: what’s actually true, what’s actually wanted, what ships. Used that way, agents don’t replace judgment. They amplify it.
The trap is thinking the fleet is the answer. It isn’t. A fleet is a multiplier, and a multiplier obeys the math: anything times zero is still zero. Without the judgment to know what good looks like, a fleet just gets you to a beautifully-formatted disaster faster. More output, more confident, more wrong. I don’t hand the wheel to a fleet on a problem I couldn’t verify myself, and I never let one ship without passing the same gates a human’s work does. No map, no fleet.
This site is one example. Built by directing a fleet, agents holding the volume and me holding the map. In practice it’s not one model, it’s a stack routed by modality: Claude Code for code and writing, Gemini for image and video, MCPs to wire agents into real tools and data, ElevenLabs for voice and audio. And the fleet doesn’t get a special lane: it opens MRs against the same merge train, behind the same robust CI and the same human review as everyone else. The agents move fast; the gates and the human decide what “done” means. Multiply judgment by a fleet, never zero by a fleet.— see: choices / claude-code · writing / Volume Is Free Now
When a cost goes to zero, the value moves to whatever’s still scarce. Generation hit zero, so the work is now taste, verification, and direction: the things a fleet can’t do for you. Own the scarce layer. The leverage isn’t in producing more. It’s in being the one who can reliably tell whether the mountain of output is right, and in building the guardrails that catch what you can’t review by hand.
the gaps — what it costs even when it’s right
Verification doesn’t scale like generation. A fleet can produce ten times the work; one human can’t review ten times faster. The reviewer becomes the bottleneck, which is exactly why the gates (tests, CI, adversarial checks) have to do the verifying the human can’t.
Confident and wrong is the failure mode. Agents don’t flag their own gaps. They hand you polished output whether it’s right or not. The fluency is exactly what makes bad work easy to wave through.
Judgment can atrophy if you let it. Lean on the fleet for the thinking, not just the typing, and the very skill that makes the fleet worth anything starts to dull. You have to keep doing the hard part by hand sometimes.