90 Days Solo With AI Agents: A Field Report on What Holds Up Under Load
Ninety days running a portfolio with AI agents as the primary execution layer. The patterns that held up, the patterns that broke, and the operating discipline the speed demands.
Sunday afternoon, I sat down to audit the last 90 days. Around four thousand commits across a few dozen repositories, six products in production, the rest in flight. The interesting line is not the volume. It is which patterns held up under load and which collapsed the moment the operating tempo doubled.
The setup: solo operator on a portfolio of products, AI agents as the primary execution layer. I write specs, take architectural decisions, run reviews, gate releases. The agents do line-level work. February 18 was the start of this operating model. May 18 is the first checkpoint with enough data to draw conclusions.
What follows is the field report.
What holds up under load
Structured handoffs between sessions. The single highest-leverage practice in the entire stack. Long-running work does not happen in a single session. It happens across dozens of sessions, sometimes spread across weeks. Without a structured handoff, a fixed format the next session reads before doing anything else, context evaporates and the work restarts from zero. With one, execution resumes mid-sentence. The handoff routine runs at the end of every task. It is not optional.
Persistent feedback memory. Every correction issued to an agent is persisted as a rule. Across a portfolio of products this compounds: the same class of error stops recurring across projects. The unit cost is a few seconds. The aggregate effect, three months in, is that the operation produces fewer of the same mistakes than a single human reviewer working without that scaffolding would.
One decision, applied everywhere. Cross-cutting concerns (the kind that touch every product) are decided once, deeply, and then applied uniformly. Deciding them product by product is the failure mode. I made this call twice in the quarter on architectural concerns. Both removed entire categories of future maintenance work. Both required one focused session.
Parallel sub-agents for independent subtasks. If three subtasks are genuinely independent, they run in parallel. The discipline is the upfront check, asking whether a dependency exists before launching. When it holds, the throughput gain is the obvious one. When it does not, sequential execution is the only safe choice and parallelism becomes a footgun.
Personal infrastructure before scale. The first month went disproportionately into scaffolding: handoffs, memory, deploy automation, status surfaces. It read as overhead in the moment. In retrospect it is the only reason the next two months produced output. The scaffolding is the multiplier on everything built on top of it. Inverting this order is a common and expensive mistake.
Tests in the same commit as the change. On the products where this rule held, the codebases ended the quarter with hundreds of tests and a low regression rate. On the products where it slipped early, the cost showed up later, every time, without exception. With agents in the loop the rule is more important, not less. Generated code without coverage is a liability accruing interest.
Where AI-agent workflows fail under load
Parallel front sprawl. Agents make it trivial to start new work. Finishing remains unchanged in cost. Without explicit discipline this produces a portfolio of half-built threads that tax attention every time the wrong directory is opened. The rule that holds: finish or formally park before starting anything new. Parking is a decision and has a documented exit condition. Half-states are neither.
Documentation drift. Operational documentation is the first thing under-invested in when agents accelerate output. Internal handoffs stay fresh because they are part of the workflow. User-facing and architectural documentation drifts quietly until a forced audit exposes the gap. The fix is to build the audit into the cadence, not the calendar. Once a quarter is the minimum.
Migration debt. Half-decided migrations compound. The old path keeps running, the new path sits incomplete, both demand maintenance. A migration without a committed deadline is not a migration. It is a second copy of the system the migration was supposed to retire. The only safe entry condition for a migration is a decision on when the old path is turned off.
Reviewing volume instead of substance. When the execution layer produces ten times the output, the temptation is to scale review proportionally. That is the wrong direction. Reviews tighten on architectural intent and security-relevant changes, and loosen on mechanical changes covered by tests. Trying to read everything line by line collapses within two weeks and produces neither quality nor speed.
The pattern underneath
AI agents do not solve focus. They expose its absence. When ten things can run in parallel, the bottleneck moves from "can I build this" to "should I build this right now." Speed without discipline produces debt faster than value, and the debt is harder to see because the dashboard looks healthy.
Q3 carries one rule forward: finish before starting. With agents in the loop, the cost of violating it is not a slow week. It is a portfolio of in-flight work that each requires context switching to resume, and a calendar that looks productive without producing outcomes.
If you are designing this operating model from scratch, build the persistence layer first. Memory, structured handoffs, feedback loops, audit cadence. The agents are fast. Keeping the operating discipline scaled to that speed is the actual work.
The takeaway
Ninety days in, the headline is not the throughput. It is that an AI-agent operation rewards discipline at a steeper curve than a traditional one. The teams that win on this stack will not be the ones with the best models. They will be the ones with the best operating habits.