Two Bugs, One Sunday Morning: What Debugging Your AI Gateway Actually Teaches You
A plugin crash and a process fork bomb walked into a Sunday. What I learned about tool schemas, token limits, and why reading vendor specs matters.
I woke up this Sunday to my AI gateway throwing errors. Two different bugs, both self-inflicted, both avoidable in hindsight. Here is what happened and what I learned.
Bug 1: "Cannot read properties of undefined"
The moment I activated a plugin I had built for server management, the entire UI crashed. The error was classic JavaScript: `Cannot read properties of undefined (reading 'properties')`.
The crash turned out to be three separate mistakes stacked on top of each other in the same plugin:
Missing schemas. Every tool a plugin registers needs a `parameters` field with a JSON Schema definition. Mine had `name`, `description`, and an `execute` function, but no schema. When the UI tried to render the tool's parameter list, it accessed `.parameters.properties` on something that did not exist.
Silent passthrough failure. Even after adding schemas to the tool definitions, the registration function was not forwarding them. The `buildToolset()` function stripped the `parameters` field during transformation. The schemas existed in the source but never reached the gateway.
Wrong function name. The plugin API expects `execute()`. I was registering `run()`. The gateway stored the tool, the UI rendered it, but calling it produced `tool.execute is not a function`.
Three bugs. One crash. The fix was straightforward once I understood the contract: every tool needs `name`, `description`, `parameters` (with at least `{ type: "object", properties: {} }`), and `execute`. No exceptions.
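The contract can be sketched in a few lines. This is a hypothetical shape, not the gateway's actual API: the tool name, the `ToolDefinition` type, and the sample `restartService` tool are my own illustrations. The point is the four required fields, and a passthrough that does not drop `parameters`.

```typescript
// A minimal sketch of the tool contract described above. The type and
// the sample tool are illustrative assumptions, not the real plugin API.
type ToolDefinition = {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
  // Must be named `execute`, not `run` -- the gateway looks it up by name.
  execute: (args: Record<string, unknown>) => Promise<unknown>;
};

const restartService: ToolDefinition = {
  name: "restart_service",
  description: "Restart a service on the managed host",
  parameters: {
    type: "object",
    properties: {
      service: { type: "string", description: "Unit name, e.g. nginx" },
    },
    required: ["service"],
  },
  execute: async (args) => `restarted ${args.service}`,
};

// A passthrough that keeps every field intact during transformation,
// unlike the version that silently stripped `parameters`.
function buildToolset(tools: ToolDefinition[]): ToolDefinition[] {
  return tools.map((t) => ({ ...t })); // shallow copy, nothing dropped
}
```

With that shape enforced by the type checker, the UI's `.parameters.properties` access can no longer land on `undefined`.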
The lesson: When you build plugins for a system, do not assume you know the API contract. Read how existing, working plugins register their tools. I found the answer in 30 seconds by looking at a memory plugin that was already running fine. The schema, the function name, the passthrough: it was all right there.
Bug 2: The 8,192 Token Ceiling
This one was subtler. I had been routing requests to multiple AI models through a local proxy server. Everything worked for simple queries. But when the gateway tried to use a large model for complex tasks involving tool calls, the model started outputting raw XML instead of making actual function calls.
The model was not broken. It was being strangled.
Every model in my proxy configuration had `maxTokens: 8192`. The same value, copy-pasted across all entries. A model that can output 128,000 tokens was being told it could only produce 8,192. When a complex response with tool calls exceeded that limit, the output got truncated mid-JSON. The model, detecting the truncation, switched to a fallback behavior: dumping everything as plain text with fake XML tags instead of structured function calls.
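The failure mode reproduces in miniature: cut a serialized tool call off at a token budget and it stops being valid JSON, which is exactly the point where structured output degrades into plain text. The payload below is a made-up example, not the actual request.

```typescript
// Miniature reproduction: a tool-call payload truncated at an output
// limit no longer parses as JSON. The payload is illustrative.
const fullCall = JSON.stringify({
  name: "search_logs",
  arguments: { query: "error", limit: 100 },
});

const budget = 30; // pretend the output token ceiling lands here
const truncated = fullCall.slice(0, budget);

function parsesAsJson(s: string): boolean {
  try {
    JSON.parse(s);
    return true;
  } catch {
    return false; // truncated mid-structure: unparseable
  }
}
```

`parsesAsJson(fullCall)` succeeds; `parsesAsJson(truncated)` does not. A consumer that can no longer parse the stream has to fall back to treating it as text.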
The fix was embarrassingly simple. I looked up the actual vendor specifications:
- Claude Opus 4.6: 1M context, 128k output
- Gemini 2.5 Pro: 1M context, 65k output
- Claude Haiku 4.5: 200k context, 64k output
Corrected the numbers. Problem solved.
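A config sketch makes the fix, and a guard against regressing it, concrete. The entry shape (`contextWindow`, model name strings) is my assumption; only `maxTokens` and the limits above come from the post.

```typescript
// Hypothetical proxy config: per-model limits taken from vendor specs
// instead of one copy-pasted default. Field names are assumptions.
type ModelEntry = { name: string; contextWindow: number; maxTokens: number };

const models: ModelEntry[] = [
  { name: "claude-opus-4.6", contextWindow: 1_000_000, maxTokens: 128_000 },
  { name: "gemini-2.5-pro", contextWindow: 1_000_000, maxTokens: 65_000 },
  { name: "claude-haiku-4.5", contextWindow: 200_000, maxTokens: 64_000 },
];

// Guard against the copy-paste trap: flag any entry still carrying the
// old shared default, since 8,192 is valid but almost certainly stale.
const SUSPICIOUS_DEFAULT = 8_192;
const stale = models.filter((m) => m.maxTokens === SUSPICIOUS_DEFAULT);
```

A startup check that fails when `stale` is non-empty turns the silent wrong-number problem into a loud one.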
The lesson: Default values propagate. When you set up a new model entry by duplicating an existing one and changing the name, the old limits tag along silently. The system never complains because 8,192 is a valid number. It is just the wrong one.
The Bonus Bug: Process Fork Bomb
While fixing the first two bugs, I updated the gateway to a newer version. The new version introduced a third, unrelated problem: it spawned new worker processes in an infinite loop. Within 30 seconds, the server had 14 gateway processes and climbing. Load average hit 25.
The fix was rolling back to the previous version. Sometimes the correct response to a bug is "not my problem right now."
The lesson: Update one thing at a time. If I had updated the gateway before fixing the plugin, I would have been debugging three problems simultaneously with no idea which change caused which symptom.
What This Actually Teaches You
None of these bugs were exotic. Missing schemas, wrong token limits, a regression in an update. They are the kind of problems you hit when you run infrastructure that talks to multiple AI providers through plugins you build yourself.
But here is the thing: I now understand the tool registration contract, the token limit propagation, and the process architecture better than I did this morning. Not from reading documentation. From breaking it and fixing it.
That is how you actually learn a system. Not by reading the README. By having it blow up on a Sunday morning and making it work again before lunch.