If you haven't noticed lately, Claude seems to have no idea that it's a robot. I asked it recently to estimate how long a large refactor that it had planned out would take — an 11-phase architectural overhaul of a ~50k line TypeScript codebase — and it said, very confidently, that it would take seven weeks. Which is kind of ridiculous for a robot. It took six hours.
This is funny, but it's also kind of revealing. The model gave me a seven-week estimate because its weights are trained on a world where large refactors take seven weeks. It's seen thousands of project plans and sprint retrospectives and engineering blog posts written by humans, and all of that data says "11-phase refactor = months." The model doesn't know that it can just... do the thing. Its priors are calibrated to a species that isn't itself.
I've been spending a lot of time building agent-heavy systems lately and I keep bumping into this same phenomenon from different angles. The training data doesn't just shape how models estimate things, it shapes how they understand things. Models have spent their entire training absorbing an enormous amount of detailed, structured knowledge about well-documented human systems: the legal system, the military, medicine, finance, ecology, whatever. And when a model encounters something that maps onto one of these domains, it goes way beyond recognizing the vocabulary. It has genuine intuitions about how the system should behave, what can go wrong, and what the relationships between components ought to look like.
Like, if you tell a model that your software has a "judge" component, it already has opinions. It thinks the judge should be impartial. It thinks the judge should receive evidence from other parts of the system rather than gathering it directly. It thinks the judge's decisions should be final (or at least hard to override). It probably thinks there should be some kind of appeal mechanism. You didn't tell the model any of this. It just knows, because it knows what judges do.
This is a really easy way to get the model to understand your system better. If you're building something where agents are reading, writing, and reasoning about your code, you want those agents to have the strongest possible intuitions about what each component does. Calling your arbitration component a "judge" gives the model a massive head start; calling it a FlorbDecider gives it nothing.
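To make the contrast concrete, here's a toy TypeScript sketch of what leaning on the legal metaphor might look like. Every name here is made up for illustration (there's no real library behind this), but notice how the model's priors about judges, that they're impartial, receive evidence rather than gather it, and leave room for appeal, fall straight out of the type signatures:

```typescript
// Hypothetical arbitration component, named after the legal-system metaphor.
// All names are illustrative, not from any real framework.

interface Brief {
  advocate: string;    // which side submitted it
  argument: string;
  evidence: string[];  // evidence is handed to the judge, never gathered by it
}

interface Ruling {
  winner: string;
  reasoning: string;
  appealable: boolean; // the metaphor implies an appeal mechanism exists
}

class Judge {
  // The judge only sees what the advocates submit.
  rule(briefs: Brief[]): Ruling {
    // Toy decision rule: the brief with more evidence wins.
    const sorted = [...briefs].sort(
      (a, b) => b.evidence.length - a.evidence.length
    );
    const winner = sorted[0];
    return {
      winner: winner.advocate,
      reasoning: `${winner.advocate} presented ${winner.evidence.length} pieces of evidence`,
      appealable: true,
    };
  }
}

const judge = new Judge();
const ruling = judge.rule([
  { advocate: "plaintiff", argument: "the cache is stale", evidence: ["log-a", "log-b"] },
  { advocate: "defendant", argument: "the cache is fine", evidence: ["log-c"] },
]);
console.log(ruling.winner); // "plaintiff"
```

The decision rule itself is deliberately dumb; the point is that a model (or a lawyer) reading `Judge.rule(briefs)` already knows which responsibilities belong here and which don't, before reading the body.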
I don't think this is far off from how you'd get humans to understand your system better, either. If you were explaining your legal-system-inspired software to a lawyer, you could say "and this is the judge, and it receives briefs from the advocates, and the clerk manages the docket." The lawyer would immediately understand your system at a structural level, without reading a single line of code. Try explaining the same system with "and this is the fleebelblorp, and it receives glorps from the bluebelglorps" and watch their eyes glaze over. Naming is always important in software, but this takes it to a different level: you're mapping your entire architecture onto a domain that both humans and models already deeply understand.
We're moving into a world where models are increasingly the ones building and maintaining code. Their ability to intuit the purpose of a component from its name is now a genuine architectural advantage. And I think there's actually quite a lot of alpha in this right now. Today's models are really good at pattern-matching against well-known domains and significantly worse at reasoning about arbitrary systems from first principles (this is changing, but alpha is found in the back of the couch cushions!). If you structure your system around a metaphor that the model already deeply understands, you get a kind of free lunch where the model brings its existing knowledge of the domain to bear on your software without you having to explain everything from scratch.
Try this as a thought experiment. Pick a well-known system (law, religion, the military, whatever) and architect your software as a metaphor for that system. Now ask a model to predict how a component it hasn't seen yet should behave, based only on the component's name and the overall metaphor. I'm pretty confident the model will be surprisingly close to correct. It's drawing on its training data to fill in the gaps, and for well-documented domains, there are very few gaps to fill.
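If you want to actually run the experiment rather than just imagine it, the prompt is easy to mechanize. A sketch (the metaphor and component names here are my own picks, not anything canonical):

```typescript
// Build the thought-experiment prompt: describe the metaphor, list the
// components the model has "seen", and ask it to predict the unseen one.
const metaphor = "a common-law court system";
const components = ["judge", "advocate", "clerk", "bailiff"];
const unseen = "bailiff";

const prompt = [
  `Our software is architected as a metaphor for ${metaphor}.`,
  `Known components: ${components.filter((c) => c !== unseen).join(", ")}.`,
  `Based only on its name and the overall metaphor, predict how the`,
  `"${unseen}" component should behave: what it does, what it must not do,`,
  `and which other components it talks to.`,
].join("\n");

console.log(prompt);
```

Feed that to any capable model and score how close its prediction lands to what you'd actually build. My bet is "surprisingly close," for exactly the reason above: for well-documented domains there are very few gaps to fill.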
Of course there are limits. Metaphors break down. If your system does something that doesn't map cleanly onto the chosen domain, the metaphor becomes more of a hindrance than a help: the model starts bringing in assumptions that don't apply, and you end up fighting its intuitions instead of leveraging them. You can work around this to some extent (ask the model what your system reminds it of; it's pretty good at finding structural similarities with well-known domains), but you also have to be comfortable changing the metaphor entirely if it stops working. If your system evolves in a direction that breaks the analogy, just restructure around a different domain. This is annoying but not catastrophic, especially if the agents are doing most of the refactoring anyway.
I suspect this whole trick has a shelf life. Models will keep getting better at reasoning about arbitrary systems, and eventually the gap between "model understands a well-known domain" and "model understands your weird custom thing" will close. But right now the gap is wide and the exploit is free. Enjoy!
-kf