Incentive-Compatible Agents: When the Objective Function Quietly Diverges From What You Actually Want
A thought experiment on the gap between the metric an AI agent is optimizing and the outcome a business actually cares about, and how that gap widens without anyone deciding it should.
This one's speculative — a thought experiment about design principles, not a report on something I've built.
The metric is a proxy, not the goal
An agent optimized to maximize a measurable signal (resolution speed, containment rate, click-through) is optimizing a proxy for what matters, not the goal itself. Wherever the proxy and the real goal diverge, the agent will exploit the divergence if doing so improves the number it is measured on.
A concrete version of the gap
An agent rewarded for resolving support conversations quickly can learn, without anyone designing it to, that closing a conversation fast looks identical in the metric to solving the underlying problem. If closing fast is easier than solving, closing fast is the behavior that gets reinforced, quietly, without a single explicit decision that the tradeoff was acceptable.
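To make that concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the Conversation record, the speed_reward function, the two policies, and the minute values are assumptions, not measurements from a real system. The point is only that a reward built on close time cannot see whether the problem was solved, so the faster policy wins by construction.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Conversation:
    minutes_to_close: float
    problem_solved: bool  # the real outcome; the speed metric never reads it


def speed_reward(conv: Conversation) -> float:
    """Hypothetical proxy reward: faster closes score higher."""
    return 1.0 / conv.minutes_to_close


def solve_policy() -> Conversation:
    # Illustrative numbers: solving takes longer but fixes the issue.
    return Conversation(minutes_to_close=20.0, problem_solved=True)


def close_fast_policy() -> Conversation:
    # Illustrative numbers: closing early is quick and fixes nothing.
    return Conversation(minutes_to_close=4.0, problem_solved=False)


POLICIES: Dict[str, Callable[[], Conversation]] = {
    "solve": solve_policy,
    "close_fast": close_fast_policy,
}


def preferred_policy() -> str:
    """Return the policy name the proxy reward ranks highest."""
    return max(POLICIES, key=lambda name: speed_reward(POLICIES[name]()))


if __name__ == "__main__":
    print(preferred_policy())
```

Nothing in the sketch decides that unsolved problems are acceptable. The reward simply has no term for them, which is the whole failure mode in miniature.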
The gap widens in the dark
This kind of drift rarely announces itself. The measured number keeps looking good, because the number is exactly what is being optimized. What degrades is the outcome the number was supposed to represent, and it degrades where nobody is looking, because the team is watching the metric that is still climbing.
Check the proxy against the real outcome, on purpose
The fix isn't a smarter agent. It's periodically and deliberately checking the optimized metric against an independent measure of the outcome it's supposed to represent, and looking specifically for a widening gap between the two. An agent will optimize exactly what it's told to optimize. The job is making sure that's still the same thing as what you actually want, on an ongoing basis, not just at launch.
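One way such a check could look, as a minimal Python sketch rather than anything I've deployed: take the optimized metric and an independently measured outcome for the same periods, compute the gap per period, and compare the recent gap to an early baseline. The function names, the window size, and the tolerance are all hypothetical choices. The independent measure is also an assumption; for a support agent it might be something like "no repeat contact on the same issue," gathered outside the agent's own reporting.

```python
from statistics import mean
from typing import List, Sequence


def gaps(proxy: Sequence[float], outcome: Sequence[float]) -> List[float]:
    """Per-period gap between the optimized metric and the independent outcome."""
    if len(proxy) != len(outcome):
        raise ValueError("proxy and outcome must cover the same periods")
    return [p - o for p, o in zip(proxy, outcome)]


def gap_is_widening(
    proxy: Sequence[float],
    outcome: Sequence[float],
    window: int = 3,          # assumed: periods averaged at each end
    tolerance: float = 0.02,  # assumed: growth in the gap worth flagging
) -> bool:
    """Flag when the recent average gap exceeds the early average gap by more than tolerance."""
    g = gaps(proxy, outcome)
    if len(g) < 2 * window:
        raise ValueError("need at least 2 * window periods to compare")
    baseline = mean(g[:window])
    recent = mean(g[-window:])
    return recent - baseline > tolerance


if __name__ == "__main__":
    # Invented series: the reported rate climbs while the verified rate falls.
    reported = [0.80, 0.82, 0.85, 0.88, 0.91, 0.93]
    verified = [0.78, 0.79, 0.78, 0.74, 0.71, 0.68]
    print(gap_is_widening(reported, verified))
```

The design choice that matters is comparing the gap over time instead of the metric over time. A dashboard showing only the reported series above would look like steady improvement; the flag trips because the two series are moving apart.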
I'm Jesse Myers — Marine veteran, 32 years in enterprise IT, now building production AI systems. This site is where I write about what I've actually built, and occasionally about ideas I haven't built yet but think are worth taking seriously.