Multi-Tenant AI: Keeping One Customer's Context Out of Another's Answer
A thought experiment on the specific ways context can leak between tenants in a shared AI system, and why this failure mode is easy to miss until it's a real incident.
This one's speculative — a thought experiment about design principles, not a report on something I've built.
The leak nobody designs for on purpose
A multi-tenant AI system serving many customers from shared infrastructure has a failure mode traditional multi-tenant software mostly doesn't: a model's context window is a single shared surface, and anything left in it from a prior request is available to influence the next one unless something actively prevents it.
Caching is the usual culprit
The semantic and embedding caches that make an AI system fast and cheap are exactly the mechanism most likely to leak across tenants if the cache key doesn't include tenant identity as a hard partition, not just a soft signal. A cache tuned purely for hit rate, without a tenant boundary baked into the key itself, will happily serve one customer's cached answer to a different customer whose question merely sounded similar.
Shared fine-tuning is a slower version of the same problem
If a system ever fine-tunes or adapts a model using data drawn from multiple tenants without careful separation, the leak isn't a single bad response anymore — it's the model's underlying behavior quietly shaped by one customer's data in a way that surfaces, unpredictably, in another customer's outputs. That failure is much harder to detect and much harder to fix after the fact than a caching bug.
The fix is boring, and that's correct
Tenant identity has to be a first-class, unbypassable dimension in every cache key, every retrieval query, and every piece of context assembly — checked the same way access control is checked in any other multi-tenant system, not treated as an AI-specific afterthought. The interesting part of a multi-tenant AI system is the model. The part that actually has to be right is the boring isolation layer around it.
I'm Jesse Myers — Marine veteran, 32 years in enterprise IT, now building production AI systems. This site is where I write about what I've actually built, and occasionally about ideas I haven't built yet but think are worth taking seriously.