What to Log for an AI System That You'd Never Think to Log for Normal Software

A thought experiment on the specific observability gaps that only show up once a system's behavior depends on a model's output, not just its own code.

What to Log for an AI System That You'd Never Think to Log for Normal Software

This one's speculative — a thought experiment about design principles, not a report on something I've built.

Normal logging assumes deterministic behavior

Traditional software logging is built around an assumption that mostly holds: the same input, run twice, produces the same output, so a log of inputs and outputs is usually enough to reconstruct what happened. An AI system breaks that assumption quietly, since the same input can produce meaningfully different output on a different call, and a log built for deterministic software misses the parts that actually explain the difference.

Log the exact prompt, not just the template

It's not enough to log which prompt template was used — the actual, fully-assembled text sent to the model, with every piece of injected context, needs to be captured, because debugging a bad output almost always comes down to what specific context was present at that specific moment, not which named template was theoretically in use.

Log the model version like it's a dependency version, because it is

A model provider can update a model version behind an API that looks unchanged from the outside, and a system that doesn't log exactly which version handled a given request has no way to tell, months later, whether a behavior shift came from a prompt change, a data change, or a silent model update on the provider's side.

Log confidence and disagreement signals, not just final answers

If a system uses any form of internal confidence scoring, multi-source agreement, or adversarial review, those intermediate signals are worth logging in their own right, separately from the final answer — they're often the fastest way to spot a systematic problem developing before it shows up as a wrong answer a customer actually notices.


I'm Jesse Myers — Marine veteran, 32 years in enterprise IT, now building production AI systems. This site is where I write about what I've actually built, and occasionally about ideas I haven't built yet but think are worth taking seriously.