False Positives and False Negatives Are Not Opposite Sides of the Same Coin

A thought experiment on why an AI detection or moderation system should almost never be tuned to minimize total errors, because the two error types usually have wildly different real costs.

This one's speculative — a thought experiment about design principles, not a report on something I've built.

Total accuracy is the wrong metric to optimize

It's tempting to tune a detection system to minimize the total error rate — false positives plus false negatives, treated as equally bad. That's almost never the right objective, because the two error types usually carry very different real-world costs, and a system optimized for the wrong balance can have an excellent overall accuracy number while still being badly miscalibrated for its actual purpose.

Name which error is worse before tuning anything

A fraud system that misses a real instance of fraud (false negative) and one that flags a legitimate transaction as fraud (false positive) fail in very different directions, with very different costs attached — one is a direct loss, the other is friction and lost trust. Before touching a threshold, the actual asymmetry has to be named explicitly: which failure is worse, and by roughly how much, for this specific system.

The threshold is a policy decision, not a math problem

Once the asymmetry is named, moving the detection threshold to favor the less costly error type is a legitimate design choice, not a compromise on accuracy — a system tuned to catch more real fraud at the cost of more false alarms is doing its actual job better than one that reports a marginally higher overall accuracy number while missing more of the failures that actually matter.

The cost of each error can change, and the system should notice

The right tuning for a detection system isn't fixed forever — it should shift if the business cost of one error type changes relative to the other, which means the threshold needs to be a deliberately owned, revisitable setting, not a value chosen once during development and left alone because nobody remembers why it was set where it was.

I'm Jesse Myers — Marine veteran, 32 years in enterprise IT, now building production AI systems. This site is where I write about what I've actually built, and occasionally about ideas I haven't built yet but think are worth taking seriously.