OpenAI Blog · Dec 3, 2025
How confessions can keep language models honest
Reviewed by Errol Vogt, Site support technician & online learning analyst
OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, with the aim of improving AI honesty, transparency, and trust in model outputs. This update is relevant for small-office operators tracking changes in their tools.
Operator takeaway: Review whether the change described in 'How confessions can keep language models honest' affects your current setup before relying on it in production.
ai