ErrolSignal

OpenAI Blog · Dec 3, 2025

How confessions can keep language models honest

Reviewed by Errol Vogt, Site support technician & online learning analyst · original summary · editorial policy

How confessions can keep language models honest. OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs. This update is relevant for small-office operators tracking changes in their tools.

Operator takeaway: For operators: review whether 'How confessions can keep language models honest' affects your current setup before relying on it in production.

ai

Read the original at OpenAI Blog →

Related updates

← All updates