ErrolSignal

OpenAI Blog · May 31, 2023

Improving mathematical reasoning with process supervision

Reviewed by Errol Vogt, Site support technician & online learning analyst · original summary

OpenAI reports that it trained a model that achieves a new state of the art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) rather than rewarding only the correct final answer (“outcome supervision”). Process supervision improves performance relative to outcome supervision. It also has an important alignment benefit: it directly trains the model to produce a chain of thought that humans endorse. This update is relevant for small-office operators tracking changes in their tools.
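The following minimal Python sketch shows the difference between the two reward signals. It is not OpenAI's implementation. The function names, the exact-match answer check, and the product-of-step-probabilities score are illustrative assumptions.

```python
from math import prod


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome supervision: one reward, based only on the final answer."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_rewards(step_is_correct: list[bool]) -> list[float]:
    """Process supervision: one reward per reasoning step."""
    return [1.0 if ok else 0.0 for ok in step_is_correct]


def solution_score(step_probs: list[float]) -> float:
    """Assumed aggregation (illustrative only): the probability that every
    step is correct, treating per-step probabilities as independent."""
    return prod(step_probs)


# Hypothetical solution: the second step contains an error, but the
# final answer happens to match the reference anyway.
steps_ok = [True, False, True]
print(outcome_reward("42", "42"))   # 1.0: outcome supervision still rewards it
print(process_rewards(steps_ok))    # [1.0, 0.0, 1.0]: the faulty step is flagged
print(solution_score([0.95, 0.2, 0.9]))
```

In this example, a solution with a flawed intermediate step still earns the full outcome reward because its final answer matches. The per-step rewards expose the faulty step.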

Operator takeaway: review whether the findings in 'Improving mathematical reasoning with process supervision' affect your current setup before relying on them in production.


Read the original at OpenAI Blog →
