OpenAI Blog · Sep 19, 2019

Fine-tuning GPT-2 from human preferences

Reviewed by Errol Vogt, Site support technician & online learning analyst

We’ve fine-tuned the 774M parameter GPT-2 language model using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences did not always match our own. Specifically, for summarization tasks the labelers preferred sentences copied wholesale from the input (we’d only asked them to ensure accuracy), so our models learned to copy. Summarization required 60k human labels; simpler tasks which continue text in various styles require… This update is relevant for small-office operators tracking changes in their tools.

Operator takeaway: Review whether 'Fine-tuning GPT-2 from human preferences' affects your current setup before relying on it in production.

Read the original at OpenAI Blog →
