OpenAI Blog · Apr 18, 2018

Evolved Policy Gradients

Reviewed by Errol Vogt, Site support technician & online learning analyst

OpenAI is releasing an experimental metalearning approach called Evolved Policy Gradients (EPG), a method that evolves the loss function of learning agents, which can enable fast training on novel tasks. Agents trained with EPG can succeed at basic tasks at test time that were outside their training regime, such as learning to navigate to an object on a different side of the room from where it was placed during training. This update is relevant for small-office operators tracking changes in their tools.
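To make "evolving the loss function" concrete, here is a toy sketch in Python. It is not the method from the original post: the one-dimensional agent, the two-parameter loss, the step sizes, and the simple elitist random search used as the outer loop are all assumptions chosen for illustration, and the learned loss is given direct access to the goal to keep the example short. The inner loop trains an agent by gradient descent on the current loss; the outer loop keeps whichever loss parameters leave the trained agent closest to its goal.

```python
import random

LR = 0.1          # inner-loop step size (assumed for this toy)
INNER_STEPS = 10  # inner-loop updates per task (assumed for this toy)


def train_agent(phi, goal):
    """Inner loop: gradient descent on a hypothetical learned loss
    L_phi(theta) = phi[0] * (theta - goal)**2 + phi[1] * theta**2."""
    theta = 0.0  # the agent's position, always starting at the origin
    for _ in range(INNER_STEPS):
        grad = 2.0 * phi[0] * (theta - goal) + 2.0 * phi[1] * theta
        theta -= LR * grad
    return theta


def fitness(phi, goals):
    """Outer objective: negative mean squared distance to the goal
    after the agent has been trained with the loss defined by phi."""
    return -sum((train_agent(phi, g) - g) ** 2 for g in goals) / len(goals)


def evolve_loss(phi, goals, generations=50, population=8, sigma=0.1, seed=0):
    """Outer loop: elitist random search over the loss parameters.
    A perturbed candidate replaces the current best only if it scores higher."""
    rng = random.Random(seed)
    best, best_fit = list(phi), fitness(phi, goals)
    for _ in range(generations):
        for _ in range(population):
            cand = [p + sigma * rng.gauss(0.0, 1.0) for p in best]
            cand_fit = fitness(cand, goals)
            if cand_fit > best_fit:
                best, best_fit = cand, cand_fit
    return best


TRAIN_GOALS = [-1.0, -0.5, 0.5, 1.0]  # goals seen while evolving the loss
HELD_OUT_GOAL = 2.0                   # goal outside the training range

initial_phi = [0.1, 0.5]
evolved_phi = evolve_loss(initial_phi, TRAIN_GOALS)

if __name__ == "__main__":
    print("train fitness, initial loss:", fitness(initial_phi, TRAIN_GOALS))
    print("train fitness, evolved loss:", fitness(evolved_phi, TRAIN_GOALS))
    print("held-out fitness, initial loss:", fitness(initial_phi, [HELD_OUT_GOAL]))
    print("held-out fitness, evolved loss:", fitness(evolved_phi, [HELD_OUT_GOAL]))
```

Comparing the held-out fitness before and after evolution gives a rough analogue of testing on a task outside the training regime: the agent is retrained from scratch on a goal it never saw, using only the evolved loss.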

Operator takeaway: Review whether Evolved Policy Gradients affects your current setup before relying on it in production.


Read the original at OpenAI Blog →
