A comprehensive review of reinforcement-learned agents: from RLHF descendants to self-play loops.
A 92-page survey from a Tsinghua-led team catalogs more than 200 papers on agentic reinforcement learning published in the last year. The authors sort the approaches into three families (reward-modeled, self-play, and environment-grounded) and identify three open problems that they argue will dominate progress in 2026.