Co-ReAct: Using Rubrics as Step-Level Guides to Enhance Agent Reasoning

Traditional ReAct agents often produce shallow or redundant trajectories because they rely solely on their own internal judgment to select the next action or search step. This paper introduces Co-ReAct, a rubric-guided action-selection framework that addresses this limitation. At each decision step, Co-ReAct injects a step-specific rubric into the agent's context that specifies what the agent should target in evidence seeking, reasoning, or self-evaluation. To make this guidance reliable, the authors train a dedicated rubric generator with GRPO. Instead of a standard preference formulation, the generator is optimized against consensus rankings from multiple expert judges. Empirically, Co-ReAct consistently outperforms baseline ReAct across search agents built on a range of base models, including 8B and 14B open-source models as well as closed-source models. The gains appear on benchmarks such as DeepResearchBench and SQA-CS-V2.
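The abstract does not specify prompt formats, how judge rankings are aggregated, or how rewards are shaped. The sketch below is therefore only an illustration under stated assumptions. First, a rubric is injected as an extra context line before the agent's next thought. Second, several judges' rankings of candidate rubrics are combined with a Borda count, which is an assumed stand-in for the paper's multi-judge consensus. Third, the resulting scores are standardized within each sampled group, as in GRPO's group-relative advantage. All function names and the prompt layout are hypothetical.

```python
from statistics import mean, pstdev


def build_step_context(history: list[str], rubric: str) -> str:
    """Append a step-level rubric to the agent's context before it picks the next action.

    Hypothetical prompt layout: the paper's actual format is not given in the abstract.
    """
    lines = list(history)
    lines.append(f"[Rubric for next step] {rubric}")
    lines.append("Thought:")
    return "\n".join(lines)


def consensus_scores(rankings: list[list[int]], n_candidates: int) -> list[float]:
    """Assumed Borda-count consensus over several judges.

    Each judge ranks candidate indices best-first (each index appearing once).
    A candidate at position p earns (n_candidates - 1 - p) points from that judge.
    Totals are normalized to [0, 1] by the maximum achievable points.
    """
    if n_candidates < 2 or not rankings:
        raise ValueError("need at least two candidates and one judge ranking")
    totals = [0.0] * n_candidates
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            totals[candidate] += n_candidates - 1 - position
    max_points = (n_candidates - 1) * len(rankings)
    return [t / max_points for t in totals]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled candidate rubrics, ranked best-first by three hypothetical judges.
    rankings = [[0, 2, 1, 3], [2, 0, 1, 3], [0, 1, 2, 3]]
    scores = consensus_scores(rankings, n_candidates=4)
    advantages = group_relative_advantages(scores)
    print(scores)
    print(advantages)
    print(build_step_context(["Question: ...", "Observation: ..."], "Find a primary source."))
```

In this example the Borda totals are 8, 4, 6 and 0 out of 9, so candidates 0 and 2 receive positive advantages and candidates 1 and 3 receive negative ones. A group-relative baseline like this needs no learned value model, which is the main appeal of GRPO. Whether Co-ReAct uses Borda aggregation or another consensus rule is not stated in the abstract.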
