Introducing Co-ReAct, a novel framework that injects step-level rubrics into ReAct agents to guide their decision-making during inference, leading to more targeted and effective reasoning in complex, multi-step tasks.
Traditional ReAct agents often produce shallow or redundant trajectories because they rely solely on their own internal judgment to select the next action or search step. This paper introduces Co-ReAct, a rubric-guided action-selection framework designed to address this limitation. At each decision step, Co-ReAct injects a step-specific rubric into the agent's context, specifying what the agent should target in evidence seeking, reasoning, or self-evaluation. To make this guidance reliable, a dedicated rubric generator is trained with GRPO, optimizing for multi-judge expert consensus rankings rather than standard preference formulations. Empirically, Co-ReAct consistently outperforms baseline ReAct methods for search agents built on a range of base models (8B and 14B open-source models as well as closed-source models) on benchmarks such as DeepResearchBench and SQA-CS-V2.
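To make the step-level injection concrete, the following is a minimal sketch of a rubric-guided decision loop, not the paper's implementation. All names are hypothetical: `generate_rubric` stands in for the trained rubric generator, `select_action` for the base ReAct model, and `execute` for the search tool. The `search:` and `finish:` action prefixes and the prompt layout are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    rubric: str       # guidance injected before this decision
    action: str       # action the agent chose under that rubric
    observation: str  # tool output (empty for the final step)


def build_prompt(question: str, history: List[Step], rubric: str) -> str:
    """Assemble the context for one decision step, with the rubric injected last."""
    lines = [f"Question: {question}"]
    for i, step in enumerate(history, start=1):
        lines.append(f"Action {i}: {step.action}")
        lines.append(f"Observation {i}: {step.observation}")
    lines.append(f"Rubric for the next step: {rubric}")
    return "\n".join(lines)


def run_rubric_guided_agent(
    question: str,
    generate_rubric: Callable[[str, List[Step]], str],
    select_action: Callable[[str], str],
    execute: Callable[[str], str],
    max_steps: int = 5,
) -> List[Step]:
    """Run a ReAct-style loop in which a fresh rubric is produced at every step."""
    history: List[Step] = []
    for _ in range(max_steps):
        rubric = generate_rubric(question, history)
        action = select_action(build_prompt(question, history, rubric))
        if action.startswith("finish:"):  # assumed termination convention
            history.append(Step(rubric, action, ""))
            break
        history.append(Step(rubric, action, execute(action)))
    return history


# Toy stand-ins (assumptions) so the sketch runs without a model or search API.
def toy_rubric(question: str, history: List[Step]) -> str:
    if not history:
        return "seek primary evidence"
    return "check whether the evidence answers the question"


def toy_policy(prompt: str) -> str:
    if "Observation 1" not in prompt:
        return "search: rubric-guided agents"
    return "finish: done"


def toy_tool(action: str) -> str:
    return f"results for [{action}]"


trajectory = run_rubric_guided_agent("What is Co-ReAct?", toy_rubric, toy_policy, toy_tool)
```

The only difference from a plain ReAct loop in this sketch is the `generate_rubric` call and the extra rubric line in the prompt; removing both would leave the agent relying on its own judgment alone, which is the baseline the abstract contrasts against.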