A new framework, EVE-Agent, is introduced to address the issue of unreliable training signals in self-evolving search agents. By requiring all generated examples to include verifiable evidence spans, the agent can learn from evidence that genuinely improves correctness.
EVE-Agent mandates evidence verifiability for self-evolving agents. The core argument is that self-evolving agents trained on examples without verifiable evidence can produce fluent but unsupported learning signals. EVE-Agent modifies the proposer-solver framework so that agents generate not only an answer but also a source-grounded evidence span. An evidence verifier then rewards each span according to the marginal accuracy gain it provides. The resulting training signal favors evidence that directly helps solve the question, without requiring external human annotations or oracle answers.
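To make the reward idea concrete, the following is a minimal sketch of a marginal-accuracy-gain reward, not the authors' implementation. It rests on several assumptions that the summary does not confirm: a span counts as verifiable only if it appears verbatim in the source text, accuracy is measured by exact match against the proposer's own answer (so no oracle answer is needed), and the gain is the difference in solver accuracy with and without the span. The names `is_grounded`, `accuracy`, `evidence_reward`, and `toy_solver` are hypothetical.

```python
from typing import Callable, Optional, Sequence

# Hypothetical solver signature: (question, optional evidence span) -> answer.
Solver = Callable[[str, Optional[str]], str]


def is_grounded(span: str, source: str) -> bool:
    """Assumed verifiability check: non-empty span found verbatim in the source."""
    span = span.strip()
    return bool(span) and span in source


def accuracy(answers: Sequence[str], reference: str) -> float:
    """Fraction of answers that match the reference (case-insensitive exact match)."""
    if not answers:
        return 0.0
    ref = reference.strip().lower()
    return sum(a.strip().lower() == ref for a in answers) / len(answers)


def evidence_reward(
    question: str,
    reference: str,  # assumed to be the proposer's own answer, not an oracle label
    span: str,
    source: str,
    solver: Solver,
    n_samples: int = 4,
) -> float:
    """Marginal accuracy gain from the span; ungrounded spans earn nothing."""
    if not is_grounded(span, source):
        return 0.0
    with_span = [solver(question, span) for _ in range(n_samples)]
    without_span = [solver(question, None) for _ in range(n_samples)]
    return accuracy(with_span, reference) - accuracy(without_span, reference)


def toy_solver(question: str, evidence: Optional[str]) -> str:
    """Stand-in solver: correct only when the evidence mentions Paris."""
    return "Paris" if evidence and "Paris" in evidence else "unknown"


source = "Paris is the capital of France."
question = "What is the capital of France?"
print(evidence_reward(question, "Paris", "Paris is the capital", source, toy_solver))
```

Under these assumptions, a span that is grounded but does not help the solver earns zero, and a span that misleads the solver earns a negative reward, so only evidence that actually aids in solving the question is favored.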
EVE-Agent leaves the underlying model and search tools unchanged but fundamentally alters the curriculum generation process. Experiments demonstrate that this approach substantially improves evidence-grounded correctness compared to previous self-evolving search agents, resulting in curricula that are auditable by construction.