Introduces DART, a modular runtime designed to solve the dilemma of restoring failed tool agent executions while preserving committed downstream work, addressing the gap between mechanical rollback and semantic validity.
When a structured tool agent fails during execution, a runtime dilemma arises: replaying the task is wasteful, while local checkpoint restoration risks invalidating committed downstream work. This tension is critical in commitment-sensitive settings. Existing recovery methods lack a criterion for semantic validity after downstream commitment.
This paper formalizes this gap as 'semantic recoverability' and introduces DART, a modular runtime designed to address it. DART localizes the failed instance, certifies semantically recoverable boundaries, aligns checkpoints, and selects an admissible restore point that successfully preserves committed downstream work under dependency and effect constraints.
Through validation across three LLM-driven domains and a LangGraph substrate, DART correctly recovers all evaluated commitment-sensitive cases where baseline local recovery fails. A safety audit confirms that the proposed rollbacks are entirely safe, demonstrating that controller legality does not inherently imply semantic validity, necessitating an explicit admissibility check for sound local recovery.