PathCal is a training-free decoding controller that calibrates reasoning paths by distinguishing the functional roles of reflection markers in Chain-of-Thought generation, improving both efficiency and accuracy.
Large Reasoning Language Models (LRMs) use reflection markers (such as 'wait' or 'but') in their Chain-of-Thought trajectories to guide complex reasoning. We show that these markers have distinct functional roles and influence reasoning differently at different stages. To exploit this finding, we introduce PathCal, a training-free decoding controller. PathCal uses marker distributions to estimate the local competition between maintaining the current trajectory and initiating a competing branch, and it rebalances marker logits when uncertainty is high. Experiments on six reasoning benchmarks show that PathCal achieves a better efficiency-performance trade-off, improving accuracy and reducing generation length without external verification.
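
To make the idea of rebalancing marker logits under high uncertainty concrete, the following is a minimal illustrative sketch, not PathCal's actual procedure. Everything specific in it is an assumption made for illustration: the function name `rebalance_marker_logits`, the split of markers into hypothetical `keep_ids` and `branch_ids` groups, the use of binary entropy as the competition score, the `threshold` and `shift` parameters, and the choice to dampen branch markers rather than boost them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_entropy(p):
    """Entropy in bits of a two-way choice with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rebalance_marker_logits(logits, keep_ids, branch_ids,
                            threshold=0.9, shift=1.0):
    """Hypothetical single-step controller (assumed design, not the paper's).

    keep_ids:   token ids assumed to continue the current trajectory
    branch_ids: token ids assumed to open a competing branch
    Returns the (possibly adjusted) logits and the competition score.
    """
    probs = softmax(logits)
    keep_mass = sum(probs[i] for i in keep_ids)
    branch_mass = sum(probs[i] for i in branch_ids)
    marker_mass = keep_mass + branch_mass
    if marker_mass == 0.0:
        return list(logits), 0.0

    # Competition is highest (1 bit) when both groups are equally likely.
    competition = binary_entropy(branch_mass / marker_mass)

    adjusted = list(logits)
    if competition >= threshold:
        # Assumed policy: dampen branch markers when the choice is uncertain.
        for i in branch_ids:
            adjusted[i] -= shift
    return adjusted, competition


# Toy vocabulary of four tokens; ids 1 and 2 stand in for marker tokens.
toy_logits = [2.0, 1.0, 1.0, 0.0]
adjusted, score = rebalance_marker_logits(toy_logits, keep_ids=[1], branch_ids=[2])
```

In this toy call the two marker groups carry equal probability mass, so the competition score reaches its maximum and the branch marker's logit is lowered; when one group clearly dominates, the logits pass through unchanged. No training or external verifier is involved, which is the only property this sketch is meant to share with the method described above.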