This research introduces a method to better quantify uncertainty in language model generation by analyzing the geometric trajectories of internal activation updates across layers, showing that these paths reveal where and how errors accumulate.
The study examines the limitations of current uncertainty quantification methods such as Maximum Softmax Probability (MSP), which is often miscalibrated. The researchers propose a method that probes the model's internal activations by tracing the cumulative path of per-layer MLP updates and extracting eleven scale-invariant geometric features from it. A sparse linear probe fitted on these features shows how errors take shape and propagate across the model's depth. The resulting probe outperforms MSP under selective abstention, and its coefficients trace which layers commit prematurely or contradict the running state, giving a more detailed picture of model uncertainty.
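To make the idea of a cumulative update path concrete, the following is a minimal sketch, not the study's implementation. It assumes the per-layer MLP updates for one token are already available as a `(num_layers, hidden_dim)` array; the function name `trajectory_features` and the four features it computes are hypothetical illustrations of scale-invariant path geometry, not the eleven features used in the study, which this summary does not list.

```python
import numpy as np

def trajectory_features(updates, eps=1e-12):
    """Illustrative scale-invariant features of a cumulative update path.

    updates: array-like of shape (num_layers, hidden_dim), one MLP update
    per layer (assumed input layout; hypothetical feature set).
    """
    updates = np.asarray(updates, dtype=float)
    if updates.ndim != 2 or updates.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two layers")

    # Running state after each layer: the cumulative sum of the updates.
    path = np.cumsum(updates, axis=0)
    step_norms = np.linalg.norm(updates, axis=1)
    path_length = step_norms.sum()

    # Net displacement over total path length: close to 1 for a straight
    # path, close to 0 when later updates cancel earlier ones.
    straightness = np.linalg.norm(path[-1]) / (path_length + eps)

    # Cosine between consecutive updates (how sharply the path turns).
    turn_cos = np.sum(updates[1:] * updates[:-1], axis=1) / (
        step_norms[1:] * step_norms[:-1] + eps
    )

    # Cosine between each update and the running state before it.
    # Negative values mean the layer pushes against the accumulated state.
    prev = path[:-1]
    alignment = np.sum(updates[1:] * prev, axis=1) / (
        step_norms[1:] * np.linalg.norm(prev, axis=1) + eps
    )

    # Share of the total path length spent in the first half of the layers,
    # a crude proxy for committing early.
    half = updates.shape[0] // 2
    early_fraction = step_norms[:half].sum() / (path_length + eps)

    return {
        "straightness": float(straightness),
        "mean_turn_cos": float(turn_cos.mean()),
        "min_alignment": float(alignment.min()),
        "early_fraction": float(early_fraction),
    }
```

Every quantity is a ratio of norms or a cosine, so multiplying all updates by the same positive constant leaves the features essentially unchanged, which is the sense of "scale-invariant" assumed here. In a setup like the one described, such per-token feature vectors would be the inputs to a sparse (for example L1-regularized) linear probe.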