Reading Calibrated Uncertainty from Language Model Trajectories

Current uncertainty quantification methods for language models, such as Maximum Softmax Probability (MSP), are often miscalibrated. This study proposes a method that probes the model's internal activations instead: it traces the cumulative path of per-layer MLP updates and extracts eleven scale-invariant geometric features from that trajectory. A sparse linear probe over these features shows how errors take shape and propagate across the model's depth. The resulting probe outperforms MSP under selective abstention, and its coefficients trace which layers commit prematurely or contradict the running state.
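
To make the comparison concrete, the following is a minimal sketch of the MSP baseline and of a selective-abstention evaluation, in which the model answers only the fraction of inputs it is most confident about. The function names, the toy logits, and the correctness labels are illustrative assumptions, not taken from the study, and the study's own evaluation protocol may differ.

```python
import math


def max_softmax_probability(logits):
    """Return the largest softmax probability for one logit vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def selective_accuracy(confidences, correct, coverage):
    """Accuracy on the `coverage` fraction of examples with highest confidence.

    The remaining examples are treated as abstentions and ignored.
    """
    n_keep = max(1, round(coverage * len(confidences)))
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    kept = order[:n_keep]
    return sum(correct[i] for i in kept) / n_keep


# Toy data (assumed for illustration): one logit vector per example,
# and a 0/1 flag for whether the model's top prediction was right.
logits = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [3.0, 2.9, 0.0], [5.0, 1.0, 0.0]]
correct = [1, 0, 1, 0]

msp = [max_softmax_probability(row) for row in logits]
print(selective_accuracy(msp, correct, coverage=0.5))
```

Any other confidence score, such as the output of a probe over internal features, can be passed to `selective_accuracy` in place of `msp`. A score is better under selective abstention when it yields higher accuracy at the same coverage.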
