NVIDIA has released X-Token, a new projection-guided cross-tokenizer knowledge distillation (KD) method that fixes structural failures in GOLD and improves accuracy on Llama models.
X-Token addresses structural failures in the GOLD method for knowledge distillation. It raises GSM8K accuracy from 2.56 to 15.54, a gain of 12.98 points, and delivers an average improvement of 3.82 points on Llama-3.2-1B models.