Gemma 4 models, when paired with multi-token prediction (MTP) drafters that use speculative decoding, can run inference up to 3x faster without compromising output quality.
Gemma 4 can be paired with multi-token prediction (MTP) drafters. These drafters use speculative decoding: the drafter proposes several upcoming tokens at once, and Gemma 4 then verifies the proposed tokens together in a single pass instead of generating them one at a time. Because the main model checks every proposed token, the quality of the generated output is maintained, while inference can be up to approximately 3x faster.
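The draft-then-verify loop can be illustrated with a minimal sketch of the greedy variant of speculative decoding. This is a toy under stated assumptions, not the Gemma 4 or MTP drafter API: `target_next` and `draft_next` are hypothetical stand-ins for the main model and the drafter, tokens are plain integers, and the verification step is simulated with a Python loop where a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the main model: a fixed toy rule."""
    return (context[-1] * 3 + 1) % 11


def draft_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the drafter: agrees with the target
    except when the target would emit 7, where it deliberately guesses 0."""
    guess = (context[-1] * 3 + 1) % 11
    return 0 if guess == 7 else guess


def greedy_decode(prompt: List[Token], next_fn: NextTokenFn, num_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(next_fn(tokens))
    return tokens


def speculative_decode(
    prompt: List[Token],
    target: NextTokenFn,
    draft: NextTokenFn,
    num_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < num_new:
        # 1. The drafter proposes k tokens.
        proposed: List[Token] = []
        for _ in range(k):
            proposed.append(draft(tokens + proposed))

        # 2. The target predicts the token at each of the k + 1 positions.
        #    Simulated with a loop here; conceptually a single pass.
        rounds += 1
        verified = [target(tokens + proposed[:i]) for i in range(k + 1)]

        # 3. Accept the longest prefix where drafter and target agree, then
        #    append the target's own token at the first disagreement (or the
        #    extra token after the last position if everything matched).
        n = 0
        while n < k and proposed[n] == verified[n]:
            n += 1
        tokens.extend(proposed[:n])
        tokens.append(verified[n])

    # The last round may overshoot; trim to the requested length.
    return tokens[: len(prompt) + num_new], rounds
```

Calling `speculative_decode([1], target_next, draft_next, num_new=12)` returns the same tokens as `greedy_decode([1], target_next, 12)`, because every accepted token is one the target itself would have produced, but it needs fewer verification rounds than the baseline needs model calls. The round count is only a proxy: real speedups depend on how often the drafter agrees with the main model and on the cost of running the drafter.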