A 22B-active mixture-of-experts model with Apache 2.0 weights, approaching GPT-4-class quality on internal benchmarks.
Mistral has open-sourced Mixtral 200B, a sparse MoE that activates 22B parameters per token, under the Apache 2.0 license. Mistral's internal benchmarks place it within 2 points of GPT-4-Turbo on MMLU-Pro and ahead of it on multilingual reasoning. The weights are available on Hugging Face, with vLLM and TGI support out of the box; a minimal serving sketch follows.
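For readers who want to try the release, here is a minimal sketch of serving it with vLLM's offline API, under stated assumptions: the Hugging Face repo id `mistralai/Mixtral-200B` and the `tensor_parallel_size` value are placeholders, not confirmed details from the announcement.

```python
# Minimal vLLM serving sketch. The repo id is an assumption;
# substitute the actual Hugging Face path from the release notes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-200B",  # hypothetical repo id
    tensor_parallel_size=8,          # assumption: a model this size needs multi-GPU sharding
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize sparse mixture-of-experts routing."], params)
print(outputs[0].outputs[0].text)
```

On the TGI side, pointing `text-generation-launcher --model-id` at the same repo should work, assuming the installed version supports the architecture.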