
Mistral Models and Platform

Europe’s leading foundation model company, headquartered in Paris, offering frontier-class open-weight and commercial models with native EU data residency – the strongest option when EU compliance is a hard requirement and you still need GPT-4-class capability.


Company Overview

Mistral AI was founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix (formerly of Google DeepMind and Meta FAIR). Headquartered in Paris, France, the company has raised over €1B in funding.

Why Mistral matters for EU enterprises:

  • French company subject to EU jurisdiction and GDPR natively
  • La Plateforme API runs on EU infrastructure by default
  • Open-weight models allow on-premises deployment with no data leaving your environment
  • Active participant in EU AI Act consultations

Model Lineup

Commercial (API-only) Models

| Model | Parameters | Context | API Cost (Input/Output per MTok) |
|-------|------------|---------|----------------------------------|
| Mistral Large 2 | ~123B (dense) | 128k | $2 / $6 |
| Mistral Small (25.01) | ~24B | 32k | $0.1 / $0.3 |
| Codestral | ~22B | 32k | $0.2 / $0.6 |
| Pixtral Large | ~124B (dense) | 128k | $2 / $6 |
| Mistral Embed | Undisclosed | 8k | $0.1 / – |

Open-Weight Models (Self-Hostable)

| Model | Parameters | License | Key Feature |
|-------|------------|---------|-------------|
| Mistral 7B | 7B | Apache 2.0 | First release; sliding-window attention |
| Mixtral 8x7B | 46.7B total (MoE) | Apache 2.0 | Sparse MoE; matches GPT-3.5 |
| Mixtral 8x22B | 141B total (MoE) | Apache 2.0 | Largest open MoE; competitive with GPT-4 |
| Mistral Nemo 12B | 12B | Apache 2.0 | Co-developed with NVIDIA; strong multilingual |
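The sliding-window attention noted for Mistral 7B bounds each token's attention to a fixed window of recent tokens instead of the full prefix, which caps per-token attention cost for long inputs. A toy mask construction (window of 4 for readability; the real model uses 4096, and production kernels never materialize a dense mask like this):

```python
# Causal sliding-window attention mask: token i may attend to token j
# only if j <= i and i - j < window. 1 = attend, 0 = masked out.
def sliding_window_mask(seq_len: int, window: int) -> list[list[int]]:
    return [
        [1 if 0 <= i - j < window else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

for row in sliding_window_mask(6, 4):
    print(row)
# Last row is [0, 0, 1, 1, 1, 1]: token 5 sees only tokens 2-5.
```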

La Plateforme (API)

Mistral’s managed API service, comparable to OpenAI’s API but EU-native.

Key features: EU-hosted by default, function calling, guardrails, fine-tuning, batch API, JSON mode. OpenAI-compatible endpoint available for drop-in replacement.
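As a concrete sketch of that OpenAI-compatible surface, the request below targets the `/v1/chat/completions` endpoint on `api.mistral.ai` using only the Python standard library. The model alias `mistral-large-latest` follows Mistral's public docs; the key value is a placeholder and the actual network call is left commented out:

```python
# Sketch: building a chat-completion request against La Plateforme's
# OpenAI-compatible endpoint. No request is sent here.
import json
import urllib.request

API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    payload = {
        "model": "mistral-large-latest",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("YOUR_API_KEY", "Bonjour !")
# urllib.request.urlopen(req) would send the call; omitted here.
```

Because the request/response shape matches OpenAI's chat-completions schema, existing OpenAI SDK clients can usually be repointed at this base URL with only the API key and model name changed.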


EU Data Residency Story

Direct API (La Plateforme)

  • Hosting in France (Scaleway, OVHcloud partnerships)
  • All inference happens within EU borders
  • GDPR: Native compliance as a French company
  • No US subpoena risk: Not subject to CLOUD Act or FISA 702

Azure AI Partnership

  • Mistral models available as Models-as-a-Service on Azure EU regions
  • Inherits Azure compliance certifications (ISO 27001, SOC 2, BSI C5)

Google Cloud (Vertex AI)

  • Available via Vertex AI Model Garden in EU regions (Belgium, Frankfurt, Netherlands)

Self-Hosted (Open-Weight)

  • Download Mixtral 8x22B or Mistral Nemo, run on your own GPU cluster
  • Zero data residency risk

Benchmark Performance

| Benchmark | Mistral Large 2 | GPT-4o | Claude 3.5 Sonnet |
|-----------|-----------------|--------|-------------------|
| MMLU | 84.0% | 88.7% | 88.7% |
| HumanEval (code) | 92.1% | 90.2% | 92.0% |
| MATH | 77.1% | 76.6% | 71.1% |

Key takeaway: Mistral Large 2 is genuinely competitive with GPT-4o and Claude Sonnet on most benchmarks. It is not a “budget alternative” – it is a peer-class model with EU residency as a bonus.

Multilingual edge: Mistral models are trained with strong emphasis on European languages (French, German, Spanish, Italian, Portuguese, Dutch).


When to Choose Mistral

Use Mistral When

  • EU data residency is a hard requirement
  • German/European language workloads where multilingual training gives an edge
  • Cost optimization matters – Mistral Large at $2/$6 is significantly cheaper than GPT-4o at $5/$15
  • You want self-hosting flexibility – open-weight Mixtral models
  • Azure or GCP is your cloud – integrates into both as managed model
  • GDPR/AI Act compliance is under active legal scrutiny
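The cost bullet above can be sanity-checked with simple arithmetic on the list prices quoted in this post (a sketch only; real invoices depend on the exact input/output token mix and any batch or caching discounts):

```python
# Monthly API cost at list prices, USD per 1M tokens (input, output),
# using the figures quoted in this post.
PRICES = {
    "mistral-large": (2.0, 6.0),
    "gpt-4o": (5.0, 15.0),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    price_in, price_out = PRICES[model]
    return input_mtok * price_in + output_mtok * price_out

# Example workload: 100M input + 20M output tokens per month.
print(monthly_cost("mistral-large", 100, 20))  # 320.0
print(monthly_cost("gpt-4o", 100, 20))         # 800.0
```

At this mix, Mistral Large comes in at 40% of the GPT-4o bill.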

Avoid Mistral When

  • You need the absolute best reasoning – GPT-4o and Claude Opus still edge ahead
  • 1M token context is required – Mistral Large tops out at 128k
  • You need a mature agent/tool ecosystem
  • Enterprise support/SLAs matter – Mistral’s offering is younger

Architecture Notes for Self-Hosting

Mixtral 8x22B

  • Active parameters: ~39B per forward pass (out of 141B total)
  • Inference hardware: 4x A100 80GB (FP16) or 2x H100 80GB (FP8)
  • Throughput: ~30-50 tokens/sec on 4x A100 with vLLM

Mistral Nemo 12B

  • Inference hardware: Single A100 40GB or even RTX 4090 (quantized)
  • Throughput: ~80-120 tokens/sec
  • Best for high-volume, cost-sensitive production workloads
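The GPU counts above follow from a weights-only memory estimate; KV cache, activations, and framework overhead add real headroom requirements on top, so treat these figures as lower bounds. Parameter totals assumed: 141B for Mixtral 8x22B (all experts must be resident even though only ~39B are active per token) and 12B for Nemo:

```python
# Weights-only GPU memory estimate: parameters x bytes per parameter.
# (params_billions * 1e9 params * bytes) / 1e9 bytes-per-GB simplifies
# to params_billions * bytes_per_param.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

print(weight_memory_gb(141, 2))  # Mixtral 8x22B @ FP16 -> 282.0 GB (4x A100 80GB)
print(weight_memory_gb(141, 1))  # Mixtral 8x22B @ FP8  -> 141.0 GB (2x H100 80GB)
print(weight_memory_gb(12, 2))   # Mistral Nemo  @ FP16 -> 24.0 GB (one A100 40GB)
```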

This post is licensed under CC BY 4.0 by the author.