the current order // as of July 2026

Someone has to say which model is actually best.

So here it is: the frontier models ranked on the benchmarks that predict real work, plus my verdict on each. Dated, sourced, and happy to be wrong out loud.

  1. Qwen3.7-Max · 88.9
  2. GPT-5.2 · 88.7
  3. DeepSeek-V4-Pro · 88.1
  4. Gemini 3.1 Pro · 87.7
  5. Gemini 3 Pro · 86.5
#1 · the take

Qwen3.7-Max. Frontier scores across the board, and hardly anyone in the West is talking about it. Two catches: its weights are closed despite Alibaba's open-weight Qwen line, and it is a token furnace whose heavy token use inflates the real bill beyond what the list price suggests.
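What "token furnace" means in dollars: a minimal sketch of per-task cost, assuming the leaderboard prices below are USD per million input and output tokens. The token counts, including the 3x output multiplier for Qwen3.7-Max, are hypothetical numbers chosen for illustration, not measurements, and `task_cost` is a hypothetical helper name.

```python
# Per-task cost = input_tokens * input_price + output_tokens * output_price.
# Prices are from the leaderboard, ASSUMED to be USD per 1M input / 1M output tokens.
PRICES_PER_M = {
    "Qwen3.7-Max": (2.5, 7.5),
    "GPT-5.2": (1.75, 14.0),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: USD cost of one request at list price."""
    price_in, price_out = PRICES_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# HYPOTHETICAL workload: same 10K-token prompt, but Qwen3.7-Max is assumed
# to emit 3x as many output tokens as GPT-5.2 for the same answer.
qwen = task_cost("Qwen3.7-Max", input_tokens=10_000, output_tokens=30_000)
gpt = task_cost("GPT-5.2", input_tokens=10_000, output_tokens=10_000)
print(f"Qwen3.7-Max: ${qwen:.4f}  GPT-5.2: ${gpt:.4f}")
# prints "Qwen3.7-Max: $0.2500  GPT-5.2: $0.1575"
```

At equal token counts (10K in, 10K out) Qwen3.7-Max would cost $0.10 on this task against $0.1575 for GPT-5.2; under these assumptions the ordering flips purely on verbosity.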

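Leaderboard (top 8 of 14)

Columns: rank, model, developer, score, benchmark scores, context window, price in / out (USD per 1M tokens), license.

  1. Qwen3.7-Max · Alibaba (Qwen) · 88.9 · 92 80 97 92 · 1M · $2.5 / $7.5 · closed
  2. GPT-5.2 · OpenAI · 88.7 · 92 80 100 · 400K · $1.75 / $14 · closed
  3. DeepSeek-V4-Pro · DeepSeek · 88.1 · 90 81 88 95 94 · 1M · $0.435 / $0.87 · open
  4. Gemini 3.1 Pro · Google DeepMind · 87.7 · 94 81 91 · 1M · $2 / $12 · closed
  5. Gemini 3 Pro · Google DeepMind · 86.5 · 92 76 90 95 · 1M · $2 / $12 · closed
  6. Grok 4 · xAI · 86.3 · 88 92 79 · 256K · $3 / $15 · closed
  7. Claude Opus 4.8 · Anthropic · 86.2 · 94 89 69 · 1M · $5 / $25 · closed
  8. Gemini 3 Flash · Google DeepMind · 86.1 · 90 78 95 · 1M · $0.5 / $3 · closed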
Signal // curated