Google DeepMind · closed · 2026-02

Gemini 3.1 Pro

Rank #4 · index 87.7

the take
The reasoning ceiling of this whole list: the highest GPQA here (94.3), a natively multimodal million-token context, and a SWE-bench score (80.6) that shows it holds up on real coding. The 2x long-context surcharge past 200K tokens is the tax you pay for that ceiling.

benchmarks

SWE-bench: 80.6
GPQA: 94.3

91.0

—

LiveCodeBench (coding): —

specs

Context: 1,048,576 tokens (1M)
Input: $2/M
Output: $12/M
Speed: 136.2 tok/s
Modality: text, image, audio, video

strengths

Top GPQA (94.3) and ARC-AGI-2
Strong SWE-bench (80.6)
Native text/image/audio/video, 1M context

weaknesses

2x long-context price surcharge past 200K tokens
Slower output than Flash tiers
Card omits AIME/MMLU-Pro
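
To make the pricing and speed figures on this card concrete, here is a rough sketch of a per-request cost and generation-time estimate. The $2/M input rate, $12/M output rate, 2x surcharge, 200K threshold, and 136.2 tok/s speed come from the card. Everything else is an assumption: the names `Pricing`, `estimate_cost`, and `estimate_seconds` are hypothetical, "200K" is read as 200,000 prompt tokens, and the 2x multiplier is assumed to apply to both the input and output rates of the whole request once the prompt exceeds that threshold. The card does not spell out those billing details, so check the provider's pricing page before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Rates from the card; threshold semantics are assumptions."""
    input_per_m: float = 2.0                 # USD per million input tokens (card)
    output_per_m: float = 12.0               # USD per million output tokens (card)
    long_context_threshold: int = 200_000    # assumed meaning of "200K"
    long_context_multiplier: float = 2.0     # "2x" surcharge (card)


def estimate_cost(input_tokens: int, output_tokens: int,
                  pricing: Pricing = Pricing()) -> float:
    """Estimate the USD cost of one request.

    Assumption: once the prompt exceeds the threshold, the multiplier
    applies to the entire request (input and output), not only to the
    tokens past the threshold.
    """
    base = (input_tokens * pricing.input_per_m
            + output_tokens * pricing.output_per_m) / 1_000_000
    over = input_tokens > pricing.long_context_threshold
    return base * (pricing.long_context_multiplier if over else 1.0)


def estimate_seconds(output_tokens: int,
                     tokens_per_second: float = 136.2) -> float:
    """Estimate generation time from the card's output speed, ignoring
    time-to-first-token and network latency."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for prompt in (100_000, 500_000):
        cost = estimate_cost(prompt, 2_000)
        secs = estimate_seconds(2_000)
        print(f"{prompt:>7} prompt tokens: ${cost:.3f}, ~{secs:.1f}s of output")
```

Under these assumptions, a 100K-token prompt with 2K tokens of output comes to about $0.22, while a 500K-token prompt with the same output comes to about $2.05, which is where the surcharge starts to dominate.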

Sources: [1][2]
