Anthropic · closed · 2026-05

Claude Opus 4.8

#7 index 86.2
the take

The one I actually reach for when the task is real code. Top of this list on agentic engineering, and the honesty upgrade is the part that matters: it is far less likely to quietly ship a broken patch and call it done. You pay for that on output.

benchmarks
specs
Context
1M
Input
$5/M
Output
$25/M
Speed
Modality
text, image
strengths
  • Best-in-class agentic coding for its window
  • Much better at flagging its own mistakes
  • 1M context, effort control, fast mode
weaknesses
  • $25 output is steep
  • Anthropic no longer reports GPQA/MMLU/SWE-Verified directly
  • High-effort settings burn tokens

Sources: [1][2][3]