One opinion, shown its working
I'm Alex Busse. I build software for a living and I use these models every day, so I got tired of leaderboards that rank everything and commit to nothing. This one commits.
How the index works
The llmbusse index is a weighted blend of the benchmarks that best predict real work, tilted toward agentic coding and hard reasoning over trivia. Weights are renormalised over whichever scores a model reports, so a missing number does not tank a model unfairly (though a model with gaps is, fairly, harder to trust).
- SWE-bench Verified: 30%
- GPQA Diamond: 25%
- LiveCodeBench (coding): 15%
- AIME (math): 15%
- MMLU-Pro: 15%
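For anyone who wants to re-run the arithmetic, here is a minimal sketch of the renormalisation described above. It assumes the blend is a plain weighted arithmetic mean and that every score is on a 0 to 100 scale; the names `WEIGHTS` and `blended_index` are illustrative, not taken from the site's own code.

```python
# Weights as listed above. The dictionary keys and function name are
# hypothetical, chosen for this sketch only.
WEIGHTS = {
    "SWE-bench Verified": 0.30,
    "GPQA Diamond": 0.25,
    "LiveCodeBench": 0.15,
    "AIME": 0.15,
    "MMLU-Pro": 0.15,
}


def blended_index(scores):
    """Weighted mean over the benchmarks a model actually reports.

    Assumes scores are percentages (0-100). Missing benchmarks are left out
    of both the numerator and the weight total, so the remaining weights are
    effectively renormalised to sum to 1.
    """
    reported = {
        name: value
        for name, value in scores.items()
        if name in WEIGHTS and value is not None
    }
    if not reported:
        return None  # nothing to blend
    total_weight = sum(WEIGHTS[name] for name in reported)
    weighted_sum = sum(WEIGHTS[name] * value for name, value in reported.items())
    return weighted_sum / total_weight


# Made-up scores for illustration only, not real model results.
partial = {"SWE-bench Verified": 70.0, "GPQA Diamond": 80.0}
print(round(blended_index(partial), 1))
```

With only those two scores reported, the weights 0.30 and 0.25 are rescaled over their sum of 0.55, giving (21 + 20) / 0.55, or about 74.5. To try a different weighting, change the values in `WEIGHTS`; they do not need to sum to 1, because the function divides by the total weight actually used.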
Disagree with the weighting? Good. The numbers are all on the model pages; re-rank them yourself.
Dated and sourced
Every figure is a snapshot as of July 2026, with a source on each model page. Models move fast and so do these numbers. If one is stale or wrong, it is wrong out loud, and I would rather fix it than pretend the ranking is eternal.
The rest of me
The paid work is at busse.com.au. The code and the write-ups are at alexbusse.com.