What I'm reading
Not a firehose. The trackers and analysis I actually trust, each with a one-line reason it earns the tab. Hand-curated, updated when something changes my mind.
- Artificial Analysis — independent model intelligence index artificialanalysis.ai
The independent numbers I sanity-check my own ranking against. When we disagree, one of us is measuring the wrong thing.
- LMArena — human-preference battle leaderboard lmarena.ai
Crowd preference, not capability. Useful for vibes and formatting, misleading if you read it as raw intelligence.
- SWE-bench — the software-engineering benchmark swebench.com
Still the closest thing to a real job interview for a coding model. Watch the Verified split (the human-validated subset), ignore the marketing numbers.
- Epoch AI — trends in compute, data, and capability epoch.ai
For the long arc rather than the launch-day noise. Good antidote to release hype.
- LiveCodeBench — contamination-resistant coding eval livecodebench.github.io
Because it keeps pulling fresh contest problems, including ones released after a model's training cutoff, a high score here is harder to fake with memorised training data.
- LLM-Stats — broad model + pricing tracker llm-stats.com
The widest net for specs and prices. I cross-reference it, then form my own opinion about what the numbers mean.