1,775 language models · 137 providers · 237 labs
Pick the right LLM for what you're shipping.
Every week brings new models, prices, and benchmarks. LLMReference tracks the field so you ship with the right model and provider, fast.
Default view: coding task, balanced budget, fresh research.
Editors' picks · featured
Where most teams start
Coding
Anthropic's new flagship: 88.6% SWE-bench Verified, 69.2% SWE-bench Pro, and 74.6% Terminal-Bench 2.1 make it the cleanest production coding pick and migration target from Opus 4.7.
Agents
Best τ-bench score among generally available models (87.5); stays on-task across long tool loops and self-corrects without prompting.
Writing
Tops Chatbot Arena (1503) and writes paragraphs you'd ship; understands tone notes and edits like a copy chief.
Research
GPQA Diamond 94.2 (top among generally available models), with the cleanest footnoted synthesis across many sources.
Image
The current photoreal leader: brand-consistent, with the best text rendering and hands in the open ecosystem.
Video
Best overall video quality in the catalog: 30-second clips, native audio, and up to 4K through Vertex AI.
Pulse · this week
What changed in the model market
Higgs Audio v3 TTS · Ideogram 4.0 · Reve 2.0
Verified provider price reductions
815 scores tracked across major suites
top-lab output · $/1M
Cheat sheet