1,727 models · 132 providers · 235 labs
Pick the right LLM for what you're shipping.
Every week brings new models, prices, and benchmarks. LLMReference tracks the field so you ship with the right model and provider, fast.
Default view: coding task, balanced budget, fresh research.
Editors' picks · featured
Where most teams start
Coding
Leads both SWE-bench Verified (87.6) and SWE-bench Pro (64.3) and tops Chatbot Arena; the surest hand on a real PR.
Agents
Best τ-bench score among generally available models (87.5); stays on task across long tool loops and self-corrects without prompting.
Writing
Tops Chatbot Arena (1503) and writes paragraphs you'd ship; understands tone notes and edits like a copy chief.
Research
GPQA Diamond 94.2, the top score among generally available models, with the cleanest footnoted synthesis across many sources.
Image
The current photoreal leader: brand-consistent, with the best rendering of text and hands in the open ecosystem.
Video
Best overall video quality in the catalog: 30-second clips, native audio, and up to 4K through Vertex AI.
Pulse · this week
What changed in the model market
Command A+ · Qwen3.7-Max · Gemini 3.5 Flash
Verified provider price reductions
718 scores tracked across major suites
top-lab output · $/1M
Cheat sheet