Pick the right LLM for what you're shipping.

Every week brings new models, prices, and benchmarks. LLMReference tracks the field so you ship with the right model and provider, fast.

Coding · RAG · Agents · Long context · Vision · Classification · JSON / Tool use

Live shortlist

Best overall today: DeepSeek V4 Flash · Coding default · Researched 29d ago · #32 of 142 Q/$

Freshest update: Claude Opus 5 · New or rechecked this week · Researched 6d ago · 2026-07-24

Default view: coding task, balanced budget, fresh research.

Editors' picks · featured

Where most teams start

Developers

Coding

Claude Fable 5

Excellent

Anthropic's new flagship: 80.3% SWE-bench Pro, 96% SWE-bench Verified on Vals.ai, and 85.0% OSWorld-Verified make it the best production coding pick for non-trivial engineering tasks.

Claude Opus 5

Claude Opus 4.8

7 picks · 20 eligible

Developers

Agents

Claude Sonnet 5

Excellent

Anthropic's new default agent: 85.2% SWE-bench Verified, 81.2% OSWorld-Verified, 86.6% BrowseComp multi-agent, and 1M context at the same durable $3/$15 as Sonnet 4.6 ($2/$10 intro through 2026-08-31).

Claude Opus 5

Claude Sonnet 4.6

7 picks · 23 eligible

Knowledge workers

Writing

Claude Opus 4.7

Excellent

Tops Chatbot Arena (1503) and writes paragraphs you'd ship; understands tone notes and edits like a copy chief.

GPT-5.5

Gemini 3 Pro

4 picks · 7 eligible

Knowledge workers

Research

Claude Fable 5

Excellent

GDPval-AA ELO 1932 and Anthropic-reported finance, trading, and analytics wins make it the strongest general knowledge-work pick; do not use Mythos-only HLE rows as Fable evidence.

Claude Opus 4.7

GPT-5.5

5 picks · 9 eligible

Creatives