Layer 06 of 8
◉
LLM (the model)
Weights · params · architecture
The brain. A giant pile of learned patterns that, given the conversation so far, picks the next word. Different models trade off four things: smart, fast, cheap, and small. In 2026, the frontier (Claude Opus 4.7, GPT-5, Gemini 3) is genuinely good at multi-step agentic work and writes code at a level most engineers find useful. A class below — Sonnet 4.6, GPT-5 mini, Gemini 3 Flash — is where most production traffic lives because it is roughly 5x cheaper and only modestly less capable. Open-weight options (Llama 4, Qwen, DeepSeek, Mistral) close the gap fast and run on your own hardware. The smallest phone-sized models do classification, routing, and basic generation locally with no network round-trip.
Examples in the wild
Claude Sonnet 4.6GPT-5Gemini 3Llama 4 (local)
Connects to
↑ Inference API↑ Context window↓ Inference engine↔ Fine-tuning