Field guide · v2026.05

The 2026 AI inference stack, one click deep.

Eight layers between a question and an answer. From the chat box on your screen all the way down to a GPU in a datacenter. Pick a layer to see what lives there, why it exists, and what matters in 2026.

L1  User Interface (8 terms): Chat · IDE · voice · API client
L2  Agent Runtime (6 terms): Planner · loop control · sandbox
L3  Tools & MCP (11 terms): Function calls · skills · MCP servers
L4  Context & Memory (9 terms): System prompt · RAG · vector DB · skills
L5  Inference API (6 terms): Anthropic · OpenAI · Bedrock · Vertex · regional routing
L6  LLM (the model) (14 terms): Weights · params · architecture
L7  Inference engine (7 terms): vLLM · llama.cpp · TGI · MLX
L8  Hardware (9 terms): H200 · B200 · Apple Silicon · CPU