Context Caching
A provider feature that stores a reusable prefix in memory to avoid re-processing repeated tokens.
Context caching (available on Anthropic and Google) lets you send a long static prefix — like a system prompt, large document, or codebase — once and reference it cheaply in subsequent requests. Cached tokens are billed at a fraction of standard input price (typically 10–25%) and cached reads at an even lower rate, yielding major savings for document-heavy workflows.
Termes Associés
The per-million-token cost charged for tokens in your prompt.
An instruction block sent before the conversation that configures model behavior.
The basic unit of text that language models process and are billed by.
Enhancing model responses by fetching relevant documents from an external knowledge base at query time.