Tarification et Coûts

Context Caching

A provider feature that stores a reusable prefix in memory to avoid re-processing repeated tokens.

Context caching (available on Anthropic and Google) lets you send a long static prefix — like a system prompt, large document, or codebase — once and reference it cheaply in subsequent requests. Cached tokens are billed at a fraction of standard input price (typically 10–25%) and cached reads at an even lower rate, yielding major savings for document-heavy workflows.

Termes Associés

Input Price

The per-million-token cost charged for tokens in your prompt.

System Prompt

An instruction block sent before the conversation that configures model behavior.

Jeton (Token)

L'unité de base de texte que les modèles de langage traitent et facturent.

RAG (Geri Getirmeyle Güçlendirilmiş Üretim)

Modeli harici bir bilgi bankasından gelen gerçek zamanlı verilerle besleme tekniği.