Core Concepts
Context Window
The maximum number of tokens a model can process in a single request.
The context window defines how much text a model can 'see' at once, including your system prompt, conversation history, and documents. In most APIs the generated output counts against the same limit. Larger context windows enable long-document analysis and extended conversations, but because usage is billed per token, filling them costs more per call. Modern frontier models range from roughly 8K to 2M tokens.
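As a rough illustration of the budgeting described above, the sketch below checks whether a system prompt, conversation history, and documents fit within a given context window. The helper names (`estimate_tokens`, `fits_context`) and the 4-characters-per-token ratio are assumptions made for this example; a real count would come from the model provider's own tokenizer.

```python
# Hypothetical helpers for illustration only. The 4-characters-per-token
# ratio is a rough heuristic for English text, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per 4 characters, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(system_prompt, history, documents, context_window,
                 reserved_output=0):
    """Return (fits, tokens_used) for one request.

    reserved_output is the number of tokens set aside for the model's
    reply, on the assumption that output shares the same window.
    """
    parts = [system_prompt, *history, *documents]
    used = sum(estimate_tokens(part) for part in parts)
    return used + reserved_output <= context_window, used


if __name__ == "__main__":
    ok, used = fits_context(
        system_prompt="You are a helpful assistant.",
        history=["Hi there.", "Hello! How can I help?"],
        documents=["Quarterly report text goes here."],
        context_window=8_000,
        reserved_output=1_000,
    )
    print(f"fits={ok}, estimated input tokens={used}")
```

When the check fails, the usual options are to trim older conversation turns, shorten or summarize documents, or switch to a model with a larger window.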
Related Terms
Token
The basic unit of text that a language model processes, and the unit in which usage is billed.
Input Price
The per-million-token cost charged for tokens in your prompt.
Streaming
Delivering model output token-by-token as it is generated rather than waiting for the full response.
RAG (Retrieval-Augmented Generation)
Enhancing model responses by fetching relevant documents from an external knowledge base at query time.
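To make the RAG definition concrete, here is a minimal sketch of the retrieve-then-prompt pattern. It ranks documents by simple word overlap with the query, which is an assumption standing in for the embedding-based similarity search that production systems typically use. The function names (`tokenize`, `retrieve`, `build_prompt`) and the in-memory knowledge base are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents sharing the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in knowledge_base]
    # Drop documents with no shared words, then sort by overlap, highest first.
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str, knowledge_base: list[str], top_k: int = 2) -> str:
    """Place the retrieved documents ahead of the question in one prompt."""
    docs = retrieve(query, knowledge_base, top_k)
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    kb = [
        "The context window limits how many tokens a model can read.",
        "Streaming delivers output token by token.",
        "Bananas are yellow.",
    ]
    print(build_prompt("What is a context window?", kb, top_k=1))
```

Because retrieved text is added to the prompt, it consumes context window space and input tokens, so `top_k` is effectively a trade-off between coverage and cost.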