Architecture

Reasoning Model

A model variant that produces explicit step-by-step thinking before answering.

Reasoning models (o1, o3, Claude 3.7 with extended thinking, Gemini Thinking) run chain-of-thought internally during inference, often generating thousands of 'thinking tokens' before producing a final answer. This typically improves accuracy on math, science, and logic tasks, at the price of higher latency and cost, because every thinking token must be generated before the answer begins. How thinking tokens are billed depends on the provider: they may be charged at the output-token rate, at a discount, or as a separate line item.
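To make the cost trade-off concrete, here is a minimal sketch of how a single request's cost might be estimated when thinking tokens are counted. The prices, the `Pricing` class, the `thinking_rate_multiplier` field, and the token counts are all hypothetical placeholders chosen for illustration; they are not any provider's actual rates or API.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_mtok: float
    output_per_mtok: float
    # Assumption: thinking tokens are billed at a multiple of the output rate.
    # 1.0 means "same as output tokens"; a value below 1.0 models a discount.
    thinking_rate_multiplier: float = 1.0


def estimate_cost(pricing: Pricing, input_tokens: int,
                  thinking_tokens: int, answer_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    input_cost = input_tokens * pricing.input_per_mtok / 1_000_000
    thinking_cost = (thinking_tokens * pricing.output_per_mtok
                     * pricing.thinking_rate_multiplier / 1_000_000)
    answer_cost = answer_tokens * pricing.output_per_mtok / 1_000_000
    return input_cost + thinking_cost + answer_cost


# Made-up example: 500 prompt tokens, 4,000 thinking tokens, 300 answer tokens.
pricing = Pricing(input_per_mtok=2.0, output_per_mtok=8.0)
with_thinking = estimate_cost(pricing, 500, 4000, 300)
without_thinking = estimate_cost(pricing, 500, 0, 300)

print(f"with thinking:    ${with_thinking:.4f}")
print(f"without thinking: ${without_thinking:.4f}")
print(f"cost ratio:       {with_thinking / without_thinking:.1f}x")
```

Under these made-up numbers the thinking tokens dominate the bill, which is why a per-request thinking budget is often worth tuning. Setting `thinking_rate_multiplier` below 1.0 models the discounted-billing case.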

Related Terms

Chain of Thought

A prompting technique that instructs the model to reason step-by-step before answering.

Inference

The process by which a trained AI model generates output for new inputs.

Latency

The time between sending a request and receiving the first token of a response.

MMLU

Massive Multitask Language Understanding — a benchmark testing knowledge across 57 academic subjects.