Key Concepts
Inference
The process of running a trained model to generate outputs from new inputs.
Inference is what happens every time you call an AI API: the model takes your prompt and produces a completion. Unlike training, inference does not update the model's weights. For production systems built on a hosted API, it is the primary cost driver: you pay per token of inference, not for the model's training.
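Per-token billing can be sketched as simple arithmetic. The example below is a minimal illustration, not any provider's actual pricing: the `Pricing` class, the `inference_cost` function, the `discount` parameter, and the prices in `EXAMPLE_PRICING` are all hypothetical placeholders, and it assumes input and output tokens are priced separately per million tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def inference_cost(input_tokens: int, output_tokens: int,
                   pricing: Pricing, discount: float = 0.0) -> float:
    """Estimate the cost in USD of one inference request.

    `discount` is a fraction between 0 and 1 taken off the full price.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    full_price = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000
    return full_price * (1.0 - discount)


# Placeholder prices for illustration only, not a real price list.
EXAMPLE_PRICING = Pricing(input_per_mtok=2.0, output_per_mtok=10.0)

if __name__ == "__main__":
    # A request with a 1,000-token prompt and a 500-token completion.
    cost = inference_cost(1_000, 500, EXAMPLE_PRICING)
    print(f"Estimated cost per request: ${cost:.4f}")
```

Multiplying the per-request figure by expected request volume gives a rough inference budget. The `discount` parameter can model a bulk discount such as the 50% listed under Batch Processing.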
Related Terms
Throughput
The number of tokens or requests a model can process per second.
Latency
The time between sending a request and receiving the first token of a response, often called time to first token (TTFT).
Streaming
Delivering model output token-by-token as it is generated rather than waiting for the full response.
Batch Processing
Submitting requests in bulk for asynchronous processing, in exchange for a 50% price discount.
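The definitions of latency, throughput, and streaming above can be tied together with a small measurement sketch. It does not call any real API: `fake_token_stream` is a hypothetical stand-in for a streamed model response, and `measure_stream` and its injectable `clock` parameter are illustrative names. Latency here follows this glossary's definition (time until the first token arrives), and throughput is computed as tokens per second over the whole response.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def fake_token_stream(tokens: Iterable[str],
                      first_token_delay: float = 0.05,
                      inter_token_delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response: yields one token at a time."""
    for i, token in enumerate(tokens):
        time.sleep(first_token_delay if i == 0 else inter_token_delay)
        yield token


def measure_stream(stream: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a token stream and report latency and throughput.

    Timing starts when consumption begins, which approximates the moment
    the request is sent.
    """
    start = clock()
    first_token_at: Optional[float] = None
    count = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = clock()  # first token has arrived
        count += 1
    total = clock() - start
    return {
        "tokens": count,
        "latency_s": None if first_token_at is None else first_token_at - start,
        "total_s": total,
        "tokens_per_s": count / total if total > 0 else 0.0,
    }


if __name__ == "__main__":
    stats = measure_stream(fake_token_stream(["Hello", ",", " world", "!"]))
    print(stats)
```

With streaming, the user sees output after `latency_s` rather than after `total_s`, which is why a streamed response can feel faster even when throughput is unchanged.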