Grundkonzepte

Throughput

The number of tokens or requests a model can process per second.

Throughput measures how many tokens an inference endpoint can generate per second across all concurrent users. High throughput is critical for batch processing jobs and high-traffic applications. It is often inversely correlated with model size — smaller models achieve higher throughput on the same hardware.

Verwandte Begriffe