Pricing & Costs

Batch Processing

Submitting requests asynchronously in bulk for a 50% price discount.

Batch APIs (offered by OpenAI and Anthropic) accept a large set of requests in a single submission and process them asynchronously. Results come back with higher latency, typically within 24 hours, in exchange for a 50% discount on both input and output prices. Batch processing is ideal for offline tasks such as dataset annotation, document classification, and nightly report generation.
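To make the discount concrete, here is a minimal sketch that compares the cost of a workload at standard and batch prices. The function name, the workload size, and the per-million-token prices are illustrative assumptions, not any provider's actual rates; only the 50% discount comes from the description above.

```python
def workload_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    batch_discount: float = 0.0,
) -> float:
    """Return the cost in dollars for a workload.

    Prices are per million tokens. batch_discount is a fraction
    (0.5 means 50% off both input and output prices).
    """
    full_price = (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )
    return full_price * (1 - batch_discount)


# Hypothetical workload: classify 10,000 documents, each using
# 2,000 prompt tokens and 200 generated tokens.
documents = 10_000
total_input = documents * 2_000   # 20 million input tokens
total_output = documents * 200    # 2 million output tokens

# Hypothetical prices, chosen only for illustration.
INPUT_PRICE = 3.00    # dollars per million input tokens
OUTPUT_PRICE = 15.00  # dollars per million output tokens

standard = workload_cost(total_input, total_output, INPUT_PRICE, OUTPUT_PRICE)
batch = workload_cost(
    total_input, total_output, INPUT_PRICE, OUTPUT_PRICE, batch_discount=0.5
)

print(f"standard: ${standard:.2f}, batch: ${batch:.2f}")
```

Because the discount applies equally to input and output prices, the batch cost is half the standard cost regardless of the mix of prompt and generated tokens.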

Related Terms

Input Price

The per-million-token cost charged for tokens in your prompt.

Output Price

The per-million-token cost charged for tokens the model generates.

Throughput

The number of tokens or requests a model can process per second.

Latency

The time between sending a request and receiving the first token of a response.
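The latency and throughput definitions above can be illustrated with a small sketch that derives both from timestamps recorded while a response streams in. The StreamTiming class and its field names are hypothetical, and the timestamps are made-up values rather than measurements from any real API.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Timestamps (in seconds) recorded for one streamed response."""
    sent_at: float            # when the request was sent
    token_times: list[float]  # arrival time of each generated token


def time_to_first_token(timing: StreamTiming) -> float:
    """Latency: time from sending the request to the first token."""
    return timing.token_times[0] - timing.sent_at


def tokens_per_second(timing: StreamTiming) -> float:
    """Throughput for this response: tokens received per second,
    measured from the moment the request was sent."""
    elapsed = timing.token_times[-1] - timing.sent_at
    return len(timing.token_times) / elapsed


# Made-up example: four tokens arriving half a second apart.
timing = StreamTiming(sent_at=0.0, token_times=[0.5, 1.0, 1.5, 2.0])

latency = time_to_first_token(timing)
throughput = tokens_per_second(timing)
print(f"latency: {latency:.2f} s, throughput: {throughput:.2f} tokens/s")
```

For batch workloads, per-request latency matters little, since results are collected after the whole batch finishes; cost and total completion time are the relevant figures.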