Arquitectura

Multimodal

A model that can process and generate multiple types of data, such as text and images.

Multimodal models accept inputs beyond text — images, audio, video, or documents — and integrate them into a unified representation. GPT-4o, Gemini, and Claude 3 are multimodal: you can send an image alongside text and the model reasons across both. Multimodal inference typically costs more than text-only due to the additional tokens consumed by vision encoding.

Términos Relacionados

LLM (Büyük Dil Modeli)

Büyük miktarda metin verisiyle eğitilmiş, insan benzeri metin üretebilen yapay sinir ağı.

Token

Yapay zeka modellerinin metni işlemek ve faturalandırmak için kullandığı temel birim.

Foundation Model

A large pre-trained model that serves as the base for many downstream applications.

Çıkarım (Inference)

Eğitilmiş bir yapay zeka modelinin yeni girdiler için çıktı üretme süreci.