Architecture

Mixture of Experts (MoE)

An architecture where only a subset of model parameters is activated per token.

In a Mixture of Experts model, the network is divided into many 'expert' sub-networks. A learned router selects which experts process each token, so only a fraction of the total parameters is active for any given token. Because compute per token scales with the active parameters rather than the total, MoE models can have very large total parameter counts while remaining computationally efficient. Mixtral is a publicly documented MoE model, and GPT-4 is widely believed to use MoE.
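The routing step described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how any particular model implements it: the names `route_top_k`, `moe_layer`, and `make_expert` are hypothetical, the experts are simple scaling functions standing in for real sub-networks, and taking a softmax over only the selected experts' scores is one common choice among several.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Assumption: normalize over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def make_expert(scale):
    """Hypothetical stand-in for an expert sub-network: scales its input."""
    return lambda token: [scale * x for x in token]


def moe_layer(token, experts, router_weights, k=2):
    """Run one token through only k of the experts and mix their outputs."""
    # Router: one score per expert, here a dot product with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    selected = route_top_k(logits, k)
    output = [0.0] * len(token)
    for idx, weight in selected:
        expert_out = experts[idx](token)  # unselected experts never run
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, [idx for idx, _ in selected]


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(s + 1) for s in range(num_experts)]
router_weights = [[random.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]

output, used = moe_layer(token, experts, router_weights, k=2)
print("experts used:", used)
print("output:", output)
```

With 8 experts and k=2, each token touches only a quarter of the expert parameters, which is where the compute saving comes from; all 8 experts still have to be stored.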

Related Terms

Transformer

The neural network architecture underlying virtually all modern LLMs.

Parameter

A learnable weight in a neural network; the size of large models is commonly reported in billions of parameters.

Inference

The process by which a trained AI model produces outputs for new inputs.

Quantization

Compressing model weights to lower numerical precision to reduce memory use and speed up inference.