Architecture

Mixture of Experts (MoE)

An architecture where only a subset of model parameters is activated per token.

In a Mixture of Experts model, parts of the network (typically the feed-forward layers of a Transformer) are divided into many 'expert' sub-networks. A learned router selects which experts process each token, so only a fraction of the total parameters is active for any given token. This lets MoE models have very large total parameter counts while keeping per-token compute close to that of a much smaller dense model, although all experts must still be held in memory. Mixtral is an openly documented MoE model, and GPT-4 is widely believed to use MoE.
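The routing step can be illustrated with a minimal sketch in plain Python. It assumes one common variant, top-k routing with a softmax over the selected experts' scores. The names `route_token`, `moe_layer`, and `make_expert` are hypothetical, and the "experts" are toy scaling functions standing in for real feed-forward networks. It does not describe how any particular model is implemented.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token, router_weights, top_k):
    """Return (expert_index, gate_weight) pairs for the top_k experts."""
    # One router score per expert: dot product of the token with that expert's row.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k highest-scoring experts.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize gate weights over the chosen experts only (assumed variant).
    gates = softmax([scores[i] for i in chosen])
    return list(zip(chosen, gates))

def moe_layer(token, router_weights, experts, top_k=2):
    """Gate-weighted sum of the selected experts' outputs; other experts never run."""
    output = [0.0] * len(token)
    for idx, gate in route_token(token, router_weights, top_k):
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output

def make_expert(scale):
    # Hypothetical stand-in for a feed-forward expert: it just scales the input.
    return lambda token: [scale * x for x in token]

random.seed(0)
dim, num_experts = 4, 8
router_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [make_expert(i + 1) for i in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]

selection = route_token(token, router_weights, top_k=2)
result = moe_layer(token, router_weights, experts, top_k=2)
print(selection)
print(result)
```

With 8 experts and `top_k=2`, only 2 of the 8 expert functions are called for this token, which is the sense in which only a fraction of the parameters is active.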

Related Terms

Transformer

The neural network architecture underlying virtually all modern LLMs.

Parameter

A learnable weight in a neural network; model size is commonly reported as a parameter count, often in the billions.

Inference

The process of running a trained model to generate outputs from new inputs.

Quantization

Compressing model weights to lower numerical precision to reduce memory use and speed up inference.