Mixture of Experts (MoE)
An architecture where only a subset of model parameters is activated per token.
In a Mixture of Experts model, the network is divided into many 'expert' sub-networks. A learned router selects which experts process each token, so only a fraction of the total parameters is active for any given token. Per-token compute therefore scales with the active parameters rather than the total, which lets MoE models reach very large total parameter counts while remaining computationally efficient. Mixtral is an openly documented MoE model, and GPT-4 is widely believed to use MoE.
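The routing step can be illustrated with a minimal sketch. It assumes top-k routing with a softmax over the selected experts' scores, which is one common gating choice and not necessarily what any particular model uses. The names make_expert and moe_forward, the single-linear-layer experts, and the random weights are all hypothetical stand-ins for learned components.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Hypothetical expert: a single dim x dim linear layer with random weights."""
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def moe_forward(token, router_weights, experts, top_k=2):
    """Route one token vector to its top_k experts and mix their outputs."""
    # Router: one score (logit) per expert. In a real model these weights are learned.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k highest-scoring experts for this token.
    order = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)
    chosen = order[:top_k]
    # Assumption: gate weights are a softmax over the chosen experts' logits only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        y = experts[i](token)  # only the chosen experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
dim, num_experts = 4, 8
experts = [make_expert(dim, rng) for _ in range(num_experts)]
router_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [rng.uniform(-1, 1) for _ in range(dim)]

output, chosen = moe_forward(token, router_weights, experts, top_k=2)
print(f"experts used: {chosen} ({len(chosen)} of {num_experts})")
```

With top_k=2 and 8 experts, each token runs through 2 of the 8 expert layers, so the expert compute per token is a quarter of what a dense layer of the same total size would need, even though all 8 experts' weights still exist in the model.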
Related Terms
Transformer: The neural network architecture underlying virtually all modern LLMs.
Parameter: A learnable weight in a neural network; model size is measured in billions of parameters.
Inference: The process of running a trained model to generate outputs from new inputs.
Quantization: Compressing model weights to lower numerical precision to reduce memory and speed up inference.