Architecture
Transformer
The neural network architecture underlying virtually all modern LLMs.
Introduced in 'Attention Is All You Need' (Vaswani et al., 2017), the Transformer replaced recurrent networks with self-attention layers that process all tokens of a sequence in parallel rather than one at a time. This enabled efficient training at scale and is the foundation of GPT, BERT, Claude, Gemini, and most other production LLMs in use today.
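As a rough illustration of what a self-attention layer computes, here is a minimal sketch of scaled dot-product attention in plain Python. The function names (`softmax`, `self_attention`) and the toy input are assumptions made for this example. A real Transformer derives queries, keys, and values from learned projections, uses multiple heads, and runs the whole computation as matrix multiplications on accelerators.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each argument is a list of vectors, one per token. Passing them in
    directly (instead of computing them with learned projections) is a
    simplification made for this sketch.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of all value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy sequence of three tokens with 2-dimensional embeddings (hypothetical data).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

The loop over `queries` runs sequentially here, but no iteration depends on another. That independence is what lets real implementations compute attention for every token at once, unlike a recurrent network, where each step waits for the previous one.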
Related Terms
Attention Mechanism
The core operation in Transformers: each token attends to the tokens in the sequence, weighting them by relevance, so it can draw on context from any position.
LLM
Large Language Model — a neural network trained on vast text corpora to generate human-like text.
Parameter
A learnable weight in a neural network; model size is usually reported as a parameter count, which for LLMs typically runs into the billions.
Pre-Training
The initial large-scale training phase where a model learns language from massive text corpora.