Architektur

Transformer

The neural network architecture underlying virtually all modern LLMs.

Introduced in 'Attention Is All You Need' (2017), the Transformer replaced recurrent networks with self-attention layers that process all tokens in parallel. This enabled efficient training at scale and is the foundation of GPT, BERT, Claude, Gemini, and virtually every production LLM in use today.

Verwandte Begriffe

Attention Mechanism

The core operation in transformers that lets each token attend to all other tokens.

LLM (Büyük Dil Modeli)

Büyük miktarda metin verisiyle eğitilmiş, insan benzeri metin üretebilen yapay sinir ağı.

Parameter

A learnable weight in a neural network; model size is measured in billions of parameters.

Pre-Training

The initial large-scale training phase where a model learns language from massive text corpora.