Architecture

Transformer

The neural network architecture underlying virtually all modern LLMs.

Introduced in 'Attention Is All You Need' (2017), the Transformer replaced recurrent networks with self-attention layers that process all tokens in parallel. This enabled efficient training at scale and is the foundation of GPT, BERT, Claude, Gemini, and virtually every production LLM in use today.

Related Terms

Attention Mechanism

The core operation in transformers that lets each token attend to all other tokens.

LLM

Large Language Model — a neural network trained on vast text corpora to generate human-like text.

Parameter

A learnable weight in a neural network; model size is measured in billions of parameters.

Pre-Training

The initial large-scale training phase where a model learns language from massive text corpora.