Architecture

Transformer

The neural network architecture underlying virtually all modern LLMs.

Introduced in 'Attention Is All You Need' (2017), the Transformer replaced recurrent networks with self-attention layers that process all tokens in parallel. This enabled efficient training at scale and is the foundation of GPT, BERT, Claude, Gemini, and virtually every production LLM in use today.

Related Terms