Perplexity
A metric of how well a language model predicts a sample of text — lower is better.
Perplexity measures how surprised a model is by a held-out test set: it is the exponentiated average negative log-likelihood of the test tokens. Lower perplexity indicates better language modeling. It is primarily used to compare base models on raw language modeling quality, but it is a poor predictor of downstream task performance, which is why task-specific benchmarks such as MMLU and HumanEval are preferred for that purpose.
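The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `perplexity` and the input `token_log_probs` (a list of natural-log probabilities that a model assigned to each test token) are assumptions made for this example, not part of any particular library.

```python
import math

def perplexity(token_log_probs):
    """Exponentiated average negative log-likelihood (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood over the test tokens.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical model that assigns probability 1/4 to each of 3 test tokens.
uniform_guess = [math.log(0.25)] * 3
print(perplexity(uniform_guess))  # approximately 4
```

In this toy case the result is about 4, which matches the usual reading of perplexity: the model is as uncertain as if it were choosing uniformly among four tokens at each step.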
Related Terms
Massive Multitask Language Understanding — a benchmark testing knowledge across 57 academic subjects.
Large Language Model — a neural network trained on vast text corpora to generate human-like text.
The initial large-scale training phase where a model learns language from massive text corpora.
The algorithm that converts raw text into a sequence of tokens for a language model.