Benchmarks
MMLU
Massive Multitask Language Understanding — a benchmark testing knowledge across 57 academic subjects.
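MMLU evaluates a model's factual knowledge and reasoning with four-option multiple-choice questions across 57 subjects, including mathematics, history, medicine, law, and computer science. It is one of the most widely cited benchmarks for comparing frontier models. Scores above 85% are often described as approaching expert level; the benchmark's authors estimated human expert accuracy at roughly 90%. MMLU-Pro is a harder variant with more reasoning-heavy questions and ten answer options instead of four, which makes guessing less rewarding.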
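To make the scoring concrete, here is a minimal sketch of how a multiple-choice accuracy figure is computed and how it compares with random guessing. MMLU questions have four answer options and MMLU-Pro questions have ten, so the chance baselines are 25% and 10%. The function names and the sample answers are hypothetical, made up for illustration; this is not the official evaluation harness.

```python
def chance_accuracy(num_options: int) -> float:
    """Expected accuracy from uniform random guessing."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 1 / num_options


def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of questions where the predicted letter matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical answer key and model outputs (not real benchmark data).
answers = ["B", "D", "A", "C"]
predictions = ["B", "D", "C", "C"]

print(accuracy(predictions, answers))  # 0.75
print(chance_accuracy(4))              # 0.25 (MMLU, four options)
print(chance_accuracy(10))             # 0.1  (MMLU-Pro, ten options)
```

A reported score is best read against the relevant baseline: 30% on a four-option test is barely above guessing, while 30% on a ten-option test is three times the chance rate.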