Pre-Training
The initial large-scale training phase where a model learns language from massive text corpora.
Pre-training is the most compute-intensive phase of model development. The model is trained to predict the next token across trillions of tokens of internet text, books, and code. This instills broad world knowledge and language capability. Pre-training costs tens to hundreds of millions of dollars for frontier models and is done once by the provider.
Termes Associés
A large pre-trained model that serves as the base for many downstream applications.
Continuing to train a pre-trained model on domain-specific data to adapt its behavior.
Fine-tuning a model on examples of instructions paired with ideal responses.
A learnable weight in a neural network; model size is measured in billions of parameters.