Constitutional AI
Anthropic's method of training models to self-critique and revise outputs using a set of principles.
Constitutional AI (CAI) is Anthropic's alignment technique where the model is given a set of principles (a 'constitution') and trained to evaluate and revise its own outputs against those principles. This reduces the need for human labelers on harmful content and is a key part of how Claude models are made helpful, harmless, and honest.
Verwandte Begriffe
Reinforcement Learning from Human Feedback — training models to align with human preferences.
The field of ensuring AI systems behave according to human values and intentions.
Fine-tuning a model on examples of instructions paired with ideal responses.
The discipline of building AI systems that are reliably beneficial and avoid harmful outputs.