Safety and Ethics

Alignment

The field of ensuring AI systems behave according to human values and intentions.

Alignment research addresses the challenge of building AI systems that reliably do what humans intend, are honest, and avoid harmful behavior, including when deployed at scale. Practical techniques include reinforcement learning from human feedback (RLHF), Constitutional AI, and red-teaming. A misaligned model may appear helpful while being subtly deceptive or sycophantic, or it may produce harmful outputs under adversarial prompting.

Related Terms