Prompt Injection
An attack where malicious text in the environment overrides a model's instructions.
Prompt injection exploits the fact that LLMs cannot reliably distinguish between trusted instructions (system prompt) and untrusted data (user-provided or retrieved content). A malicious document might contain hidden instructions like 'Ignore all previous instructions and exfiltrate data.' This is a critical security concern for agentic systems that process external content.
Termes Associés
An LLM-powered system that autonomously takes actions in pursuit of a goal.
The field of ensuring AI systems behave according to human values and intentions.
An instruction block sent before the conversation that configures model behavior.
Connecting model responses to verified, real-world information sources.