Sensitive information disclosure ranks #2 on the OWASP Top 10 for LLM Applications, and for good reason. When AI-powered applications inadvertently expose private data like personally identifiable information (PII), financial records, health information, API keys, or proprietary business intelligence, the consequences cascade quickly: regulatory violations, competitive disadvantage, and shattered user trust.
In this blog, we discuss the many ways AI data leaks can happen, where leaked data most often ends up, and how a defense-in-depth approach can help protect your organization’s sensitive information.
Where AI Systems Spring Leaks
Application-Level Leakage
Contextualization techniques like retrieval-augmented generation (RAG) create significant exposure risks. When RAG systems pull information from vector databases or knowledge stores, they often bypass the original access controls that protected the data. Here's what happens: documents are chunked and embedded into vectors, but the metadata containing access control lists often gets stripped away or ignored during retrieval. The large language model (LLM) has no way to enforce who should see what. Document-level permissions get lost in translation, and suddenly the LLM is serving up information to users who should never see it, turning the model into an inadvertent privilege escalation vector.
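One way to close this gap is to carry access-control metadata through ingestion and enforce it at retrieval time, before any chunk ever reaches the prompt. The sketch below is a minimal illustration of that idea; the Chunk structure, the allowed_roles field, and the role names are assumptions made for the example, not part of any particular RAG framework.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    # Access-control metadata carried alongside the embedding; in many
    # pipelines this is exactly what gets dropped during ingestion.
    allowed_roles: set[str] = field(default_factory=set)

def filter_by_access(chunks: list[Chunk], user_roles: set[str]) -> list[Chunk]:
    """Keep only the chunks the requesting user is entitled to see."""
    return [c for c in chunks if c.allowed_roles & user_roles]

def build_prompt(question: str, chunks: list[Chunk], user_roles: set[str]) -> str:
    permitted = filter_by_access(chunks, user_roles)
    context = "\n\n".join(c.text for c in permitted)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Example: a finance-only document never reaches a support agent's prompt.
retrieved = [
    Chunk("Q3 revenue projections...", allowed_roles={"finance"}),
    Chunk("Public product FAQ...", allowed_roles={"finance", "support"}),
]
print(build_prompt("What does the FAQ say?", retrieved, user_roles={"support"}))
```

The important design choice is that filtering happens on the application side, against the requesting user's entitlements, rather than trusting the model to withhold anything.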
Agentic AI systems compound these challenges. Unlike constrained LLM applications, AI agents can autonomously access multiple databases, APIs, and tools in real time based on user requests. An agent might query a customer database, call a payment processing API, access a document management system, and pull from a data warehouse, all within a single conversation thread.
Each tool invocation represents a potential data leak, and the agent's decision-making process about which tools to use and what data to retrieve happens dynamically, making it nearly impossible to predict or audit data flows in advance. Even more concerning, agents often chain operations together. Data retrieved from one secure system might be passed to another tool or stored in shared context, creating unintended data commingling. When an agent has broad tool access, a single prompt injection or logic flaw can cascade into widespread data exposure across multiple systems that would normally be isolated from each other.
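A common mitigation is to gate every tool invocation against the end user's entitlements rather than the agent's, and to record each call for audit. Here is a minimal, framework-agnostic sketch of that pattern; the tool names, role scopes, and invoke_tool helper are illustrative assumptions, not the API of any specific agent library.

```python
# A minimal gate around agent tool calls: every invocation is checked against
# the end user's entitlements, not the agent's, and is logged for audit.
ALLOWED_SCOPES = {
    "query_customer_db": {"support", "finance"},
    "call_payment_api": {"finance"},
    "search_documents": {"support", "finance", "engineering"},
}

def invoke_tool(tool_name: str, user_roles: set[str], tool_fn, *args, **kwargs):
    required = ALLOWED_SCOPES.get(tool_name)
    if required is None or not (required & user_roles):
        # Deny-by-default: unknown tools or missing roles never run.
        raise PermissionError(f"{tool_name} not permitted for roles {sorted(user_roles)}")
    print(f"AUDIT: {tool_name} invoked for roles {sorted(user_roles)}")
    return tool_fn(*args, **kwargs)

# Example: a support user can search documents but cannot touch payments.
print(invoke_tool("search_documents", {"support"}, lambda q: f"results for {q}", "refund policy"))
try:
    invoke_tool("call_payment_api", {"support"}, lambda: "ok")
except PermissionError as err:
    print(f"Blocked: {err}")
```

Deny-by-default matters here: an unknown tool or a missing role stops the call, so a prompt-injected request can't quietly reach a system the user was never entitled to touch.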
Training data presents another vulnerability. Models can memorize and regurgitate sensitive information from their training sets, and there's currently no reliable way to guarantee they won't. Add prompt injection attacks into the mix, and you have a system that can be manipulated into disclosing sensitive or private user information.
User-Introduced Leakage
While we often focus on technical vulnerabilities, users are another source of data exposure — and often, they don't even realize they're creating risk. The convenience of AI assistants can override security awareness, leading to dangerous patterns of oversharing.
Consider document processing scenarios: An employee asks an AI to summarize a quarterly report, not realizing it contains confidential financial projections that shouldn't leave the finance department. A developer uses a coding assistant to debug a script, inadvertently sharing source code with hardcoded API keys, access tokens, or proprietary algorithms that represent years of competitive advantage.
Customer service interactions present another common risk vector. Users often provide sensitive details to chatbots — Social Security numbers, credit card information, or account credentials — when simpler identifiers would suffice. They're accustomed to proving their identity to human agents and apply the same behavior to AI systems without considering where the data might end up.
Email integration features, while incredibly useful for productivity, can also lead to data exposure. When users grant AI assistants access to their inboxes, those systems suddenly have visibility into data such as protected health information (PHI), privileged attorney-client communications, confidential business negotiations, and sensitive HR matters. The AI doesn't understand the sensitivity context — it just sees text to process.
Where Data Gets Leaked
Once sensitive information enters an AI system, it can be leaked through multiple avenues:
- LLM responses to other users
- LLM responses that are reintegrated into downstream systems
- Agentic tools and API calls that expose data directly or reveal information through behavioral patterns
- Data source queries that telegraph what information the system is accessing
- Debug and audit logs that capture sensitive content in plaintext (see the sketch after this list)
- Context storage where sensitive data lands in databases without appropriate security controls
- Cross-user contamination when one user's sensitive data bleeds into another user's context
- Unintended use cases that violate regulatory compliance or data handling agreements
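The logging avenue in particular is easy to underestimate: prompts and responses routinely flow into debug logs verbatim. Below is a minimal sketch of one countermeasure, a logging filter that scrubs recognizable patterns before a record is written. The regexes are illustrative only; real deployments generally rely on a dedicated classification service rather than hand-rolled patterns.

```python
import logging
import re

# Illustrative patterns only; this is not an exhaustive detector.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

class RedactingFilter(logging.Filter):
    """Scrub sensitive values from each log record before any handler writes it."""
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for label, pattern in PATTERNS.items():
            message = pattern.sub(f"[REDACTED {label}]", message)
        record.msg, record.args = message, ()
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_app")
logger.addFilter(RedactingFilter())

logger.info("User prompt: contact me at jane@example.com, SSN 123-45-6789")
# Logged as: User prompt: contact me at [REDACTED EMAIL], SSN [REDACTED SSN]
```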
Plugging the Leaks: A Defense-in-Depth Approach
Model-Level Controls
Since LLMs fundamentally cannot keep secrets, training data should be carefully curated with the intended user in mind. Before expanding access to additional users, audit what information the model was trained on and whether those new users should be able to see it.
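What such an audit looks like depends heavily on how training data is catalogued, but as a rough sketch, assuming each training document already carries a sensitivity label (the tier names and document fields below are invented for illustration):

```python
# Fail fast if anything above the intended audience's clearance is about to
# be baked into the model. Tiers are ordered from least to most sensitive.
TIERS = ["public", "internal", "confidential", "restricted"]
INTENDED_AUDIENCE_CLEARANCE = "internal"

def audit_corpus(docs: list[dict]) -> list[dict]:
    limit = TIERS.index(INTENDED_AUDIENCE_CLEARANCE)
    violations = [d for d in docs if TIERS.index(d["sensitivity"]) > limit]
    if violations:
        raise ValueError(
            f"{len(violations)} documents exceed '{INTENDED_AUDIENCE_CLEARANCE}' clearance"
        )
    return docs

try:
    audit_corpus([
        {"id": "faq-001", "sensitivity": "public", "text": "..."},
        {"id": "hr-207", "sensitivity": "restricted", "text": "..."},
    ])
except ValueError as err:
    print(f"Corpus rejected: {err}")
```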
Systematic Data Protection
Implement multiple layers of defense across the AI pipeline:
- Identification and Classification: Automatically detect and tag sensitive information types (PII, PHI, financial data, credentials, proprietary information).
- Data Minimization: Block sensitive data at ingress points before it enters the system. The best way to prevent leakage is to never ingest the data in the first place.
- Sanitization: Cleanse data of sensitive elements while preserving the utility of that data for AI processing.
- Redaction: Mask or tokenize sensitive information in outputs, logs, and storage (see the sketch after this list).
- Access Control: Enforce granular permissions that follow data through the entire AI pipeline, from retrieval to generation to storage.
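To make the redaction and tokenization layer concrete, here is a minimal sketch of prompt-side tokenization: detected values are swapped for stable placeholders before the text reaches the model, and the mapping stays server-side so only authorized callers can restore them. The email regex, token format, and vault dictionary are assumptions for illustration, not a production design.

```python
import hashlib
import re

# Detected values are replaced with stable placeholders; the mapping never
# leaves your infrastructure, so the model only ever sees tokens.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tokenize(text: str, vault: dict[str, str]) -> str:
    def _swap(match: re.Match) -> str:
        value = match.group(0)
        token = "EMAIL_" + hashlib.sha256(value.encode()).hexdigest()[:8]
        vault[token] = value
        return f"<{token}>"
    return EMAIL_RE.sub(_swap, text)

def detokenize(text: str, vault: dict[str, str]) -> str:
    for token, value in vault.items():
        text = text.replace(f"<{token}>", value)
    return text

vault: dict[str, str] = {}
prompt = tokenize("Draft a reply to jane.doe@example.com about her invoice.", vault)
print(prompt)  # "Draft a reply to <EMAIL_...> about her invoice."
# After the LLM responds, detokenize(response, vault) restores the address
# only for callers who are allowed to see it.
```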