Tech Study Guide
ML Security and Privacy
ML security and privacy risks including prompt injection, data exfiltration, model supply chain, poisoned retrieval, tenant isolation, PII, retention, and audit logging.
ML Security and Privacy
ML systems add new ways to misuse ordinary infrastructure: prompts can carry instructions, retrieved documents can poison context, tools can perform side effects, and logs can capture sensitive data. Security controls need to surround the model, not depend on the model obeying policy.
Command Examples
date -Is
git diff --name-only
Example output and meaning:
| Command | Example output | What it does |
|---|---|---|
date -Is |
2026-06-06T10:24:33-07:00 |
Pins command output and logs to an exact incident timestamp. |
git diff --name-only |
prompts/support.yaml and evals/golden.jsonl |
Shows which prompt, eval, or policy artifacts changed in the release. |
For a real system, start by listing data classes, model artifacts, tool permissions, retrieval sources, logging paths, and retention rules.
Threat Matrix
| Threat | Example | Control |
|---|---|---|
| Prompt injection | Retrieved document says to ignore policy or reveal secrets. | Treat retrieved text as untrusted data; separate instructions from evidence. |
| Data exfiltration | Model includes private source text in answer to unauthorized user. | ACL filtering before prompt assembly and output review for sensitive data. |
| Tool abuse | Model calls delete, purchase, deploy, or send-message tools unexpectedly. | Deterministic authorization, approval gates, and idempotency. |
| Model supply chain | Untrusted checkpoint, tokenizer, adapter, or custom code. | Pin artifact hashes, avoid remote code, scan licenses and provenance. |
| Retrieval poisoning | Attacker inserts malicious or low-quality documents into RAG corpus. | Source allowlists, ingestion review, document trust scoring. |
| Training data leakage | Secrets or PII enter fine-tune or eval data. | Redaction, data classification, retention, and deletion workflows. |
| Tenant isolation failure | One tenant’s documents or adapters affect another tenant. | Tenant-scoped indexes, model routes, cache boundaries, and tests. |
Privacy Controls
| Area | Practical Control |
|---|---|
| Prompt logging | Minimize, redact, sample, encrypt, and expire logs. |
| Training data | Track consent, allowed use, deletion, and provenance. |
| RAG corpus | Store document ACLs and enforce them before retrieval context assembly. |
| Fine-tunes | Keep dataset manifests and remove disallowed examples before training. |
| Human review | Remove unnecessary PII and control reviewer access. |
| Model outputs | Detect sensitive data and provide appeal/escalation paths where needed. |
Security Release Gate
- Confirm model, tokenizer, adapter, and container provenance.
- Verify artifact hashes and license constraints.
- Run prompt-injection and data-exfiltration evals.
- Test tenant ACLs with positive and negative cases.
- Test tool authorization outside the model.
- Review logging, retention, redaction, and user-data handling.
- Confirm incident audit trails and rollback.
Study Cards
Why should retrieved RAG text be treated as untrusted?
It can contain prompt-injection instructions or poisoned content that tries to steer the model.
Where should tool authorization be enforced?
In deterministic tool hosts and policy layers outside the model.
Why is prompt logging a privacy risk?
Prompts and outputs can contain PII, secrets, customer data, or regulated content.