Securing Locally Deployed Large Language Models (LLMs)
1. Project Context and Objectives
- Client: Tom Broumels requested research into securing locally deployed LLMs (e.g., LLaMA, Mistral) that are network-accessible but not cloud-based.
- Goal: Identify vulnerabilities, demonstrate attacks and mitigations, and deliver two Proofs of Concept (PoCs): one vulnerable and one secure.
- Focus: Generic, realistic threats rather than model-specific vulnerabilities.
- Alignment: Security research and testing are structured around the NIST Cybersecurity Framework (CSF) and NIST SP 800-53 controls.
2. Key Security Threats and Attack Types
- Prompt Injection Attacks: Crafting malicious prompts to cause harmful outputs or data leaks.
- Overloading Attacks: Flooding the LLM with excessive or oversized inputs to degrade or crash the system.
- Prompt Leakage Attacks: Sensitive data included in prompts is leaked through outputs to unauthorized users.
- Cross-Site Scripting (XSS) Attacks: Malicious scripts embedded in prompts are executed when the generated HTML/JS output is rendered.
- Direct API Access Attacks: Unauthorized access to backend API endpoints bypassing frontend controls.
- Model Theft (Model Stealing): Unauthorized extraction of model behavior or parameters by repeated querying.
- Poisoning the Well: Injecting biased or malicious training data to manipulate model outputs and behavior.
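
Two of the threats above, XSS through generated output and overloading through oversized inputs, can be illustrated with a minimal Python sketch. It is not taken from the project's PoCs: the names `MAX_PROMPT_CHARS`, `check_prompt_size`, and `render_model_output` are hypothetical, and the 4,000-character limit is an arbitrary assumption that would need tuning for a real deployment.

```python
import html

# Assumed limit for illustration only; not a value from the project.
MAX_PROMPT_CHARS = 4000


def check_prompt_size(prompt: str) -> bool:
    """Return True if the prompt is small enough to forward to the model."""
    return len(prompt) <= MAX_PROMPT_CHARS


def render_model_output(text: str) -> str:
    """Escape model output so embedded markup is displayed as text, not executed."""
    return '<div class="llm-output">' + html.escape(text) + "</div>"


if __name__ == "__main__":
    # A model response that echoes an attacker-supplied script tag.
    malicious = '<script>alert("xss")</script>'
    print(render_model_output(malicious))

    # An oversized prompt is rejected before it reaches the model.
    print(check_prompt_size("A" * 10_000))
```

Escaping at render time treats every model response as untrusted text, which is a conservative default when the frontend has no need to display model-generated HTML.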
3. NIST Framework Application
- NIST CSF Core Functions:
  - Identify: Map assets (LLM backend, APIs, data flows) and risks (e.g., prompt injection).
  - Protect: Access controls, input sanitization, blacklisting, and sandboxing.
  - Detect: Monitor login failures and suspicious prompt patterns.
  - Respond: Apply blacklists, isolate threats, and execute recovery plans.
  - Recover: Version control and backups for rollback.
- NIST SP 800-53 Controls:
  - Access Control (AC): Enforce authentication and authorization.
  - System and Communications Protection (SC): Encrypt traffic and isolate the LLM in containers.
  - Audit and Accountability (AU): Log and monitor activities.
  - Risk Assessment (RA): Threat modeling for input-based attacks.
  - Configuration Management (CM): Use Git for version control.
  - System and Information Integrity (SI): Input validation, sandboxing, and prevention of command execution.
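
As a rough illustration of how the AC and AU controls and the Detect function might combine in front of an LLM endpoint, the sketch below checks an API key, applies a sliding-window rate limit, and writes audit log entries. Everything in it is an assumption made for illustration: the `API_KEYS` table, the limits, and the function names `authenticate`, `within_rate_limit`, and `handle_request` are hypothetical and do not come from the project.

```python
import hmac
import logging
import time
from collections import defaultdict, deque
from typing import Optional

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm.audit")

# Hypothetical values for illustration only.
API_KEYS = {"alice": "s3cret-key-alice"}
MAX_REQUESTS = 5         # requests allowed per window
WINDOW_SECONDS = 60.0    # window length in seconds

_recent = defaultdict(deque)  # user -> timestamps of accepted requests


def authenticate(user: str, key: str) -> bool:
    """AC: compare the presented key to the stored one in constant time."""
    expected = API_KEYS.get(user)
    ok = expected is not None and hmac.compare_digest(
        expected.encode(), key.encode()
    )
    if not ok:
        audit_log.warning("auth failure user=%s", user)  # AU / Detect
    return ok


def within_rate_limit(user: str, now: Optional[float] = None) -> bool:
    """Sliding window: allow at most MAX_REQUESTS per WINDOW_SECONDS."""
    now = time.monotonic() if now is None else now
    window = _recent[user]
    # Drop timestamps that have fallen out of the window.
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        audit_log.warning("rate limit exceeded user=%s", user)
        return False
    window.append(now)
    return True


def handle_request(user: str, key: str, prompt: str,
                   now: Optional[float] = None) -> str:
    """Gate a prompt before it would be forwarded to the model backend."""
    if not authenticate(user, key):
        return "401 Unauthorized"
    if not within_rate_limit(user, now):
        return "429 Too Many Requests"
    audit_log.info("prompt accepted user=%s chars=%d", user, len(prompt))
    return "200 OK"
```

Enforcing these checks in the backend rather than only in the frontend is what addresses the direct API access threat, since a caller who bypasses the UI still passes through the same gate.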
4. Mitigation Strategies and Secure Design
- Input and output sanitization to filter dangerous commands and sensitive data.