Securing Locally Deployed Large Language Models (LLMs)

1. Project Context and Objectives

Client: Tom Broumels requested research into securing locally deployed LLMs (e.g., LLaMA, Mistral) that are network-accessible but not cloud-based.
Goal: Identify vulnerabilities, demonstrate attacks and mitigations, and deliver two Proofs of Concept (PoCs): one vulnerable and one secure.
Focus: Generic, realistic threats rather than model-specific vulnerabilities.
Alignment: Security research and testing are structured around the NIST Cybersecurity Framework (CSF) and NIST SP 800-53 controls.

Threats Considered:
Prompt Injection Attacks: Crafting malicious prompts to cause harmful outputs or data leaks.
Overloading Attacks: Flooding the LLM with excessive or oversized inputs to degrade or crash the system.
Prompt Leakage Attacks: Sensitive data included in prompts is exposed in outputs to unauthorized users.
Cross-Site Scripting (XSS) Attacks: Malicious scripts embedded in prompts are executed when generated HTML/JS output is rendered.
Direct API Access Attacks: Unauthorized access to backend API endpoints bypassing frontend controls.
Model Theft (Model Stealing): Unauthorized extraction of model behavior or parameters by repeated querying.
Poisoning the Well: Injecting biased or malicious training data to manipulate model outputs and behavior.
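
Two of the threats above (overloading and XSS) have simple first-line mitigations that can be sketched in a few lines. The following is a minimal illustration, not the project's PoC code: the names check_prompt_size, render_model_output, and the MAX_PROMPT_CHARS limit are assumptions introduced here for the example.

```python
import html

# Assumed limit for illustration; tune to the model's context window and hardware.
MAX_PROMPT_CHARS = 4000


def check_prompt_size(prompt: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Reject oversized prompts before they reach the model (overloading)."""
    if len(prompt) > limit:
        raise ValueError(f"prompt exceeds {limit} characters")
    return prompt


def render_model_output(text: str) -> str:
    """Escape model output so embedded markup is displayed, not executed (XSS)."""
    return html.escape(text, quote=True)


if __name__ == "__main__":
    print(render_model_output("<script>alert(1)</script>"))
```

A size cap alone does not stop request flooding; it would normally be combined with per-client rate limiting. Escaping is only effective if every path that renders model output into HTML goes through it.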

NIST CSF Core Functions:
- Identify: Map assets (LLM backend, APIs, data flows) and risks (e.g., prompt injection).
- Protect: Apply access controls, input sanitization, blacklisting, and sandboxing.
- Detect: Monitor login failures and suspicious prompt patterns.
- Respond: Apply blacklists, isolate threats, and execute recovery plans.
- Recover: Use version control and backups for rollback.
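
The Detect function's "suspicious prompt patterns" could be monitored along the following lines. This is a sketch under stated assumptions: the patterns, the flag_prompt function, and the llm_monitor logger name are hypothetical examples, and pattern matching of this kind is a heuristic that a determined attacker can evade.

```python
import logging
import re

logger = logging.getLogger("llm_monitor")

# Hypothetical example patterns; a real deployment would maintain its own list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
    re.compile(r"<script\b", re.IGNORECASE),
]


def flag_prompt(prompt: str, client_id: str) -> list[str]:
    """Return the patterns matched by the prompt and log a warning if any match."""
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(prompt)]
    if hits:
        logger.warning(
            "suspicious prompt from %s: %d pattern(s) matched", client_id, len(hits)
        )
    return hits
```

The returned list can feed the Respond step, for example by adding a repeatedly flagged client to a blacklist.
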
NIST SP 800-53 Controls:
- Access Control (AC): Enforce authentication and authorization.
- System and Communications Protection (SC): Encrypt traffic and isolate the LLM in containers.
- Audit and Accountability (AU): Log and monitor activities.
- Risk Assessment (RA): Perform threat modeling for input-based attacks.
- Configuration Management (CM): Use Git for version control.
- System and Information Integrity (SI): Validate input, sandbox execution, and prevent command execution.
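
As an illustration of how the AC and AU controls above might look in front of an LLM API endpoint, the sketch below checks an API key in constant time and writes an audit record for every decision. The key store, the is_authorized function, and the llm_audit logger name are assumptions made for this example; they are not taken from the project's PoCs.

```python
import hashlib
import hmac
import logging

audit_log = logging.getLogger("llm_audit")

# Hypothetical key store: client id -> SHA-256 hex digest of the API key.
API_KEY_HASHES = {
    "demo-client": hashlib.sha256(b"demo-secret-key").hexdigest(),
}


def is_authorized(client_id: str, presented_key: str) -> bool:
    """Check the presented key against the stored hash and audit the result."""
    expected = API_KEY_HASHES.get(client_id)
    presented = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    # Unknown clients are compared against a dummy digest so that the
    # comparison runs either way; hmac.compare_digest is constant-time.
    matches = hmac.compare_digest(presented, expected or "0" * 64)
    ok = matches and expected is not None
    audit_log.info("auth client=%s result=%s", client_id, "allow" if ok else "deny")
    return ok
```

Placing this check in the backend, rather than only in the frontend, is what addresses direct API access: a request that bypasses the UI still has to present a valid key.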