Artificial intelligence systems introduce new security exposures that do not exist in traditional software environments. While the underlying infrastructure still requires a standard penetration test, the layers above it—models, data, orchestration, and AI-driven applications—create additional attack surfaces that require specialized testing.
This document outlines the distinction between traditional and AI-specific attack vectors and provides a structured methodology for assessing each layer of the modern AI stack.
1. Infrastructure: Traditional Pentest Surface
The infrastructure layer behaves like any conventional cloud or container environment. Testing focuses on standard offensive security procedures:
-
container breakout attempts
-
privilege escalation
-
access to secrets and service tokens
-
misconfigured RBAC and IAM roles
-
exposed Kubernetes or Docker APIs
-
lateral movement across nodes
-
storage bucket access
-
network policy bypass
-
remote code execution attempts
Compromise of this layer provides full operational access to the systems that host upper layers. However, this leads to a critical distinction.
Infrastructure Compromise: Capability vs Effect
A successful infrastructure compromise does not automatically corrupt or control the AI layers above it. It provides capability, not immediate effect.
What an attacker gains:
-
the ability to access model weights
-
the ability to read or poison training data
-
the ability to alter vector stores or embeddings
-
the ability to modify orchestration logic
-
the ability to change prompt templates or application behavior
-
the ability to deploy malicious containers or agents
What does not happen automatically:
-
model outputs do not automatically change
-
training data is not automatically poisoned
-
orchestration logic remains unchanged until altered
-
application prompts stay intact
-
vector stores are not corrupted by default
-
model weights do not extract themselves
The attacker must choose and execute follow-on actions: extraction, poisoning, manipulation, persistence, or corruption.
Infrastructure compromise grants total privilege, but no AI-specific effects occur without further steps.
This distinction matters when designing pentest scope and incident-response playbooks.
2. Model Layer: AI-Driven Attack Surface
The model layer introduces attack techniques that target the system’s reasoning, safety mechanisms, or inference engines.
Key attack methods include:
-
jailbreaks and policy bypass
-
prompt injection
-
model weight exposure attempts
-
adversarial inputs
-
fine-tuning or checkpoint corruption
-
unsafe inference server endpoints
-
malicious tokenizer or model loader manipulation
-
model backdooring
Pentesting this layer determines whether the model can be coerced into executing unsafe behavior or leaking restricted information.
3. Data Layer: RAG, Embeddings, and Poisoning
AI systems rely on data pipelines, vector databases, and retrieval systems. These components introduce new security risks:
-
data poisoning
-
malicious document injection
-
ingestion pipeline exploitation
-
embedding manipulation
-
index corruption
-
unauthorized retrieval of documents or vectors
-
knowledge extraction through iterative model queries
Attackers can influence model output by inserting targeted data into the system. This layer requires testing for both data integrity and retrieval control.
4. Orchestration Layer: Agent and Tool-Call Control
Orchestration is the most critical AI-specific attack surface. It controls how models interact with tools, workflows, memory, and external systems.
Key attack vectors:
-
forcing unauthorized tool calls
-
manipulating agent reasoning or memory
-
triggering unintended workflow branches
-
SSRF or RCE via unsafe tool integrations
-
bypassing safety logic in multi-step chains
-injecting persistent malicious state into agents
Pentesting this layer determines whether an attacker can weaponize the system by exploiting automation behavior rather than infrastructure vulnerabilities.
5. Application Layer: Hybrid Traditional + AI Exploitation
AI-driven applications combine conventional web exposure with AI-specific manipulation.
Relevant techniques include:
-
input-based prompt injection
-
XSS through AI-generated outputs
-
JSON/XML injection
-
authentication or authorization bypass via AI misinterpretation
-
workflow abuse
-
content-based data leakage
-
exploiting hallucinations or misalignment
Testing focuses on how AI influences application logic and how user-supplied input propagates through the system.
Pentest Methodology Per Layer
Infrastructure – Compute, Storage, Networking
Method: Full traditional penetration test Objective: RCE, privilege escalation, lateral movement Techniques: container breakout, exposed APIs, misconfigured IAM, network traversal
Model – Foundation Models & Inference Servers
Method: Model exploitation and behavior manipulation Objective: bypass controls, compromise inference server, corrupt model behavior Techniques: jailbreaks, prompt injection, unsafe model loaders, fine-tune poisoning
Data – RAG & Embeddings
Method: Retrieval poisoning and data manipulation Objective: influence output, leak internal knowledge Techniques: vector injection, embedding manipulation, malicious ingestion
Orchestration – Agents & Tools
Method: Agent and tool-control exploitation Objective: trigger unauthorized action via LLM Techniques: forced tool-calls, agent memory poisoning, workflow bypass
Application – UI, API, Integrations
Method: Hybrid web + AI logic testing Objective: misuse application functionality Techniques: output injection, AI-driven bypass, structured input exploitation
Attacker Priorities: Realistic Ranking
Most impactful to least:
-
RCE – complete system compromise
-
Persistence – long-term foothold
-
Training Data Poisoning – lasting behavioral impact
-
Orchestration Manipulation – turning the AI into an attack tool
-
Output Manipulation – direct effect on users
-
Sensitive Data Extraction – embeddings, documents, memory
-
Model Weight Theft – only critical when weights are core IP
Model theft is overvalued; control and persistence matter more in real attacks.
Traditional vs AI-Specific Attack Surfaces
Infrastructure: traditional security testing
Models, Data, Orchestration: AI-native attack surfaces
Applications: hybrid of both domains
Security programs must treat each layer as a separate target environment.
Conclusion
The AI stack demands a combined testing approach: traditional penetration testing for the base infrastructure and specialized offensive testing for AI behavior, data integrity, and orchestration logic.
Infrastructure compromise grants attackers the capability to control the entire system, but the effects—model corruption, data poisoning, prompt manipulation, orchestration hijacking—require deliberate follow-on actions.
AI-specific attack surfaces must be validated with the same rigor traditionally applied to operating systems, containers, and networks.
[Original Source](No response)