Pentesting the AI stack (Part 1)

Artificial intelligence systems introduce new security exposures that do not exist in traditional software environments. While the underlying infrastructure still requires a standard penetration test, the layers above it—models, data, orchestration, and AI-driven applications—create additional attack surfaces that require specialized testing.

This document outlines the distinction between traditional and AI-specific attack vectors and provides a structured methodology for assessing each layer of the modern AI stack.

1. Infrastructure: Traditional Pentest Surface

The infrastructure layer behaves like any conventional cloud or container environment. Testing focuses on standard offensive security procedures:

container breakout attempts
privilege escalation
access to secrets and service tokens
misconfigured RBAC and IAM roles
exposed Kubernetes or Docker APIs
lateral movement across nodes
storage bucket access
network policy bypass
remote code execution attempts

Compromise of this layer provides full operational access to the systems that host upper layers. However, this leads to a critical distinction.

Infrastructure Compromise: Capability vs Effect

A successful infrastructure compromise does not automatically corrupt or control the AI layers above it. It provides capability, not immediate effect.

What an attacker gains:

the ability to access model weights
the ability to read or poison training data
the ability to alter vector stores or embeddings
the ability to modify orchestration logic
the ability to change prompt templates or application behavior
the ability to deploy malicious containers or agents

What does not happen automatically:

model outputs do not automatically change
training data is not automatically poisoned
orchestration logic remains unchanged until altered
application prompts stay intact
vector stores are not corrupted by default
model weights do not extract themselves

The attacker must choose and execute follow-on actions: extraction, poisoning, manipulation, persistence, or corruption.

Infrastructure compromise grants total privilege, but no AI-specific effects occur without further steps.

This distinction matters when designing pentest scope and incident-response playbooks.

2. Model Layer: AI-Driven Attack Surface

The model layer introduces attack techniques that target the system’s reasoning, safety mechanisms, or inference engines.

Key attack methods include:

jailbreaks and policy bypass
prompt injection
model weight exposure attempts
adversarial inputs
fine-tuning or checkpoint corruption
unsafe inference server endpoints
malicious tokenizer or model loader manipulation
model backdooring

Pentesting this layer determines whether the model can be coerced into executing unsafe behavior or leaking restricted information.

3. Data Layer: RAG, Embeddings, and Poisoning

AI systems rely on data pipelines, vector databases, and retrieval systems. These components introduce new security risks:

data poisoning
malicious document injection
ingestion pipeline exploitation
embedding manipulation
index corruption
unauthorized retrieval of documents or vectors
knowledge extraction through iterative model queries

Attackers can influence model output by inserting targeted data into the system. This layer requires testing for both data integrity and retrieval control.

4. Orchestration Layer: Agent and Tool-Call Control

Orchestration is the most critical AI-specific attack surface. It controls how models interact with tools, workflows, memory, and external systems.

Key attack vectors:

forcing unauthorized tool calls
manipulating agent reasoning or memory
triggering unintended workflow branches
SSRF or RCE via unsafe tool integrations
bypassing safety logic in multi-step chains

-injecting persistent malicious state into agents

Pentesting this layer determines whether an attacker can weaponize the system by exploiting automation behavior rather than infrastructure vulnerabilities.

5. Application Layer: Hybrid Traditional + AI Exploitation

AI-driven applications combine conventional web exposure with AI-specific manipulation.

Relevant techniques include:

input-based prompt injection
XSS through AI-generated outputs
JSON/XML injection
authentication or authorization bypass via AI misinterpretation
workflow abuse
content-based data leakage
exploiting hallucinations or misalignment

Testing focuses on how AI influences application logic and how user-supplied input propagates through the system.

Pentest Methodology Per Layer

Infrastructure – Compute, Storage, Networking

Method: Full traditional penetration test Objective: RCE, privilege escalation, lateral movement Techniques: container breakout, exposed APIs, misconfigured IAM, network traversal

Model – Foundation Models & Inference Servers

Method: Model exploitation and behavior manipulation Objective: bypass controls, compromise inference server, corrupt model behavior Techniques: jailbreaks, prompt injection, unsafe model loaders, fine-tune poisoning

Data – RAG & Embeddings

Method: Retrieval poisoning and data manipulation Objective: influence output, leak internal knowledge Techniques: vector injection, embedding manipulation, malicious ingestion

Orchestration – Agents & Tools

Method: Agent and tool-control exploitation Objective: trigger unauthorized action via LLM Techniques: forced tool-calls, agent memory poisoning, workflow bypass

Application – UI, API, Integrations

Method: Hybrid web + AI logic testing Objective: misuse application functionality Techniques: output injection, AI-driven bypass, structured input exploitation

Attacker Priorities: Realistic Ranking

Most impactful to least:

RCE – complete system compromise
Persistence – long-term foothold
Training Data Poisoning – lasting behavioral impact
Orchestration Manipulation – turning the AI into an attack tool
Output Manipulation – direct effect on users
Sensitive Data Extraction – embeddings, documents, memory
Model Weight Theft – only critical when weights are core IP

Model theft is overvalued; control and persistence matter more in real attacks.

Traditional vs AI-Specific Attack Surfaces

Infrastructure: traditional security testing

Models, Data, Orchestration: AI-native attack surfaces

Applications: hybrid of both domains

Security programs must treat each layer as a separate target environment.

Conclusion

The AI stack demands a combined testing approach: traditional penetration testing for the base infrastructure and specialized offensive testing for AI behavior, data integrity, and orchestration logic.

Infrastructure compromise grants attackers the capability to control the entire system, but the effects—model corruption, data poisoning, prompt manipulation, orchestration hijacking—require deliberate follow-on actions.

AI-specific attack surfaces must be validated with the same rigor traditionally applied to operating systems, containers, and networks.

[Original Source](No response)