Artificial intelligence systems introduce new security exposures that do not exist in traditional software environments. While the underlying infrastructure still requires a standard penetration test, the layers above it—models, data, orchestration, and AI-driven applications—create additional attack surfaces that require specialized testing.

This document outlines the distinction between traditional and AI-specific attack vectors and provides a structured methodology for assessing each layer of the modern AI stack.


1. Infrastructure: Traditional Pentest Surface

The infrastructure layer behaves like any conventional cloud or container environment. Testing focuses on standard offensive security procedures:

  • container breakout attempts

  • privilege escalation

  • access to secrets and service tokens

  • misconfigured RBAC and IAM roles

  • exposed Kubernetes or Docker APIs

  • lateral movement across nodes

  • storage bucket access

  • network policy bypass

  • remote code execution attempts

Compromise of this layer provides full operational access to the systems that host upper layers. However, this leads to a critical distinction.


Infrastructure Compromise: Capability vs Effect

A successful infrastructure compromise does not automatically corrupt or control the AI layers above it. It provides capability, not immediate effect.

What an attacker gains:

  • the ability to access model weights

  • the ability to read or poison training data

  • the ability to alter vector stores or embeddings

  • the ability to modify orchestration logic

  • the ability to change prompt templates or application behavior

  • the ability to deploy malicious containers or agents

What does not happen automatically:

  • model outputs do not automatically change

  • training data is not automatically poisoned

  • orchestration logic remains unchanged until altered

  • application prompts stay intact

  • vector stores are not corrupted by default

  • model weights do not extract themselves

The attacker must choose and execute follow-on actions: extraction, poisoning, manipulation, persistence, or corruption.

Infrastructure compromise grants total privilege, but no AI-specific effects occur without further steps.

This distinction matters when designing pentest scope and incident-response playbooks.


2. Model Layer: AI-Driven Attack Surface

The model layer introduces attack techniques that target the system’s reasoning, safety mechanisms, or inference engines.

Key attack methods include:

  • jailbreaks and policy bypass

  • prompt injection

  • model weight exposure attempts

  • adversarial inputs

  • fine-tuning or checkpoint corruption

  • unsafe inference server endpoints

  • malicious tokenizer or model loader manipulation

  • model backdooring

Pentesting this layer determines whether the model can be coerced into executing unsafe behavior or leaking restricted information.


3. Data Layer: RAG, Embeddings, and Poisoning

AI systems rely on data pipelines, vector databases, and retrieval systems. These components introduce new security risks:

  • data poisoning

  • malicious document injection

  • ingestion pipeline exploitation

  • embedding manipulation

  • index corruption

  • unauthorized retrieval of documents or vectors

  • knowledge extraction through iterative model queries

Attackers can influence model output by inserting targeted data into the system. This layer requires testing for both data integrity and retrieval control.


4. Orchestration Layer: Agent and Tool-Call Control

Orchestration is the most critical AI-specific attack surface. It controls how models interact with tools, workflows, memory, and external systems.

Key attack vectors:

  • forcing unauthorized tool calls

  • manipulating agent reasoning or memory

  • triggering unintended workflow branches

  • SSRF or RCE via unsafe tool integrations

  • bypassing safety logic in multi-step chains

-injecting persistent malicious state into agents

Pentesting this layer determines whether an attacker can weaponize the system by exploiting automation behavior rather than infrastructure vulnerabilities.


5. Application Layer: Hybrid Traditional + AI Exploitation

AI-driven applications combine conventional web exposure with AI-specific manipulation.

Relevant techniques include:

  • input-based prompt injection

  • XSS through AI-generated outputs

  • JSON/XML injection

  • authentication or authorization bypass via AI misinterpretation

  • workflow abuse

  • content-based data leakage

  • exploiting hallucinations or misalignment

Testing focuses on how AI influences application logic and how user-supplied input propagates through the system.


Pentest Methodology Per Layer

Infrastructure – Compute, Storage, Networking

Method: Full traditional penetration test Objective: RCE, privilege escalation, lateral movement Techniques: container breakout, exposed APIs, misconfigured IAM, network traversal

Model – Foundation Models & Inference Servers

Method: Model exploitation and behavior manipulation Objective: bypass controls, compromise inference server, corrupt model behavior Techniques: jailbreaks, prompt injection, unsafe model loaders, fine-tune poisoning

Data – RAG & Embeddings

Method: Retrieval poisoning and data manipulation Objective: influence output, leak internal knowledge Techniques: vector injection, embedding manipulation, malicious ingestion

Orchestration – Agents & Tools

Method: Agent and tool-control exploitation Objective: trigger unauthorized action via LLM Techniques: forced tool-calls, agent memory poisoning, workflow bypass

Application – UI, API, Integrations

Method: Hybrid web + AI logic testing Objective: misuse application functionality Techniques: output injection, AI-driven bypass, structured input exploitation


Attacker Priorities: Realistic Ranking

Most impactful to least:

  1. RCE – complete system compromise

  2. Persistence – long-term foothold

  3. Training Data Poisoning – lasting behavioral impact

  4. Orchestration Manipulation – turning the AI into an attack tool

  5. Output Manipulation – direct effect on users

  6. Sensitive Data Extraction – embeddings, documents, memory

  7. Model Weight Theft – only critical when weights are core IP

Model theft is overvalued; control and persistence matter more in real attacks.


Traditional vs AI-Specific Attack Surfaces

Infrastructure: traditional security testing

Models, Data, Orchestration: AI-native attack surfaces

Applications: hybrid of both domains

Security programs must treat each layer as a separate target environment.


Conclusion

The AI stack demands a combined testing approach: traditional penetration testing for the base infrastructure and specialized offensive testing for AI behavior, data integrity, and orchestration logic.

Infrastructure compromise grants attackers the capability to control the entire system, but the effects—model corruption, data poisoning, prompt manipulation, orchestration hijacking—require deliberate follow-on actions.

AI-specific attack surfaces must be validated with the same rigor traditionally applied to operating systems, containers, and networks.

[Original Source](No response)