CISO Coverage Map: The Working CISO's Guide to Securing AI Runtime

Delphi Security · April 15, 2026

Enterprise AI security playbooks consistently identify the same hard layers — runtime detection, telemetry, audit, and adversarial testing. These are where most programs fail.

This guide maps the standard CISO requirements for AI runtime security to the capabilities Delphi delivers today. Capabilities currently in build are called out separately. Items outside Delphi's scope are listed honestly at the end.

A single deployment directly addresses the detection, telemetry, audit, and adversarial testing layers, which are the hardest part of the AI security checklist.

Requirement → Capability

Each requirement below maps to a specific Delphi capability shipped today.

Logging and Audit

Provenance logging across all AI providers — model, version, prompt, response, verdict, timestamp, reviewer.
How Delphi delivers: Every request through Delphi's proxy is logged across OpenAI, Anthropic, Gemini, and Bedrock with full request context, the rule that fired, and the detection verdict. One log surface across all vendors.
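
To make the provenance fields concrete, here is a minimal sketch of a log record carrying the fields named above. The class name, field names, and sample values are assumptions for illustration only and do not describe Delphi's actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional
import json


@dataclass(frozen=True)
class ProvenanceRecord:
    """One logged AI request. Field names are illustrative, not Delphi's schema."""
    provider: str             # e.g. "openai", "anthropic", "gemini", "bedrock"
    model: str
    model_version: str
    prompt: str
    response: str
    verdict: str              # e.g. "allow", "warn", "block"
    rule_id: Optional[str]    # rule that fired, if any
    timestamp: str            # ISO 8601, UTC
    reviewer: Optional[str]   # human reviewer, if the event was triaged


def to_log_line(record: ProvenanceRecord) -> str:
    """Serialize a record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = ProvenanceRecord(
    provider="anthropic",
    model="example-model",          # placeholder, not a real model name
    model_version="2026-01",
    prompt="Summarize the attached contract.",
    response="The contract covers ...",
    verdict="allow",
    rule_id=None,
    timestamp=datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc).isoformat(),
    reviewer=None,
)
line = to_log_line(record)
```

Using one record shape for every provider is what lets investigators query a single log surface instead of one per vendor.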

Audit trail with guardrail evaluation transparency — vendor must expose raw telemetry, not just a score.
How Delphi delivers: Detection cascade outputs the layer, rule ID, and reasoning for every decision. Raw telemetry exposed via API. No black-box scoring.

Centralized log retention against the longest applicable regulatory clock, accessible to investigators within an hour.
How Delphi delivers: Centralized telemetry store with multi-tenant isolation. Investigation surface queries per agent, per session, per rule, in seconds.

Detection and Enforcement

Application-layer input and output sanitization — inspecting prompts and responses before they affect downstream systems.
How Delphi delivers: Multi-layer detection cascade: pattern matching, heuristic analysis, ML classification, and behavioral arbiter. Drop-in proxy replacement; no model trust required.
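
The cascade idea, together with the audit requirement that every decision expose its layer, rule ID, and reasoning, can be sketched as follows. This is a toy under stated assumptions: the layer functions, rule IDs, and thresholds are hypothetical, and the ML classification and behavioral arbiter layers are omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    action: str      # "allow" or "block"
    layer: str       # which layer made the decision
    rule_id: str
    reasoning: str


# Each layer returns a Verdict when it has a finding, or None to defer.
Layer = Callable[[str], Optional[Verdict]]

INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


def pattern_layer(text: str) -> Optional[Verdict]:
    if INJECTION_PATTERN.search(text):
        return Verdict("block", "pattern", "PAT-001", "matched known injection phrase")
    return None


def heuristic_layer(text: str) -> Optional[Verdict]:
    # Toy heuristic: a high share of characters that are neither alphanumeric nor spaces.
    symbols = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    if text and symbols / len(text) > 0.5:
        return Verdict("block", "heuristic", "HEU-001", "symbol ratio above 0.5")
    return None


def run_cascade(text: str, layers: list[Layer]) -> Verdict:
    """Run layers in order and stop at the first one that returns a verdict."""
    for layer in layers:
        verdict = layer(text)
        if verdict is not None:
            return verdict
    return Verdict("allow", "default", "NONE", "no layer raised a finding")
```

Ordering cheap layers first means most traffic is decided before the expensive layers run, and every verdict carries enough context to explain itself in an audit.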

Bidirectional DLP — sensitive data shouldn't leak in either direction.
How Delphi delivers: 15 classifiers running against prompts and outputs. Catches PII, credentials, API keys, and customer records with graduated actions (warn, redact, block).
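
Graduated actions can be illustrated with a small scanner in which the strictest matching action wins. The three classifiers and their patterns below are simplified assumptions, not Delphi's classifiers; the same function would be applied to both prompts and responses to make the check bidirectional.

```python
import re

# Hypothetical classifiers: (name, pattern, action). Real classifiers would be broader.
CLASSIFIERS = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "warn"),
    ("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "redact"),
    ("aws_access_key", re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "block"),
]
SEVERITY = {"allow": 0, "warn": 1, "redact": 2, "block": 3}


def scan(text: str) -> tuple[str, str, list[str]]:
    """Return (action, text_out, findings); the strictest matching action wins."""
    action, findings, redacted = "allow", [], text
    for name, pattern, rule_action in CLASSIFIERS:
        if pattern.search(text):
            findings.append(name)
            if rule_action == "redact":
                redacted = pattern.sub(f"[REDACTED:{name}]", redacted)
            if SEVERITY[rule_action] > SEVERITY[action]:
                action = rule_action
    if action == "block":
        return action, "", findings   # nothing is forwarded on a block
    return action, redacted, findings
```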

Documented, drillable kill switches — per-agent and per-deployment, executable within minutes.
How Delphi delivers: Agent quarantine, registry-level kill flags, and proxy deny-all paths, all triggerable from the dashboard. Runbook documented.
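
As a rough sketch of how layered kill switches might be evaluated on each request, the registry below checks the broadest switch first. The class and attribute names are hypothetical, and a production registry would be persistent and access-controlled rather than in-memory.

```python
from dataclasses import dataclass, field


@dataclass
class KillSwitchRegistry:
    """In-memory stand-in for registry-level kill flags (names are hypothetical)."""
    deny_all: bool = False                                    # proxy-wide deny-all path
    killed_deployments: set[str] = field(default_factory=set)
    quarantined_agents: set[str] = field(default_factory=set)

    def is_allowed(self, agent_id: str, deployment_id: str) -> bool:
        """Check the broadest switch first, then the deployment, then the agent."""
        if self.deny_all:
            return False
        if deployment_id in self.killed_deployments:
            return False
        return agent_id not in self.quarantined_agents
```

Because the check is a few set lookups, it can sit on the request path, which is what makes a switch executable in minutes rather than after a redeploy.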

Centrally managed MCP gateway — every MCP server, tool, and integration vetted at runtime.
How Delphi delivers: Runtime MCP scanning gateway with tool poisoning detection, typosquat detection, argument validation, and rug-pull detection. Inline rule enforcement.
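
Two of these checks are simple enough to sketch. Typosquat detection can be approximated with string similarity against an approved tool list, and rug-pull detection by pinning a hash of each tool's description at approval time. The allowlist, similarity cutoff, and function names are assumptions for illustration.

```python
import difflib
import hashlib
from typing import Optional

APPROVED_TOOLS = {"read_file", "send_email", "search_docs"}  # hypothetical allowlist


def typosquat_of(tool_name: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the approved name this tool imitates, or None if it is not a near miss."""
    if tool_name in APPROVED_TOOLS:
        return None
    close = difflib.get_close_matches(tool_name, sorted(APPROVED_TOOLS), n=1, cutoff=cutoff)
    return close[0] if close else None


def fingerprint(description: str) -> str:
    """Hash a tool description so later changes can be detected."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def is_rug_pull(pinned_fingerprint: str, current_description: str) -> bool:
    """True when a tool's description changed after it was approved."""
    return fingerprint(current_description) != pinned_fingerprint
```

A name such as `read_fi1e` is flagged as imitating `read_file`, while an unrelated new tool returns None and would go through normal vetting instead.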

Adversarial Testing

Scheduled adversarial testing across prompt injection, retrieval poisoning, jailbreak, context overflow, and output filter bypass.
How Delphi delivers: Context-aware vulnerability scanning, plus scanners for the OWASP Top 10 for LLM Applications and for agentic vulnerabilities. Promptfoo and Garak test suites available.

Detection accuracy and drift monitoring with defined metrics, thresholds, and alerts.
How Delphi delivers: Continuous benchmark runs surface drift in detection accuracy over time. Per-customer detection rates trended in the dashboard.
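
A defined metric, threshold, and alert can be as small as the check below, which compares the latest benchmark detection rate with a rolling baseline. The window size and the 3-point drop threshold are assumed values chosen only for the example.

```python
from statistics import mean


def drift_alert(history: list[float], latest: float,
                window: int = 5, max_drop: float = 0.03) -> bool:
    """Alert when the latest detection rate falls more than max_drop below
    the mean of the previous `window` benchmark runs."""
    if len(history) < window:
        return False  # not enough baseline data to judge drift
    baseline = mean(history[-window:])
    return (baseline - latest) > max_drop
```

With a baseline near 0.95, a run at 0.90 raises an alert and a run at 0.94 does not.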

Inventory and Visibility

Solution registry — every AI model, service, and provider in use, with named owners and risk classification.
How Delphi delivers: Agent registry inventories every agent, deployment, and provider running through Delphi, with ownership metadata and trust scoring. Proxy traffic surfaces every model and version actually being called in production.

Vetted tooling registry — every MCP server, plug-in, and integration the agent uses.
How Delphi delivers: Tools Declaration view surfaces every tool each agent has invoked, with first-seen, last-used, call counts, and blocked attempts per agent.
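
The per-agent fields listed above map onto a small data structure. The sketch below is an assumption about how such an inventory could be kept, not a description of the Tools Declaration view itself; here `calls` counts every attempt, including blocked ones.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolUsage:
    first_seen: str = ""
    last_used: str = ""
    calls: int = 0      # all attempts, including blocked ones
    blocked: int = 0


class ToolInventory:
    """Per-agent tool usage keyed by (agent_id, tool_name). Names are hypothetical."""

    def __init__(self) -> None:
        self._usage: dict[tuple[str, str], ToolUsage] = defaultdict(ToolUsage)

    def record(self, agent_id: str, tool: str, timestamp: str, blocked: bool = False) -> None:
        entry = self._usage[(agent_id, tool)]
        if not entry.first_seen:
            entry.first_seen = timestamp
        entry.last_used = timestamp
        entry.calls += 1
        if blocked:
            entry.blocked += 1

    def tools_for(self, agent_id: str) -> dict[str, ToolUsage]:
        """Return every tool the given agent has attempted to invoke."""
        return {tool: usage for (agent, tool), usage in self._usage.items() if agent == agent_id}
```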

Output-side validation — AI as a function call, not as a system actor; outputs constrained and inspected.
How Delphi delivers: Output-side detection inspects what the model is trying to make the application do, with anomaly flagging when output deviates from expected shape or intent.
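
Treating the model as a function call implies a contract on its return value. The sketch below validates a JSON output against an expected shape and an allowed set of actions; the field names and the ticketing scenario are hypothetical.

```python
import json

# Hypothetical contract: the model may only return these keys with these types.
EXPECTED_SHAPE = {"action": str, "ticket_id": int, "summary": str}
ALLOWED_ACTIONS = {"create_ticket", "update_ticket"}


def validate_output(raw: str) -> list[str]:
    """Return a list of anomalies; an empty list means the output fits the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    anomalies = []
    for key in sorted(set(data) - set(EXPECTED_SHAPE)):
        anomalies.append(f"unexpected field: {key}")
    for key, expected_type in EXPECTED_SHAPE.items():
        if key not in data:
            anomalies.append(f"missing field: {key}")
        elif not isinstance(data[key], expected_type):
            anomalies.append(f"wrong type for field: {key}")
    if isinstance(data.get("action"), str) and data["action"] not in ALLOWED_ACTIONS:
        anomalies.append(f"action not allowed: {data['action']}")
    return anomalies
```

An output that adds a `shell` field or requests an action outside the allowed set is reported as an anomaly before the application acts on it.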

SOC Integration

AI-specific SIEM rules — purpose-built for AI telemetry, not retrofitted from network rules.
How Delphi delivers: 17-category detection taxonomy (prompt injection, jailbreak, A2A attack chains, agent identity anomalies, MCP tool drift, and others) ships as a starter pack of SIEM rules for Splunk, Datadog, and Sentinel.

AI scenarios in SOC runbooks — model compromise, training data exfiltration, vendor breach, prompt injection that reached production.
How Delphi delivers: Detection telemetry feeds these scenarios directly into the SOC. Rule attribution lets analysts trace from alert to root cause without vendor escalation.

DeepMind Agent Traps — Runtime Coverage

In March 2026, Google DeepMind published AI Agent Traps — a taxonomy of six attack categories that target an agent's environment rather than its model. The paper's argument is direct: training-time defenses cannot solve an inference-time problem, and the strongest available line of defense is holistic adversarial testing of the runtime environment.

That is the thesis underneath xAIDR, Delphi's execution-layer detection architecture. xAIDR sits inside the agent's runtime, using in-process sensors and framework callbacks, rather than acting as a network proxy between the agent and the model. It inspects what the agent is about to do at the point of execution, with the full structural context of the agent's reasoning, tool calls, memory operations, and inter-agent messages.

The attack surface has moved from the model to the environment. xAIDR detects there — at the point of execution, inside the agent itself.
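
One generic way to inspect an action at the point of execution is to wrap each tool in a guard that runs before the call. The decorator below is a sketch of that pattern only; it is not xAIDR's interface, and the policy function and tool are invented for the example.

```python
import functools
from typing import Any, Callable


class BlockedToolCall(Exception):
    """Raised when a guarded tool call is rejected before execution."""


def guarded(check: Callable[[str, tuple, dict], bool]) -> Callable:
    """Decorator that runs `check` on a tool call and blocks it if check returns False."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if not check(tool.__name__, args, kwargs):
                raise BlockedToolCall(f"{tool.__name__} blocked before execution")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


def no_external_recipients(name: str, args: tuple, kwargs: dict) -> bool:
    # Hypothetical policy: email may only go to the example.com domain.
    return str(kwargs.get("to", "")).endswith("@example.com")


@guarded(no_external_recipients)
def send_email(*, to: str, body: str) -> str:
    return f"sent to {to}"
```

Because the guard sees the tool name and its actual arguments, it can decide on what the agent is about to do rather than on the text that led up to it.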

01. Content Injection · Perception — Adversarial content injected through invisible CSS, hidden HTML, image steganography, metadata, or accessibility tags. The agent parses what the human never sees. Up to 86% documented success.
Coverage: Text normalization and delta analysis catch homoglyphs, zero-width characters, and Unicode obfuscation at the prompt boundary. Steganographic input detection and encoding-instruction modules flag payloads hidden through transformation. Three of fourteen behavioral modules target this trap class directly.
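
Normalization and delta analysis can be sketched with the Python standard library: strip zero-width characters, apply NFKC normalization, and report how much the text changed. This is an assumption-level illustration. NFKC folds compatibility forms such as fullwidth letters, but it does not map cross-script homoglyphs (for example Cyrillic letters that resemble Latin ones), which would need a separate confusables table.

```python
import unicodedata

# Zero-width and invisible formatting characters commonly used to hide text.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize(text: str) -> str:
    """Strip zero-width characters, then apply NFKC compatibility normalization."""
    visible = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return unicodedata.normalize("NFKC", visible)


def normalization_delta(text: str) -> dict:
    """Report what normalization changed; a non-trivial delta is a signal worth scoring."""
    cleaned = normalize(text)
    return {
        "normalized": cleaned,
        "zero_width_count": sum(1 for ch in text if ch in ZERO_WIDTH),
        "changed": cleaned != text,
    }
```

Downstream rules then run on the normalized text, so a phrase split by invisible characters is matched the same way as its plain form.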

02. Semantic Manipulation · Reasoning — Biased phrasing, emotional framing, authority impersonation, cognitive bias exploitation. No overt commands; the manipulation lives in the framing.
Coverage: Intent decomposition surfaces mismatches between an agent's stated reasoning and the action it is being driven toward. Fiction-frame detection identifies narrative manipulation that masks real intent. Conversational trajectory analysis tracks reasoning drift across multi-turn sessions.

03. Cognitive State · Memory — Contamination of RAG corpora, vector stores, and long-term memory. Documented 80%+ attack success with under 0.1% of the corpus contaminated.
Coverage: RAG Shield performs per-document inspection at retrieval time and quarantines poisoned content before it enters the agent's context. Memory-poisoning patterns are scored against the same detection cascade as inbound prompts. Outputs sourced from suspect retrieval paths are flagged for review.
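
Per-document inspection at retrieval time amounts to a filter between the retriever and prompt assembly. The sketch below uses a deliberately simple, hypothetical inspector; in practice the inspector would be the full detection cascade rather than one regular expression.

```python
import re
from typing import Callable

# Hypothetical inspector standing in for a full detection cascade.
SUSPICIOUS = re.compile(r"ignore (all )?previous instructions|system prompt", re.IGNORECASE)


def looks_poisoned(document: str) -> bool:
    return bool(SUSPICIOUS.search(document))


def filter_retrieved(documents: list[str],
                     is_poisoned: Callable[[str], bool] = looks_poisoned
                     ) -> tuple[list[str], list[str]]:
    """Split retrieved documents into (clean, quarantined) before prompt assembly."""
    clean, quarantined = [], []
    for doc in documents:
        (quarantined if is_poisoned(doc) else clean).append(doc)
    return clean, quarantined
```

Only the clean list is passed into the agent's context; the quarantined list is kept for review.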

04. Behavioral Control · Action — Indirect prompt injection through retrieved content, emails, documents, and attacker-controlled sub-agents. The Microsoft M365 Copilot case documents 10/10 exfiltration success in this category.
Coverage: Indirect prompt injection through retrieved content is the primary attack class targeted by 230+ pattern rules and the multi-layer detection cascade. A2A attack chains and orchestrator-injected sub-agent flows are detected by behavioral correlation across the agent fleet — validated against a published 500-prompt A2A benchmark at 94.5% accuracy and 98.4% precision.

05. Systemic · Coordination — Payloads distributed across multiple sources or interactions. Each fragment benign in isolation; the malicious behavior emerges only when the agent aggregates them.
Coverage: Multi-turn attack-chain detection correlates inputs across conversation turns to surface fragmented payloads aligning into a coordinated campaign. Cross-agent behavioral correlation extends this analysis across the agent fleet. Multi-source single-turn aggregation is in active development.
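
The core of multi-turn correlation is that detection runs over a window of turns rather than one turn at a time. The following toy, with an assumed pattern and window size, flags a payload that only appears once recent turns are joined.

```python
import re

PAYLOAD = re.compile(r"ignore previous instructions", re.IGNORECASE)


def fragmented_payload(turns: list[str], window: int = 3) -> bool:
    """True when a payload appears only after joining recent turns,
    with no single turn matching on its own."""
    if any(PAYLOAD.search(turn) for turn in turns):
        return False  # a single-turn detector would already catch this
    for end in range(1, len(turns) + 1):
        joined = " ".join(turns[max(0, end - window):end])
        if PAYLOAD.search(joined):
            return True
    return False
```

Each turn in a split payload looks benign in isolation, which is why the check has to operate on the joined window.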

06. Human-in-the-Loop · Supervisor — The agent's output becomes the attack vector. Technically clean output that, when read by a human, induces harmful action — including documented cases of summarization tools repeating ransomware commands as legitimate fix instructions.
Coverage: Bidirectional DLP scans outputs for sensitive data leakage. Output-side detection inspects what the agent is trying to make downstream systems do, with anomaly flagging when output deviates from expected shape. Self-referential exfiltration patterns are flagged. Detection of socially engineered output text is in active development.

Source: Franklin, M., Tomašev, N., Jacobs, J., Leibo, J. Z., & Osindero, S. (2026). AI Agent Traps. Google DeepMind, SSRN preprint, March 2026.

In Build

Capabilities currently in active development with defined customer outcomes.

  • Tool and URL allowlisting — Every tool an agent invokes is either declared and approved, or flagged before it executes.

  • Declared tool surface validation — Tool drift detection at the fleet level. An agent that quietly starts using a tool it never declared is surfaced immediately.

  • Microsoft AGT integration — One agent, two enforcement layers, one operational view. AGT enforces policy locally; Delphi runs content detection and fleet correlation on the same telemetry stream.

  • Multi-source aggregation detection — Closes the gap on Systemic Traps. Catches payloads that are individually benign but combine into a coordinated campaign.

  • Socially engineered output detection — Closes the gap on Human-in-the-Loop Traps. Detects content like summarization output that repeats embedded ransomware commands as legitimate fix instructions.

  • HTML source-vs-parsed comparison — Strengthens coverage of Content Injection Traps at the source rather than the prompt boundary.

Where Delphi Doesn't Help

Honest scoping. These layers belong elsewhere in the security stack:

  • Organizational governance, policy definition, and risk classification workflows — your GRC tooling.

  • IAM, SSO, BYOK, and infrastructure security — your identity and platform layer.

  • Network-level DLP, secure web gateway domain blocking, and browser controls — your existing network stack.

  • Compliance workflow tooling and audit process management — your GRC and audit platforms.

  • Standing red team, annual engagements, and tabletop exercises — services and people investment.

  • Cultural adoption, talent strategy, training matrix, and AI ethics committees — your CISO program.

Delphi focuses on AI runtime security: detection, enforcement, telemetry, and adversarial testing. These are the layers most enterprises struggle to operationalize, and a single deployment covers them.

Map Your CISO Requirements to Delphi

See how Delphi covers the runtime AI security checklist in a single deployment — or run our free vulnerability scanner against your stack today.