When Your AI Agent Goes Wrong

Sometimes an AI agent misbehaves — it gets prompt-injected by a bad web page, runs the wrong command, leaks a credential, or quietly does the opposite of what you asked. TRACE detects it, freezes the evidence, and reconstructs what happened — in plain JSON anyone can read.

When agents go off-script, you need answers

The Problem

AI agents like Claude Code, Codex, and Gemini do a lot — they read your files, run commands, call APIs, even spawn other agents. Most of the time that's fine. But what happens when one of them goes off-script?

Your agent reads a web page that contains a hidden prompt and starts following the page's instructions instead of yours. You only notice when something looks off in the diff.

An MCP server you trusted gets compromised and starts feeding your agent malicious data. The agent does what the server tells it, and the result is now in your codebase.

A sub-agent quietly reads your .env file and posts it to a paste service. By the time you find out, the credentials have been used.

Without a standard for AI incident response, you're left piecing things together from shell history and memory. TRACE is the missing layer that turns continuous audit data into something a responder, an insurer, or a regulator can actually use.

Detect, preserve, reconstruct, report

How It Works

TRACE runs on top of your VIBES audit data. When something goes wrong, four things happen automatically:

1. Detect

TRACE has a catalog of "indicators" — the patterns that show up when an agent has been compromised. It scans your audit data continuously and flags matches with severity and confidence.

2. Preserve

The moment an incident is opened, TRACE freezes a snapshot of the audit data, tool logs, and file changes — sealed with a tamper-evident signature so nothing can be quietly edited later.
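One common way to make a snapshot tamper-evident is to hash every artifact into a manifest and then sign the manifest. The sketch below shows that idea only; it is not TRACE's actual sealing scheme, which this page does not describe. The function names `seal_bundle` and `verify_bundle` and the artifact names are hypothetical, and HMAC-SHA256 with a shared key stands in for whatever signature TRACE really uses.

```python
import hashlib
import hmac
import json


def seal_bundle(artifacts: dict[str, bytes], key: bytes) -> dict:
    """Hash each artifact, then sign the canonical manifest with HMAC-SHA256.

    Assumption: a shared secret key. A production system would more likely
    use an asymmetric signature so that verifiers cannot forge seals.
    """
    manifest = {name: hashlib.sha256(data).hexdigest()
                for name, data in sorted(artifacts.items())}
    # Canonical JSON so the same manifest always produces the same bytes.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": signature}


def verify_bundle(artifacts: dict[str, bytes], seal: dict, key: bytes) -> bool:
    """Recompute the seal from the artifacts and compare in constant time."""
    expected = seal_bundle(artifacts, key)
    return (expected["manifest"] == seal["manifest"]
            and hmac.compare_digest(expected["signature"], seal["signature"]))
```

Any later edit to an artifact changes its hash, so the recomputed manifest and signature no longer match the stored seal.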

3. Reconstruct

TRACE replays the session timeline, walks the chain of sub-agents, and lists every file, credential, and external service touched during the incident window. You get the full picture without piecing it together by hand.
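As a rough illustration of the reconstruction step, the sketch below walks a parent-to-child session map to find every sub-agent under an affected session, then orders the matching events by time. The event fields (`ts`, `session`, `kind`, `target`) and the `parent_of` map are assumptions made for this example; the real VIBES audit schema is not shown on this page.

```python
from collections import deque


def session_tree(root: str, parent_of: dict[str, str]) -> list[str]:
    """Return the root session plus every session spawned beneath it, breadth-first."""
    children: dict[str, list[str]] = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        current = queue.popleft()
        order.append(current)
        for child in sorted(children.get(current, [])):
            if child not in seen:  # guard against malformed, cyclic data
                seen.add(child)
                queue.append(child)
    return order


def timeline(events: list[dict], sessions: list[str]) -> list[dict]:
    """Keep events from the given sessions, ordered by timestamp.

    Assumes timestamps share one sortable format (e.g. ISO 8601 in UTC).
    """
    wanted = set(sessions)
    return sorted((e for e in events if e["session"] in wanted),
                  key=lambda e: e["ts"])
```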

4. Report

You get a structured JSON incident report aligned with STIX, a threat-intelligence format that many security tools can already ingest. Share it with your security team, your insurer, or a regulator without writing a separate narrative document.
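To give a feel for what a STIX-shaped report looks like, here is a minimal sketch that wraps a single STIX 2.1 `incident` object in a bundle. It uses only the common STIX properties (`type`, `spec_version`, `id`, `created`, `modified`, `name`, `description`). The function name is hypothetical, and TRACE's real report schema, including any TRACE-specific fields, is defined in the TRACE spec rather than here.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_incident_bundle(name: str, description: str) -> dict:
    """Build a STIX 2.1 bundle holding a single incident object."""
    # STIX timestamps are RFC 3339 in UTC; millisecond precision is used here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    incident = {
        "type": "incident",
        "spec_version": "2.1",
        "id": f"incident--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "description": description,
    }
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": [incident]}


if __name__ == "__main__":
    bundle = stix_incident_bundle(
        "Credential staging in agent session",       # example values only
        "Secret file read followed by network egress.",
    )
    print(json.dumps(bundle, indent=2))
```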

Eight indicator classes tuned for agentic systems

What TRACE Catches

TRACE comes with a starter catalog of indicators specifically tuned for agentic systems. These aren't generic security alerts — they're patterns that only make sense when you understand how AI agents actually work.

Prompt injection

Hidden instructions in a web page, file, or tool result that try to override what you asked the agent to do.

Tool abuse

Bash commands, file writes, or network calls the agent shouldn't be making for the task it was given.

Sub-agent anomalies

Sub-agents spawned with no clear reason from the parent task, or sub-agents reading data their parent never had access to.

Reasoning drift

The agent's chain-of-thought references things never in context, or shifts topic in ways the prompt doesn't justify.

Credential staging

Reads against secret files (.env, .aws/credentials, .ssh/) followed by network egress in the same session.

MCP server compromise

An MCP server you connected to suddenly returns data that doesn't match its own published schema, or responses that fail signature verification.

Data exfiltration

The classic read-stage-egress pattern: the agent reads source code, encodes it, then sends it to an external service.

Off-pattern behavior

Sudden volume spikes, model switches mid-session, or geographic shifts that don't match the user's normal pattern.

A match is a triage signal, not an automatic alarm — TRACE annotates it with confidence and recommended action so you can decide what to investigate.
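The credential staging indicator described above can be sketched as a simple rule: a read of a secret file followed by network egress in the same session. The event fields, the `credential_staging` function, and the severity, confidence, and recommended-action values below are all illustrative assumptions, not entries from the actual TRACE catalog.

```python
from pathlib import PurePosixPath


def is_secret_path(path: str) -> bool:
    """True for .env files, .aws/credentials, and anything under .ssh/."""
    parts = PurePosixPath(path).parts
    return (parts[-1:] == (".env",)
            or parts[-2:] == (".aws", "credentials")
            or ".ssh" in parts[:-1])


def credential_staging(events: list[dict]) -> list[dict]:
    """Flag network egress that follows a secret-file read in the same session."""
    secrets_read: dict[str, list[str]] = {}
    matches = []
    for event in sorted(events, key=lambda e: e["ts"]):
        session = event["session"]
        if event["kind"] == "file_read" and is_secret_path(event["target"]):
            secrets_read.setdefault(session, []).append(event["target"])
        elif event["kind"] == "network_egress" and session in secrets_read:
            matches.append({
                "indicator": "credential-staging",
                "session": session,
                "secrets_read": list(secrets_read[session]),
                "destination": event["target"],
                "severity": "high",      # placeholder value
                "confidence": 0.7,       # placeholder value
                "recommended_action": "review session and rotate credentials",
            })
    return matches
```

Egress that happens before any secret read, or in a different session, is not flagged, which keeps the rule a triage signal rather than a blanket alarm on all network calls.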

From "no visibility" to a sealed evidence package

Why This Matters

TRACE turns "we have no visibility" into "we have a sealed evidence package and a structured report." Four things change for your team.

You can answer "what did the agent actually do?"

When something looks wrong in a diff or a deploy, TRACE's timeline tells you every prompt, tool call, file change, and sub-agent spawn in chronological order — without you stitching it together from memory.

The evidence holds up

Sealed evidence bundles are tamper-evident from the moment an incident is opened. If you ever need to share it with a security team, an insurer, or a regulator, the chain of custody is already in place.

You can scope the blast radius fast

One command tells you every file the affected session touched, every credential it could read, every external service it called, and every sub-agent it spawned. Containment becomes a checklist instead of a guessing game.
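The TRACE command itself is not shown on this page, so the sketch below only illustrates the kind of aggregation such a command might perform: grouping what the affected sessions touched into a containment checklist. The event kinds (`file_read`, `file_write`, `secret_read`, `network_egress`, `spawn`) and the `blast_radius` function are hypothetical names chosen for the example.

```python
# Hypothetical mapping from audit event kinds to checklist sections.
BUCKETS = {
    "file_read": "files",
    "file_write": "files",
    "secret_read": "credentials",
    "network_egress": "external_services",
    "spawn": "sub_agents",
}


def blast_radius(events: list[dict], sessions: set[str]) -> dict[str, list[str]]:
    """Summarise every resource touched by the given sessions, deduplicated."""
    scope: dict[str, set[str]] = {
        "files": set(),
        "credentials": set(),
        "external_services": set(),
        "sub_agents": set(),
    }
    for event in events:
        if event["session"] in sessions and event["kind"] in BUCKETS:
            scope[BUCKETS[event["kind"]]].add(event["target"])
    # Sorted lists give a stable, diff-friendly checklist.
    return {name: sorted(values) for name, values in scope.items()}
```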

Your tools already speak this format

Reports are plain JSON aligned with STIX 2.1, a threat-intelligence format that many security tools can already ingest. No custom parser, no proprietary integration. Pipe it into whatever you already run.

Three ways to get ready

Be Ready Before It Happens

TRACE works best when it is in place before an incident starts, because the audit data has to exist before something goes wrong. Start with VIBES tracking, then layer on TRACE for any agent work where an incident would matter.

Get Instrumented

Start with VIBES tracking — TRACE consumes the audit data automatically. Works with Claude Code, Gemini, Codex, and more.

Install for your tool →

Spread the Word

Ask your AI tool provider about VIBES, VERIFY, and TRACE support. Forensic visibility works best when every tool reports its actions.

How to ask →

Try Maestro

Full VIBES ecosystem — VIBES, VERIFY, PRISM, EVOLVE, and TRACE — built in and ready when an incident actually happens.

runmaestro.ai →

Want the IoC catalog, evidence schema, and CLI surface? Read the TRACE spec →