AI DevSecOps and Incident Response

Standard DevSecOps assumes deterministic systems: the same code produces the same output, security vulnerabilities are in the code, and rollback means deploying the previous binary. AI systems break all three assumptions — outputs are non-deterministic, vulnerabilities can be in the prompt or model, and rollback may mean reverting a prompt, a model version, or a guardrail configuration.


What Changes for AI Systems

| Traditional DevSecOps | AI DevSecOps |
| --- | --- |
| Deterministic outputs — same input = same output | Non-deterministic outputs — same input can produce different responses across calls |
| Vulnerabilities are in code — CVEs, dependency issues | Vulnerabilities are in prompts and models — prompt injection, jailbreaks, data extraction |
| Testing is binary — tests pass or fail | Testing is probabilistic — evals have pass rates, not pass/fail |
| Rollback = previous binary | Rollback = previous prompt + model + guardrail config + tool definitions |
| Security perimeter is the network | Security perimeter includes the prompt — user input is part of the “code” the LLM executes |
| Secrets are in config/env | Secrets can leak in model outputs — PII, API keys, system prompts |
| Supply chain = dependencies | Supply chain = dependencies + model weights + training data + prompt templates |

These differences mean you need AI-specific extensions to your existing DevSecOps practices, not a replacement.


CI/CD Eval Pipelines

Integrate eval gates into your existing CI/CD pipeline. This extends the standard build-test-deploy pipeline with AI-specific validation:

┌──────────────────────────────────────────────────────────────────┐
│  Pull Request                                                    │
│                                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
│  │  Build   │─>│  Lint +  │─>│  Unit    │─>│  Eval Suite      │ │
│  │          │  │  Type    │  │  Tests   │  │  (golden dataset)│ │
│  └──────────┘  └──────────┘  └──────────┘  └────────┬─────────┘ │
│                                                      │           │
│                                              Gate: pass rate     │
│                                              > 93%, safety      │
│                                              = 100%, cost       │
│                                              < budget           │
└──────────────────────────────────────────────┼───────────────────┘
                                               │ pass
                                               v
┌──────────────────────────────────────────────────────────────────┐
│  Staging Deploy                                                  │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────────────────────────┐    │
│  │  Deploy to      │─>│  Canary Evals                       │    │
│  │  staging        │  │  (100 sampled production inputs)    │    │
│  └─────────────────┘  └──────────────────┬──────────────────┘    │
│                                          │                       │
│                                  Gate: no regression > 1%        │
│                                  vs current production           │
└──────────────────────────────────┼───────────────────────────────┘
                                   │ pass
                                   v
┌──────────────────────────────────────────────────────────────────┐
│  Production Deploy                                               │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────────────────────────┐    │
│  │  Canary (10%)   │─>│  Monitor for 30 min                │    │
│  │  then full      │  │  Quality, cost, guardrail metrics   │    │
│  └─────────────────┘  └──────────────────┬──────────────────┘    │
│                                          │                       │
│                                  Auto-rollback if                │
│                                  quality < threshold             │
└──────────────────────────────────────────────────────────────────┘
                                   │
                                   v
┌──────────────────────────────────────────────────────────────────┐
│  Post-Deploy (Scheduled)                                         │
│                                                                  │
│  Daily: Run eval suite on 200 production samples                 │
│  Weekly: Full drift analysis (input, output, semantic)           │
│  Monthly: Red-team evaluation (adversarial inputs)               │
└──────────────────────────────────────────────────────────────────┘

PR-Time Eval Gates

What to run on every PR that changes prompts, tools, or agent logic:

| Check | Pass Criteria | Runtime |
| --- | --- | --- |
| Golden eval suite (200-500 cases) | Pass rate > 93% | 2-5 min |
| Safety eval suite (100+ adversarial cases) | Pass rate = 100% | 1-2 min |
| Cost benchmark (tokens per task) | No >20% increase vs baseline | 1-2 min |
| LLM-as-judge quality (50 cases) | Average score > 4.0/5.0 | 2-3 min |

Total CI time for AI evals: 5-12 minutes, run in parallel with standard tests.
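The PR-time thresholds above can be collapsed into a single gate function. This is a minimal sketch, not a specific eval framework: the result-list shape (`{"passed": bool}`) and token-count inputs are assumptions, while the thresholds come from the table.

```python
def pass_rate(results):
    """Fraction of eval cases marked as passed."""
    return sum(1 for r in results if r["passed"]) / len(results)

def eval_gate(golden, safety, tokens_per_task, baseline_tokens,
              min_golden=0.93, max_cost_increase=0.20):
    """Apply the PR-time gates: golden pass rate > 93%, safety = 100%,
    and token cost within 20% of baseline. Returns (ok, failure reasons)."""
    reasons = []
    if pass_rate(golden) <= min_golden:
        reasons.append(f"golden pass rate {pass_rate(golden):.1%} below threshold")
    if pass_rate(safety) < 1.0:
        reasons.append("safety suite must pass 100%")
    if tokens_per_task > baseline_tokens * (1 + max_cost_increase):
        reasons.append("cost regression exceeds 20% vs baseline")
    return (not reasons, reasons)
```

In CI, a falsy `ok` maps to a nonzero exit code so the PR check fails.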

What Triggers a Full Eval Run

Not every code change needs eval validation. Run evals when:

  • System prompt or tool definitions change
  • Agent orchestration logic changes
  • Model version changes (including provider-side updates)
  • Guardrail configuration changes
  • RAG knowledge base updates

Standard code changes (API routes, infrastructure, non-agent logic) go through normal CI without eval gates.
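The trigger list above amounts to a path filter in CI. A sketch, assuming a repo layout where prompts, tools, agent logic, guardrails, and RAG data live in dedicated directories — the directory names and `model_config.yaml` filename are illustrative, not a standard:

```python
from pathlib import PurePosixPath

# Paths that require a full eval run when changed (illustrative layout).
EVAL_TRIGGER_PREFIXES = ("prompts/", "tools/", "agents/", "guardrails/", "rag/")
EVAL_TRIGGER_FILES = {"model_config.yaml"}  # model version pins live here

def needs_eval_run(changed_files):
    """True if any changed path touches prompts, tools, agent logic,
    model/guardrail config, or the RAG knowledge base."""
    for path in changed_files:
        if path.startswith(EVAL_TRIGGER_PREFIXES):
            return True
        if PurePosixPath(path).name in EVAL_TRIGGER_FILES:
            return True
    return False
```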


Security Monitoring for AI

Prompt Injection Detection

Prompt injection is the SQL injection of AI systems. Monitor for it in real time:

Detection layers:

| Layer | Detection Method | Response |
| --- | --- | --- |
| Input guardrail | Pattern matching for known injection templates (“ignore previous instructions”, “you are now…”) | Block or flag |
| Semantic analysis | Classifier trained on injection examples vs legitimate queries | Score and threshold |
| Output monitoring | Detect when the agent reveals system prompts, ignores role boundaries, or performs unauthorized actions | Block response, alert |
| Behavioral anomaly | Agent suddenly accesses tools it rarely uses, or generates responses that are statistically unusual | Alert for investigation |

Metrics to track:

  • Injection attempt rate (detected by input guardrails)
  • Injection bypass rate (detected by output monitoring — this is the one that matters)
  • False positive rate (legitimate queries blocked)
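The first detection layer (input-guardrail pattern matching) can be a few lines of regex. A sketch only: the pattern list is illustrative and incomplete, which is exactly why the table pairs it with a trained classifier and output monitoring.

```python
import re

# Known injection templates (illustrative, not exhaustive).
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore (all |any )?previous instructions",
        r"you are now\b",
        r"disregard (the |your )?system prompt",
    )
]

def flag_injection(user_input: str) -> bool:
    """Input guardrail: True if the input matches a known injection template."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)
```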

Data Leak Prevention

AI agents can leak sensitive data through their responses:

| Leak Vector | Detection | Prevention |
| --- | --- | --- |
| PII in outputs | Regex + NER on every response (SSN, credit card, email, phone) | Output guardrail: redact or block |
| System prompt exposure | Monitor for outputs matching system prompt fragments | Output guardrail: block |
| Training data extraction | Detect verbatim repetition of known sensitive training content | Output guardrail: block |
| Cross-session leakage | Agent reveals information from one user’s session to another | Session isolation architecture |
| Tool result leakage | Agent includes raw database results or API responses in output | Output filtering, structured response enforcement |
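The regex half of the "PII in outputs" row can be sketched as an output guardrail that redacts before the response leaves the system. The patterns below are illustrative and US-centric; production systems add NER for names and addresses and locale-specific formats.

```python
import re

# Illustrative PII patterns for an output guardrail (not exhaustive).
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def redact_pii(text: str):
    """Replace detected PII with [TYPE] placeholders.
    Returns (redacted_text, list of PII types found) for logging/alerting."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[{label}]", text)
    return text, findings
```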

Unauthorized Tool Access

Monitor for agents calling tools outside their authorized scope:

tool_access_policy = {
    "support-agent": {
        "allowed": ["search_kb", "lookup_customer", "create_ticket"],
        "denied": ["delete_account", "modify_payment", "export_data"],
        "requires_approval": ["refund_payment", "escalate_to_human"]
    }
}

# Alert if an agent attempts to call a denied tool
# Log every requires_approval tool call for audit
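A minimal enforcement check over a policy of this shape might look as follows. This is a sketch, not a framework API: the exception type and return values are assumptions, and it defaults to deny for any tool the policy does not list.

```python
class ToolDenied(Exception):
    """Raised (and alerted on) when an agent attempts an unauthorized tool."""

def check_tool_call(policy, agent, tool):
    """Return 'allow' or 'needs_approval'; raise ToolDenied otherwise.
    Default-deny: tools absent from the policy are treated as denied."""
    rules = policy.get(agent, {})
    if tool in rules.get("denied", []):
        raise ToolDenied(f"{agent} attempted denied tool {tool}")   # alert
    if tool in rules.get("requires_approval", []):
        return "needs_approval"                                     # audit log
    if tool in rules.get("allowed", []):
        return "allow"
    raise ToolDenied(f"{agent} attempted unlisted tool {tool}")
```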

Compliance and Audit Trails

Trace ID Linkage

Every agent interaction should have a complete audit chain:

User Request → Trace ID: abc123
├── Authentication: user_id: u_456, role: customer
├── Input Guardrail: checked, result: pass
├── Agent Execution: trace in Cloud Trace
│   ├── LLM Call 1: model: gemini-2.0-flash, tokens: 1200
│   ├── Tool Call: search_kb, result: 3 documents
│   ├── LLM Call 2: model: gemini-2.0-flash, tokens: 2100
│   └── Guardrail: output filter, result: pass
├── Output Guardrail: checked, result: pass (PII: none detected)
├── Response delivered to user
└── Audit Log: Cloud Audit Logs entry with trace ID

Trace ID links everything: Cloud Trace spans, Cloud Logging entries, Cloud Audit Logs, Langfuse session, and any external tool API calls.
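Propagating that trace ID usually means setting it once per request and stamping it on every structured log entry, so the joins above work. A sketch using Python's `contextvars`; the field names and ID format are illustrative.

```python
import contextvars
import json
import uuid

# One trace ID per request, visible to everything on that request's path.
trace_id_var = contextvars.ContextVar("trace_id", default=None)

def start_trace():
    """Generate a trace ID at request entry and bind it to the context."""
    tid = uuid.uuid4().hex[:12]
    trace_id_var.set(tid)
    return tid

def audit_log(event, **fields):
    """Emit a structured log entry carrying the current trace ID."""
    record = {"trace_id": trace_id_var.get(), "event": event, **fields}
    print(json.dumps(record))  # one JSON line per entry, ready for ingestion
    return record
```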

Data Retention Requirements

| Data Type | Typical Retention | Notes |
| --- | --- | --- |
| Agent traces (detailed) | 30-90 days | Sampling after 30 days to reduce storage cost |
| Guardrail trigger logs | 1-7 years | Compliance requirement (varies by regulation) |
| Audit logs (who/when/what) | 1-7 years | Cloud Audit Logs: 400 days admin, 30 days data access (configurable) |
| Eval results | Indefinitely | Small volume, critical for regression tracking |
| Conversation content | Per privacy policy | GDPR: delete on user request. Anonymize for eval datasets. |

AI-Specific Incident Runbooks

Runbook 1: Model Quality Degradation

Trigger: Task completion rate drops >5% over 1-hour rolling window, or eval regression detected in weekly production eval run.

| Step | Action |
| --- | --- |
| 1. Verify | Check if `gen_ai.response.model` changed (provider-side update). Check if traffic pattern shifted (input drift). |
| 2. Scope | Is it one agent or all agents? One model or all models? One task type or all? |
| 3. Investigate | Pull sample traces from the degradation period. Run LLM-as-judge on 50 cases. Compare to baseline. |
| 4. Mitigate | If model change: pin to previous model version. If prompt issue: revert prompt. If input drift: acknowledge new traffic pattern and triage. |
| 5. Resolve | Add failing cases to eval suite. Deploy fix through normal CI/CD with eval gates. |
| 6. Prevent | Set up model version monitoring alert. Add canary eval for the failing pattern. |
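Step 1 of this runbook (and step 6's monitoring alert) can be automated as a quick scan over recent traces. A sketch: the trace record shape is an assumption based on the `gen_ai.*` attributes used elsewhere in this post.

```python
from collections import Counter

def detect_model_change(traces, pinned_model):
    """Compare the served model version across recent traces against the
    pinned request model. Returns the set of unexpected versions (empty
    means no provider-side change detected)."""
    served = Counter(t["gen_ai.response.model"] for t in traces)
    return {model for model in served if model != pinned_model}
```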

Runbook 2: Cost Anomaly

Trigger: Daily spend exceeds 150% of 7-day rolling average, or individual task cost exceeds 5x the median for its type.

| Step | Action |
| --- | --- |
| 1. Verify | Check Cloud Billing for actual spend increase (not a reporting lag). Identify which agent/model is responsible. |
| 2. Scope | Is it a single agent in a loop, or a broad traffic increase? |
| 3. Investigate | Pull the most expensive traces. Look for: agent loops (>5 LLM calls per task), context window bloat (growing input tokens), model routing failure (expensive model used where a cheap one should be). |
| 4. Mitigate | If loop: add loop detection guardrail (max iterations). If context bloat: truncate conversation history. If routing: fix model router. Emergency: throttle non-critical agents. |
| 5. Resolve | Deploy fix. Monitor cost for 24 hours. |
| 6. Prevent | Add cost-per-task alert at 3x median. Add loop detection to agent framework. |
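The trigger conditions for this runbook are simple enough to implement directly. A sketch with the thresholds from the trigger (150% of the 7-day rolling average, 5x the per-type median); the data shapes are illustrative.

```python
from statistics import median

def daily_spend_anomaly(trailing_daily_spend, today_spend, factor=1.5):
    """True if today's spend exceeds factor x the trailing 7-day average."""
    window = trailing_daily_spend[-7:]
    return today_spend > factor * (sum(window) / len(window))

def task_cost_anomalies(task_costs, factor=5.0):
    """Tasks whose cost exceeds factor x the median for their type."""
    m = median(task_costs)
    return [cost for cost in task_costs if cost > factor * m]
```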

Runbook 3: Prompt Injection Attack

Trigger: Input guardrail injection detection rate spikes >3x baseline, or output monitoring detects system prompt leak or role boundary violation.

| Step | Action |
| --- | --- |
| 1. Verify | Confirm this is an attack, not a false positive spike. Check injection patterns in logs. |
| 2. Scope | Is it a single user or a coordinated attack? Which agents are targeted? |
| 3. Investigate | Review the injection payloads. Did any bypass input guardrails? Did any cause harmful outputs? |
| 4. Mitigate | If bypassed: add the new injection pattern to guardrails immediately. If harmful output: block the user/IP if malicious. If system prompt leaked: rotate any secrets referenced in the system prompt. |
| 5. Resolve | Update guardrail patterns. Add bypass cases to safety eval suite. |
| 6. Prevent | Run red-team eval suite monthly. Consider adding a dedicated injection classifier. |

Runbook 4: Data Leak Detected

Trigger: Output guardrail detects PII in agent response, or user reports receiving another user’s data.

| Step | Action |
| --- | --- |
| 1. Verify | Confirm the leak. Pull the full trace and conversation. Identify what data was exposed. |
| 2. Scope | Is it a one-time occurrence or a systematic issue? How many users affected? |
| 3. Investigate | Trace the data source: did it come from the model, from a tool result, or from conversation history? Check session isolation. |
| 4. Mitigate | If tool result: add output filtering on the tool. If session leakage: fix session isolation bug. If model-side: add PII detection guardrail if not present. Notify affected users per privacy policy. |
| 5. Resolve | Deploy fix. Run PII detection eval across recent production conversations. |
| 6. Prevent | Add PII detection to output guardrails (if not present). Add PII leak scenarios to safety eval suite. Review data access patterns for all agent tools. |

Rollback Strategies for AI

AI systems have multiple independently deployable components. Rolling back is not just “deploy the previous container”:

| Component | Rollback Mechanism | Speed | Risk |
| --- | --- | --- | --- |
| Prompt/system prompt | Version-controlled in Langfuse or git. Revert to previous version. | Seconds (if prompt served dynamically) | Low — prompts are text |
| Model version | Pin `gen_ai.request.model` to specific version (e.g., `gemini-2.0-flash-001`). | Seconds (config change) | Low — provider still serves the old version (usually) |
| Guardrail config | Version-controlled. Revert config and redeploy guardrail service. | Minutes | Medium — may re-expose issues the guardrail was catching |
| Agent code | Standard container rollback via Cloud Run revision or Kubernetes rollback. | Minutes | Medium — standard deployment risk |
| Tool definitions | Version-controlled alongside agent code. Rollback with agent code. | Minutes | Medium — tool changes may have data implications |
| Knowledge base (RAG) | Re-index from previous data snapshot. | Hours | High — re-indexing is slow |

Prompt + Model Rollback (Independent of Code)

The highest-impact rollback is often the fastest: reverting a prompt or model pin without any code deployment. This requires:

  1. Dynamic prompt serving — prompts loaded from Langfuse or a config service, not hardcoded
  2. Model version pinning — agent config specifies exact model version, not just model family
  3. Feature flags for guardrails — toggle guardrail rules without redeploy

With these in place, you can revert the most common AI regressions (prompt change, model update, guardrail misconfiguration) in seconds.
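What those three requirements enable can be sketched as a config object read per request, where a rollback is a value change rather than a deployment. All names are illustrative, and loading from Langfuse or a config service is left abstract.

```python
# Illustrative agent config as served from a config store (not hardcoded).
DEFAULT_CONFIG = {
    "prompt_version": "support-agent-v42",      # dynamic prompt serving
    "model": "gemini-2.0-flash-001",            # exact pin, not just the family
    "guardrails": {"injection_filter": True, "pii_redaction": True},
}

def rollback(config, component, previous_value):
    """Produce a new config with one component reverted: a prompt version,
    a model pin, or a guardrail feature flag. No code deploy involved."""
    new = dict(config)
    if component in ("prompt_version", "model"):
        new[component] = previous_value
    else:
        # Treat any other component name as a guardrail feature flag.
        new["guardrails"] = {**config["guardrails"], component: previous_value}
    return new
```

Writing the result back to the config store takes effect on the next request, which is what makes this a seconds-scale rollback.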


This post is licensed under CC BY 4.0 by the author.