AI DevSecOps and Incident Response

Standard DevSecOps assumes deterministic systems: the same code produces the same output, security vulnerabilities are in the code, and rollback means deploying the previous binary. AI systems break all three assumptions — outputs are non-deterministic, vulnerabilities can be in the prompt or model, and rollback may mean reverting a prompt, a model version, or a guardrail configuration.


What Changes for AI Systems

| Traditional DevSecOps | AI DevSecOps |
| --- | --- |
| Deterministic outputs — same input = same output | Non-deterministic outputs — same input can produce different responses across calls |
| Vulnerabilities are in code — CVEs, dependency issues | Vulnerabilities are in prompts and models — prompt injection, jailbreaks, data extraction |
| Testing is binary — tests pass or fail | Testing is probabilistic — evals have pass rates, not pass/fail |
| Rollback = previous binary | Rollback = previous prompt + model + guardrail config + tool definitions |
| Security perimeter is the network | Security perimeter includes the prompt — user input is part of the “code” the LLM executes |
| Secrets are in config/env | Secrets can leak in model outputs — PII, API keys, system prompts |
| Supply chain = dependencies | Supply chain = dependencies + model weights + training data + prompt templates |

These differences mean you need AI-specific extensions to your existing DevSecOps practices, not a replacement.


CI/CD Eval Pipelines

Integrate eval gates into your existing CI/CD pipeline. This extends the standard build-test-deploy pipeline with AI-specific validation:

┌──────────────────────────────────────────────────────────────────┐
│  Pull Request                                                    │
│                                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
│  │  Build   │─>│  Lint +  │─>│  Unit    │─>│  Eval Suite      │ │
│  │          │  │  Type    │  │  Tests   │  │  (golden dataset)│ │
│  └──────────┘  └──────────┘  └──────────┘  └────────┬─────────┘ │
│                                                      │           │
│                                              Gate: pass rate     │
│                                              > 93%, safety      │
│                                              = 100%, cost       │
│                                              < budget           │
└──────────────────────────────────────────────┼───────────────────┘
                                               │ pass
                                               v
┌──────────────────────────────────────────────────────────────────┐
│  Staging Deploy                                                  │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────────────────────────┐    │
│  │  Deploy to      │─>│  Canary Evals                       │    │
│  │  staging        │  │  (100 sampled production inputs)    │    │
│  └─────────────────┘  └──────────────────┬──────────────────┘    │
│                                          │                       │
│                                  Gate: no regression > 1%        │
│                                  vs current production           │
└──────────────────────────────────┼───────────────────────────────┘
                                   │ pass
                                   v
┌──────────────────────────────────────────────────────────────────┐
│  Production Deploy                                               │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────────────────────────┐    │
│  │  Canary (10%)   │─>│  Monitor for 30 min                │    │
│  │  then full      │  │  Quality, cost, guardrail metrics   │    │
│  └─────────────────┘  └──────────────────┬──────────────────┘    │
│                                          │                       │
│                                  Auto-rollback if                │
│                                  quality < threshold             │
└──────────────────────────────────────────────────────────────────┘
                                   │
                                   v
┌──────────────────────────────────────────────────────────────────┐
│  Post-Deploy (Scheduled)                                         │
│                                                                  │
│  Daily: Run eval suite on 200 production samples                 │
│  Weekly: Full drift analysis (input, output, semantic)           │
│  Monthly: Red-team evaluation (adversarial inputs)               │
└──────────────────────────────────────────────────────────────────┘

PR-Time Eval Gates

What to run on every PR that changes prompts, tools, or agent logic:

| Check | Pass Criteria | Runtime |
| --- | --- | --- |
| Golden eval suite (200-500 cases) | Pass rate > 93% | 2-5 min |
| Safety eval suite (100+ adversarial cases) | Pass rate = 100% | 1-2 min |
| Cost benchmark (tokens per task) | No >20% increase vs baseline | 1-2 min |
| LLM-as-judge quality (50 cases) | Average score > 4.0/5.0 | 2-3 min |

Total CI time for AI evals: 5-12 minutes, run in parallel with standard tests.
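The PR-time thresholds above can be collapsed into a single gate function. This is a minimal sketch, not a specific eval framework: the result-list shape (`{"passed": bool}`) and token-count inputs are assumptions, while the thresholds come from the table.

```python
def pass_rate(results):
    """Fraction of eval cases marked as passed."""
    return sum(1 for r in results if r["passed"]) / len(results)

def eval_gate(golden, safety, tokens_per_task, baseline_tokens,
              min_golden=0.93, max_cost_increase=0.20):
    """Apply the PR-time gates: golden pass rate > 93%, safety = 100%,
    and token cost within 20% of baseline. Returns (ok, failure reasons)."""
    reasons = []
    if pass_rate(golden) <= min_golden:
        reasons.append(f"golden pass rate {pass_rate(golden):.1%} below threshold")
    if pass_rate(safety) < 1.0:
        reasons.append("safety suite must pass 100%")
    if tokens_per_task > baseline_tokens * (1 + max_cost_increase):
        reasons.append("cost regression exceeds 20% vs baseline")
    return (not reasons, reasons)
```

In CI, a falsy `ok` maps to a nonzero exit code so the PR check fails.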

What Triggers a Full Eval Run

Not every code change needs eval validation. Run evals when:

  • System prompt or tool definitions change
  • Agent orchestration logic changes
  • Model version changes (including provider-side updates)
  • Guardrail configuration changes
  • RAG knowledge base updates

Standard code changes (API routes, infrastructure, non-agent logic) go through normal CI without eval gates.
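The trigger list above amounts to a path filter in CI. A sketch, assuming a repo layout where prompts, tools, agent logic, guardrails, and RAG data live in dedicated directories — the directory names and `model_config.yaml` filename are illustrative, not a standard:

```python
from pathlib import PurePosixPath

# Paths that require a full eval run when changed (illustrative layout).
EVAL_TRIGGER_PREFIXES = ("prompts/", "tools/", "agents/", "guardrails/", "rag/")
EVAL_TRIGGER_FILES = {"model_config.yaml"}  # model version pins live here

def needs_eval_run(changed_files):
    """True if any changed path touches prompts, tools, agent logic,
    model/guardrail config, or the RAG knowledge base."""
    for path in changed_files:
        if path.startswith(EVAL_TRIGGER_PREFIXES):
            return True
        if PurePosixPath(path).name in EVAL_TRIGGER_FILES:
            return True
    return False
```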


Security Monitoring for AI

Prompt Injection Detection

Prompt injection is the SQL injection of AI systems. Monitor for it in real time:

Detection layers:

| Layer | Detection Method | Response |
| --- | --- | --- |
| Input guardrail | Pattern matching for known injection templates (“ignore previous instructions”, “you are now…”) | Block or flag |
| Semantic analysis | Classifier trained on injection examples vs legitimate queries | Score and threshold |
| Output monitoring | Detect when the agent reveals system prompts, ignores role boundaries, or performs unauthorized actions | Block response, alert |
| Behavioral anomaly | Agent suddenly accesses tools it rarely uses, or generates responses that are statistically unusual | Alert for investigation |

Metrics to track:

  • Injection attempt rate (detected by input guardrails)
  • Injection bypass rate (detected by output monitoring — this is the one that matters)
  • False positive rate (legitimate queries blocked)
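The first detection layer (input-guardrail pattern matching) can be a few lines of regex. A sketch only: the pattern list is illustrative and incomplete, which is exactly why the table pairs it with a trained classifier and output monitoring.

```python
import re

# Known injection templates (illustrative, not exhaustive).
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore (all |any )?previous instructions",
        r"you are now\b",
        r"disregard (the |your )?system prompt",
    )
]

def flag_injection(user_input: str) -> bool:
    """Input guardrail: True if the input matches a known injection template."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)
```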

Data Leak Prevention

AI agents can leak sensitive data through their responses:

| Leak Vector | Detection | Prevention |
| --- | --- | --- |
| PII in outputs | Regex + NER on every response (SSN, credit card, email, phone) | Output guardrail: redact or block |
| System prompt exposure | Monitor for outputs matching system prompt fragments | Output guardrail: block |
| Training data extraction | Detect verbatim repetition of known sensitive training content | Output guardrail: block |
| Cross-session leakage | Agent reveals information from one user’s session to another | Session isolation architecture |
| Tool result leakage | Agent includes raw database results or API responses in output | Output filtering, structured response enforcement |
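The regex half of the "PII in outputs" row can be sketched as an output guardrail that redacts before the response leaves the system. The patterns below are illustrative and US-centric; production systems add NER for names and addresses and locale-specific formats.

```python
import re

# Illustrative PII patterns for an output guardrail (not exhaustive).
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def redact_pii(text: str):
    """Replace detected PII with [TYPE] placeholders.
    Returns (redacted_text, list of PII types found) for logging/alerting."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[{label}]", text)
    return text, findings
```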

Unauthorized Tool Access

Monitor for agents calling tools outside their authorized scope:

tool_access_policy = {
    "support-agent": {
        "allowed": ["search_kb", "lookup_customer", "create_ticket"],
        "denied": ["delete_account", "modify_payment", "export_data"],
        "requires_approval": ["refund_payment", "escalate_to_human"]
    }
}

# Alert if an agent attempts to call a denied tool
# Log every requires_approval tool call for audit
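A minimal enforcement check over a policy of this shape might look as follows. This is a sketch, not a framework API: the exception type and return values are assumptions, and it defaults to deny for any tool the policy does not list.

```python
class ToolDenied(Exception):
    """Raised (and alerted on) when an agent attempts an unauthorized tool."""

def check_tool_call(policy, agent, tool):
    """Return 'allow' or 'needs_approval'; raise ToolDenied otherwise.
    Default-deny: tools absent from the policy are treated as denied."""
    rules = policy.get(agent, {})
    if tool in rules.get("denied", []):
        raise ToolDenied(f"{agent} attempted denied tool {tool}")   # alert
    if tool in rules.get("requires_approval", []):
        return "needs_approval"                                     # audit log
    if tool in rules.get("allowed", []):
        return "allow"
    raise ToolDenied(f"{agent} attempted unlisted tool {tool}")
```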

Compliance and Audit Trails

Trace ID Linkage

Every agent interaction should have a complete audit chain:

User Request → Trace ID: abc123
├── Authentication: user_id: u_456, role: customer
├── Input Guardrail: checked, result: pass
├── Agent Execution: trace in Cloud Trace
│   ├── LLM Call 1: model: gemini-2.0-flash, tokens: 1200
│   ├── Tool Call: search_kb, result: 3 documents
│   ├── LLM Call 2: model: gemini-2.0-flash, tokens: 2100
│   └── Guardrail: output filter, result: pass
├── Output Guardrail: checked, result: pass (PII: none detected)
├── Response delivered to user
└── Audit Log: Cloud Audit Logs entry with trace ID

Trace ID links everything: Cloud Trace spans, Cloud Logging entries, Cloud Audit Logs, Langfuse session, and any external tool API calls.
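Propagating that trace ID usually means setting it once per request and stamping it on every structured log entry, so the joins above work. A sketch using Python's `contextvars`; the field names and ID format are illustrative.

```python
import contextvars
import json
import uuid

# One trace ID per request, visible to everything on that request's path.
trace_id_var = contextvars.ContextVar("trace_id", default=None)

def start_trace():
    """Generate a trace ID at request entry and bind it to the context."""
    tid = uuid.uuid4().hex[:12]
    trace_id_var.set(tid)
    return tid

def audit_log(event, **fields):
    """Emit a structured log entry carrying the current trace ID."""
    record = {"trace_id": trace_id_var.get(), "event": event, **fields}
    print(json.dumps(record))  # one JSON line per entry, ready for ingestion
    return record
```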

Data Retention Requirements

| Data Type | Typical Retention | Notes |
| --- | --- | --- |
| Agent traces (detailed) | 30-90 days | Sampling after 30 days to reduce storage cost |
| Guardrail trigger logs | 1-7 years | Compliance requirement (varies by regulation) |
| Audit logs (who/when/what) | 1-7 years | Cloud Audit Logs: 400 days admin, 30 days data access (configurable) |
| Eval results | Indefinitely | Small volume, critical for regression tracking |
| Conversation content | Per privacy policy | GDPR: delete on user request. Anonymize for eval datasets. |

AI-Specific Incident Runbooks

Runbook 1: Model Quality Degradation

Trigger: Task completion rate drops >5% over 1-hour rolling window, or eval regression detected in weekly production eval run.

| Step | Action |
| --- | --- |
| 1. Verify | Check if `gen_ai.response.model` changed (provider-side update). Check if traffic pattern shifted (input drift). |
| 2. Scope | Is it one agent or all agents? One model or all models? One task type or all? |
| 3. Investigate | Pull sample traces from the degradation period. Run LLM-as-judge on 50 cases. Compare to baseline. |
| 4. Mitigate | If model change: pin to previous model version. If prompt issue: revert prompt. If input drift: acknowledge new traffic pattern and triage. |
| 5. Resolve | Add failing cases to eval suite. Deploy fix through normal CI/CD with eval gates. |
| 6. Prevent | Set up model version monitoring alert. Add canary eval for the failing pattern. |
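Step 1 of this runbook (and step 6's monitoring alert) can be automated as a quick scan over recent traces. A sketch: the trace record shape is an assumption based on the `gen_ai.*` attributes used elsewhere in this post.

```python
from collections import Counter

def detect_model_change(traces, pinned_model):
    """Compare the served model version across recent traces against the
    pinned request model. Returns the set of unexpected versions (empty
    means no provider-side change detected)."""
    served = Counter(t["gen_ai.response.model"] for t in traces)
    return {model for model in served if model != pinned_model}
```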

Runbook 2: Cost Anomaly

Trigger: Daily spend exceeds 150% of 7-day rolling average, or individual task cost exceeds 5x the median for its type.

| Step | Action |
| --- | --- |
| 1. Verify | Check Cloud Billing for actual spend increase (not a reporting lag). Identify which agent/model is responsible. |
| 2. Scope | Is it a single agent in a loop, or a broad traffic increase? |
| 3. Investigate | Pull the most expensive traces. Look for: agent loops (>5 LLM calls per task), context window bloat (growing input tokens), model routing failure (expensive model used where a cheap one should be). |
| 4. Mitigate | If loop: add loop detection guardrail (max iterations). If context bloat: truncate conversation history. If routing: fix model router. Emergency: throttle non-critical agents. |
| 5. Resolve | Deploy fix. Monitor cost for 24 hours. |
| 6. Prevent | Add cost-per-task alert at 3x median. Add loop detection to agent framework. |
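The trigger conditions for this runbook are simple enough to implement directly. A sketch with the thresholds from the trigger (150% of the 7-day rolling average, 5x the per-type median); the data shapes are illustrative.

```python
from statistics import median

def daily_spend_anomaly(trailing_daily_spend, today_spend, factor=1.5):
    """True if today's spend exceeds factor x the trailing 7-day average."""
    window = trailing_daily_spend[-7:]
    return today_spend > factor * (sum(window) / len(window))

def task_cost_anomalies(task_costs, factor=5.0):
    """Tasks whose cost exceeds factor x the median for their type."""
    m = median(task_costs)
    return [cost for cost in task_costs if cost > factor * m]
```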

Runbook 3: Prompt Injection Attack

Trigger: Input guardrail injection detection rate spikes >3x baseline, or output monitoring detects system prompt leak or role boundary violation.

| Step | Action |
| --- | --- |
| 1. Verify | Confirm this is an attack, not a false positive spike. Check injection patterns in logs. |
| 2. Scope | Is it a single user or a coordinated attack? Which agents are targeted? |
| 3. Investigate | Review the injection payloads. Did any bypass input guardrails? Did any cause harmful outputs? |
| 4. Mitigate | If bypassed: add the new injection pattern to guardrails immediately. If harmful output: block the user/IP if malicious. If system prompt leaked: rotate any secrets referenced in the system prompt. |
| 5. Resolve | Update guardrail patterns. Add bypass cases to safety eval suite. |
| 6. Prevent | Run red-team eval suite monthly. Consider adding a dedicated injection classifier. |

Runbook 4: Data Leak Detected

Trigger: Output guardrail detects PII in agent response, or user reports receiving another user’s data.

| Step | Action |
| --- | --- |
| 1. Verify | Confirm the leak. Pull the full trace and conversation. Identify what data was exposed. |
| 2. Scope | Is it a one-time occurrence or a systematic issue? How many users affected? |
| 3. Investigate | Trace the data source: did it come from the model, from a tool result, or from conversation history? Check session isolation. |
| 4. Mitigate | If tool result: add output filtering on the tool. If session leakage: fix session isolation bug. If model-side: add PII detection guardrail if not present. Notify affected users per privacy policy. |
| 5. Resolve | Deploy fix. Run PII detection eval across recent production conversations. |
| 6. Prevent | Add PII detection to output guardrails (if not present). Add PII leak scenarios to safety eval suite. Review data access patterns for all agent tools. |

Rollback Strategies for AI

AI systems have multiple independently deployable components. Rolling back is not just “deploy the previous container”:

| Component | Rollback Mechanism | Speed | Risk |
| --- | --- | --- | --- |
| Prompt/system prompt | Version-controlled in Langfuse or git. Revert to previous version. | Seconds (if prompt served dynamically) | Low — prompts are text |
| Model version | Pin `gen_ai.request.model` to specific version (e.g., `gemini-2.0-flash-001`). | Seconds (config change) | Low — provider still serves the old version (usually) |
| Guardrail config | Version-controlled. Revert config and redeploy guardrail service. | Minutes | Medium — may re-expose issues the guardrail was catching |
| Agent code | Standard container rollback via Cloud Run revision or Kubernetes rollback. | Minutes | Medium — standard deployment risk |
| Tool definitions | Version-controlled alongside agent code. Rollback with agent code. | Minutes | Medium — tool changes may have data implications |
| Knowledge base (RAG) | Re-index from previous data snapshot. | Hours | High — re-indexing is slow |

Prompt + Model Rollback (Independent of Code)

The highest-impact rollback is often the fastest: reverting a prompt or model pin without any code deployment. This requires:

  1. Dynamic prompt serving — prompts loaded from Langfuse or a config service, not hardcoded
  2. Model version pinning — agent config specifies exact model version, not just model family
  3. Feature flags for guardrails — toggle guardrail rules without redeploy

With these in place, you can revert the most common AI regressions (prompt change, model update, guardrail misconfiguration) in seconds.
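What those three requirements enable can be sketched as a config object read per request, where a rollback is a value change rather than a deployment. All names are illustrative, and loading from Langfuse or a config service is left abstract.

```python
# Illustrative agent config as served from a config store (not hardcoded).
DEFAULT_CONFIG = {
    "prompt_version": "support-agent-v42",      # dynamic prompt serving
    "model": "gemini-2.0-flash-001",            # exact pin, not just the family
    "guardrails": {"injection_filter": True, "pii_redaction": True},
}

def rollback(config, component, previous_value):
    """Produce a new config with one component reverted: a prompt version,
    a model pin, or a guardrail feature flag. No code deploy involved."""
    new = dict(config)
    if component in ("prompt_version", "model"):
        new[component] = previous_value
    else:
        # Treat any other component name as a guardrail feature flag.
        new["guardrails"] = {**config["guardrails"], component: previous_value}
    return new
```

Writing the result back to the config store takes effect on the next request, which is what makes this a seconds-scale rollback.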


This post is licensed under CC BY 4.0 by the author.