
GCP AI Observability Stack


MMS builds on GCP. This post covers how to set up AI observability using GCP-native services — Vertex AI Agent Engine telemetry, ADK instrumentation, Cloud Trace, Cloud Monitoring, Cloud Logging, and Looker dashboards — and how to integrate open-source tools (Langfuse, OpenLLMetry) alongside them.


GCP AI Observability Architecture

┌─────────────────────────────────────────────────────────────────┐
│  Agent Code (ADK / LangGraph / Custom)                          │
│                                                                 │
│  ┌─────────────────────┐  ┌─────────────────────────────────┐   │
│  │ OTel SDK            │  │ Langfuse SDK (optional)         │   │
│  │ GenAI conventions   │  │ Evals, prompt mgmt, cost        │   │
│  └─────────┬───────────┘  └──────────────┬──────────────────┘   │
└────────────┼─────────────────────────────┼──────────────────────┘
             │                             │
             v                             v
┌────────────────────────┐   ┌─────────────────────────────────┐
│ OTel Collector         │   │ Langfuse (self-hosted on GKE)   │
│ (Cloud Run / GKE)      │   │ ClickHouse + Redis + S3         │
└──────┬─────┬───────────┘   └─────────────────────────────────┘
       │     │
       │     └──────────────────────┐
       v                            v
┌──────────────┐  ┌──────────────────────────┐  ┌───────────────┐
│ Cloud Trace  │  │ Cloud Monitoring         │  │ Cloud Logging │
│ (traces)     │  │ (metrics + alerts)       │  │ (structured)  │
└──────┬───────┘  └──────────┬───────────────┘  └───────┬───────┘
       │                     │                          │
       └─────────┬───────────┘                          │
                 v                                      │
       ┌─────────────────────┐                          │
       │ Looker Dashboards   │<─────────────────────────┘
       │ (unified view)      │
       └─────────────────────┘

Two complementary paths:

  1. GCP-native (Cloud Trace + Monitoring + Logging) — zero-config for Vertex AI Agent Engine, auto-correlated across GCP services, integrated with IAM and audit logging
  2. OSS (Langfuse + OpenLLMetry) — richer AI-specific features (evals, prompt management, session replay), self-hosted for data sovereignty

Both paths use OTel as the common transport. You can run them in parallel — GCP-native for infrastructure correlation and Langfuse for AI-specific analysis.


Vertex AI Agent Engine Telemetry

When agents are deployed on Vertex AI Agent Engine, several observability features are built in:

What You Get Out of the Box

| Feature | Service | Details |
|---------|---------|---------|
| Distributed traces | Cloud Trace | Agent invocation as root span, LLM/tool calls as children. OTel-compatible. |
| Request metrics | Cloud Monitoring | QPS, token throughput, TTFT (time-to-first-token), error rate |
| Structured logs | Cloud Logging | Request/response logs with trace ID correlation |
| Model observability dashboard | Vertex AI console | Token usage, latency distributions, error breakdowns per model |
| Cost tracking | Cloud Billing | Per-model cost, but not per-agent or per-task (requires custom metrics) |

Enabling Telemetry

Telemetry is enabled via environment variables on the Agent Engine runtime:

| Environment Variable | Value | What It Enables |
|----------------------|-------|-----------------|
| `GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY` | `true` | Traces and logs (without prompt/response content) |
| `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` | `true` | Full prompt and response content in traces (opt-in, security-sensitive) |
| `GOOGLE_CLOUD_PROJECT` | your-project-id | GCP project for trace/metric export (auto-detected in Agent Engine) |

Important: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true logs full prompts and responses. Do not enable this in production if traces contain PII or sensitive customer data unless you have appropriate data retention and access controls.
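If you need content capture in some environments but want to keep prompts out of exported traces, one mitigation is to drop content-bearing attributes in an OTel Collector before export. A sketch, assuming traces flow through a collector as in the architecture above — the processor name and the attribute-key regex are illustrative and should be matched against what your SDK actually emits:

```yaml
# Sketch: strip prompt/response content at the collector boundary so
# content-bearing attributes never reach Cloud Trace or Langfuse.
processors:
  attributes/redact_genai_content:
    actions:
      - pattern: "gen_ai\\.(prompt|completion).*"  # assumption: keys follow GenAI conventions
        action: delete
```

Add the processor to the traces pipeline (`processors: [attributes/redact_genai_content]`) to apply it.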


ADK Instrumentation

The Google Agent Development Kit has built-in OTel support for Cloud Trace:

Option 1: Via AdkApp Abstraction

from google.adk.app import AdkApp

app = AdkApp(
    agent=my_agent,
    enable_tracing=True  # Sends traces to Cloud Trace
)

Option 2: Via Telemetry Module Directly

from google.adk.telemetry import google_cloud

# Get GCP-specific OTel exporters
exporters = google_cloud.get_gcp_exporters(
    enable_cloud_tracing=True
)

# Configure the OTel SDK with these exporters and register the provider
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
for exporter in exporters:
    provider.add_span_processor(BatchSpanProcessor(exporter))

# Register globally so instrumented code picks it up
trace.set_tracer_provider(provider)

Option 3: Via Environment Variables (Zero-Code)

For agents deployed on Vertex AI Agent Engine or Cloud Run:

export GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=true
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true

No code changes required. The ADK runtime auto-configures OTel exporters to Cloud Trace.

LangChain/LangGraph on Vertex AI

If using LangChain or LangGraph within Vertex AI Agent Engine:

# LangSmith can trace LangChain natively; for OTel-native export to
# Cloud Trace, initialize OpenLLMetry with a Cloud Trace span exporter
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from traceloop.sdk import Traceloop

cloud_trace_exporter = CloudTraceSpanExporter(project_id="my-gcp-project")
Traceloop.init(exporter=cloud_trace_exporter)

Cloud Trace for AI Agents

How Agent Traces Appear

An ADK agent deployed on Vertex AI Agent Engine produces traces with this hierarchy:

Trace: /agent/invoke
│
├── Span: agent_engine.invoke                    [root]
│   service.name: "agent-engine"
│   agent.name: "support-agent"
│
│   ├── Span: adk.agent.run                      [agent execution]
│   │   gen_ai.agent.name: "support-agent"
│   │
│   │   ├── Span: gen_ai.chat                    [LLM call 1]
│   │   │   gen_ai.system: "vertex_ai"
│   │   │   gen_ai.request.model: "gemini-2.0-flash"
│   │   │   gen_ai.usage.input_tokens: 950
│   │   │   gen_ai.usage.output_tokens: 180
│   │   │
│   │   ├── Span: tool.execute                    [tool call]
│   │   │   tool.name: "search_kb"
│   │   │   tool.status: "ok"
│   │   │
│   │   └── Span: gen_ai.chat                    [LLM call 2]
│   │       gen_ai.usage.input_tokens: 2100
│   │       gen_ai.usage.output_tokens: 420
│   │
│   └── Span: guardrail.check                    [output validation]
│       guardrail.name: "content_filter"
│       guardrail.result: "pass"

Cross-Service Trace Propagation

Cloud Trace automatically correlates spans from different GCP services in the same trace:

  • Cloud Run — agent service spans
  • Cloud Functions — tool execution spans (if tools are serverless)
  • Vertex AI — model inference spans
  • Cloud SQL / Firestore — data access spans (if using GCP client libraries with tracing enabled)

W3C traceparent header propagation handles this automatically when using GCP client libraries.
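To make the correlation concrete: `traceparent` is a fixed hex format whose trace ID maps directly onto the resource name Cloud Trace (and Cloud Logging's trace field) uses. A minimal stdlib sketch — the project ID below is a placeholder:

```python
import re

# W3C traceparent: version-traceid-spanid-flags, all lowercase hex
_TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header):
    """Split a traceparent header into its four fields, or None if malformed."""
    m = _TRACEPARENT.match(header)
    return m.groupdict() if m else None

def cloud_trace_name(project_id, trace_id):
    """Resource name under which Cloud Trace stores the trace."""
    return f"projects/{project_id}/traces/{trace_id}"

fields = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
# fields["trace_id"] is the 32-hex-char ID shared by every service in the trace
```

GCP client libraries inject and extract this header for you; the parsing above is only to show what travels on the wire.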


Cloud Monitoring Metrics and Alerts

Built-in Vertex AI Metrics

| Metric | Description | Useful For |
|--------|-------------|------------|
| `aiplatform.googleapis.com/prediction/online/response_count` | Total prediction requests | Traffic volume |
| `aiplatform.googleapis.com/prediction/online/response_latencies` | Response latency distribution | P50/P95/P99 latency tracking |
| `aiplatform.googleapis.com/prediction/online/error_count` | Failed predictions | Error rate alerting |
| `aiplatform.googleapis.com/publisher/online_serving/token_count` | Token consumption | Usage and cost tracking |
| `aiplatform.googleapis.com/publisher/online_serving/first_token_latencies` | Time-to-first-token | Streaming performance |

Custom Metrics for AI Agents

Built-in metrics cover model-level observability. For agent-level insights, create custom metrics:

import time

from google.cloud import monitoring_v3

# Custom metric: cost per agent task
client = monitoring_v3.MetricServiceClient()
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/agent/task_cost_usd"
series.metric.labels["agent_name"] = "support-agent"
series.metric.labels["task_type"] = "ticket_resolution"
series.resource.type = "global"

# Write one data point for the just-completed task
now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)
series.points = [
    monitoring_v3.Point({"interval": interval, "value": {"double_value": 0.0025}})
]
client.create_time_series(name="projects/my-gcp-project", time_series=[series])

Recommended custom metrics:

| Custom Metric | Labels | Purpose |
|---------------|--------|---------|
| `agent/task_cost_usd` | agent_name, task_type | Per-task cost attribution |
| `agent/task_completion_rate` | agent_name, task_type | Quality tracking |
| `agent/guardrail_triggers` | agent_name, guardrail_name, result | Safety monitoring |
| `agent/tool_call_duration` | agent_name, tool_name, status | Tool performance |
| `agent/tokens_per_task` | agent_name, task_type | Efficiency tracking |
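Computing `agent/task_cost_usd` means pricing the token counts reported on each task's gen_ai spans. A minimal sketch — the per-1K-token prices below are placeholders, not published rates; substitute the rates from your own billing data:

```python
# Placeholder prices per 1K tokens -- NOT real rates, pull yours from billing.
PRICE_PER_1K = {
    "gemini-2.0-flash": {"input": 0.00010, "output": 0.00040},
}

def task_cost_usd(model, input_tokens, output_tokens):
    """Cost of one agent task, from the token totals across its LLM calls."""
    p = PRICE_PER_1K[model]
    cost = (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]
    return round(cost, 6)

# Token totals from the sample trace earlier (950 + 2100 input, 180 + 420 output)
cost = task_cost_usd("gemini-2.0-flash", 3050, 600)
```

The result is what you would write as the `double_value` of a `custom.googleapis.com/agent/task_cost_usd` data point.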

Alert Policies

| Alert | Condition | Notification |
|-------|-----------|--------------|
| TTFT > 5s (P95) | `first_token_latencies` P95 > 5000ms for 5 min | PagerDuty + Slack |
| Error rate > 5% | `error_count` / `response_count` > 0.05 for 5 min | PagerDuty |
| Token spike | `token_count` > 3x daily average | Slack (cost alert) |
| Guardrail surge | `guardrail_triggers` > 2x hourly baseline | Slack (quality alert) |
| Task completion drop | `task_completion_rate` < 85% over 1-hour window | Slack + weekly report |

Cloud Logging Patterns

Structured Log Entries

ADK and Vertex AI Agent Engine emit structured JSON logs to Cloud Logging. Key fields:

{
  "severity": "INFO",
  "message": "Agent task completed",
  "logging.googleapis.com/trace": "projects/my-project/traces/abc123",
  "logging.googleapis.com/spanId": "def456",
  "jsonPayload": {
    "agent_name": "support-agent",
    "task_type": "ticket_resolution",
    "total_tokens": 3500,
    "total_cost_usd": 0.0025,
    "tool_calls": ["search_kb", "update_ticket"],
    "guardrail_result": "pass",
    "duration_ms": 2800
  }
}

The trace and spanId fields enable automatic correlation with Cloud Trace spans.
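On Cloud Run or GKE, one way to emit such entries is to print one JSON object per line to stdout; the logging agent parses the special `logging.googleapis.com/*` keys. A minimal stdlib sketch — field values and the fallback project ID are illustrative:

```python
import json
import os

def log_task_completed(trace_id, span_id, **fields):
    """Build and emit a structured log line Cloud Logging can correlate with a trace."""
    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "my-project")
    entry = {
        "severity": "INFO",
        "message": "Agent task completed",
        # These two keys link the log entry to its Cloud Trace span.
        "logging.googleapis.com/trace": f"projects/{project}/traces/{trace_id}",
        "logging.googleapis.com/spanId": span_id,
        **fields,
    }
    line = json.dumps(entry)
    print(line)  # one JSON object per line on stdout
    return line

log_task_completed("abc123", "def456",
                   agent_name="support-agent", total_tokens=3500)
```

If your agent already runs inside an OTel span, take `trace_id` and `span_id` from the current span context so logs and traces line up automatically.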

Log-Based Metrics

Create metrics from log patterns without code changes:

# Metric from guardrail trigger logs
resource.type="cloud_run_revision"
jsonPayload.guardrail_result="block"

# Count as custom metric: guardrail_blocks_total
# Label: jsonPayload.guardrail_name

Log-based metrics feed into Cloud Monitoring dashboards and alert policies, bridging logging and monitoring without custom instrumentation code.

Audit Logging

For compliance, ensure Cloud Audit Logs are enabled for:

  • Data Access logs — who queried which agent, when
  • Admin Activity logs — who deployed/updated agent configurations
  • Agent Engine API calls — all API calls to the Agent Engine service

Retain audit logs per your compliance requirements (default 30 days for Data Access, 400 days for Admin Activity).
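To inspect the Admin Activity trail for agent configuration changes, a Cloud Logging query along these lines works. The log name encoding is standard; the service and method filters are assumptions based on Agent Engine living under the `aiplatform.googleapis.com` API (verify against your actual audit entries):

```
logName="projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.serviceName="aiplatform.googleapis.com"
protoPayload.methodName:"ReasoningEngine"
```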


Looker Dashboards

Build four dashboards in Looker (or Looker Studio) for different audiences:

1. Real-Time Operations Dashboard

Audience: On-call engineers, SRE team

| Panel | Data Source | Visualization |
|-------|-------------|---------------|
| Active agents (current) | Cloud Monitoring | Counter |
| Request rate (QPS) | Cloud Monitoring | Time series |
| P50/P95/P99 latency | Cloud Monitoring | Time series with thresholds |
| Error rate | Cloud Monitoring | Time series with alert overlay |
| Active alerts | Cloud Monitoring | Alert list |

2. Cost Tracking Dashboard

Audience: Engineering leads, FinOps team

| Panel | Data Source | Visualization |
|-------|-------------|---------------|
| Daily spend by model | Cloud Billing + custom metrics | Stacked bar chart |
| Cost per task by agent | Custom metrics | Table, sortable |
| Token consumption trend | Cloud Monitoring | Time series |
| Cost anomalies | Custom metrics + ML-based alerting | Alert list |
| Budget burn rate | Cloud Billing | Gauge (% of monthly budget) |

3. Quality Metrics Dashboard

Audience: Product team, agent developers

| Panel | Data Source | Visualization |
|-------|-------------|---------------|
| Task completion rate by agent | Custom metrics | Time series per agent |
| Guardrail trigger rate | Custom metrics / log-based metrics | Time series by guardrail type |
| Eval regression trend | Langfuse (or custom pipeline) | Weekly trend line |
| User feedback scores | Custom metrics | Distribution histogram |
| Drift detection signals | Langfuse / custom pipeline | Alert list |

4. Agent Deep-Dive Dashboard

Audience: Agent developers (per-agent view)

| Panel | Data Source | Visualization |
|-------|-------------|---------------|
| Trace waterfall (sample) | Cloud Trace | Embedded trace view |
| Tool call success rate | Custom metrics | Bar chart by tool |
| Token distribution per step | Custom metrics | Box plot |
| Session replay link | Langfuse | Deep link |
| Recent failures (table) | Cloud Logging | Log table, filtered by error |

Integration with OSS Tools

GCP-native observability covers infrastructure and model-level metrics. For AI-specific features (evals, prompt management, session replay), integrate Langfuse alongside GCP services:

Architecture: GCP + Langfuse

Agent Code
├── OTel SDK ──────────> OTel Collector ──> Cloud Trace (GCP-native)
│                                      └──> Langfuse (AI-specific)
├── Langfuse SDK ──────> Langfuse directly (evals, prompts, scores)
└── Cloud Logging SDK ─> Cloud Logging (audit, structured logs)

OTel Collector Configuration

Deploy an OTel Collector on GKE that fans out traces to both Cloud Trace and Langfuse:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  googlecloud:
    project: my-gcp-project
  otlphttp:
    endpoint: https://langfuse.internal.mms.com/api/public/otel
    headers:
      Authorization: "Basic ${LANGFUSE_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [googlecloud, otlphttp]

This gives you:

  • Cloud Trace for GCP-native correlation (traces linked to Cloud Run, Vertex AI, Cloud SQL)
  • Langfuse for AI-specific analysis (eval scores on traces, prompt versioning, cost per conversation, session replay)

What Lives Where

| Concern | GCP-Native | Langfuse |
|---------|------------|----------|
| Infrastructure metrics (CPU, GPU, pods) | Cloud Monitoring | — |
| Distributed traces | Cloud Trace | Langfuse (duplicate for AI analysis) |
| Model metrics (TTFT, tokens, errors) | Cloud Monitoring | Langfuse (richer per-trace) |
| Eval scores | — | Langfuse |
| Prompt versioning | — | Langfuse |
| Session replay | — | Langfuse |
| Cost per task | Custom metrics | Langfuse (automatic) |
| Audit logs | Cloud Audit Logs | — |
| Alerts | Cloud Monitoring (PagerDuty) | Langfuse (webhooks) |

This post is licensed under CC BY 4.0 by the author.