GCP AI Observability Stack
MMS builds on GCP. This file covers how to set up AI observability using GCP-native services — Vertex AI Agent Engine telemetry, ADK instrumentation, Cloud Trace, Cloud Monitoring, Cloud Logging, and Looker dashboards — and how to integrate open-source tools (Langfuse, OpenLLMetry) alongside them.
GCP AI Observability Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│              Agent Code (ADK / LangGraph / Custom)              │
│                                                                 │
│  ┌─────────────────────┐   ┌─────────────────────────────────┐  │
│  │      OTel SDK       │   │     Langfuse SDK (optional)     │  │
│  │  GenAI conventions  │   │    Evals, prompt mgmt, cost     │  │
│  └─────────┬───────────┘   └──────────────┬──────────────────┘  │
└────────────┼──────────────────────────────┼─────────────────────┘
             │                              │
             v                              v
┌────────────────────────┐   ┌─────────────────────────────────┐
│     OTel Collector     │   │  Langfuse (self-hosted on GKE)  │
│   (Cloud Run / GKE)    │   │     ClickHouse + Redis + S3     │
└──────┬─────┬───────────┘   └─────────────────────────────────┘
       │     │
       │     └──────────────────────┐
       v                            v
┌──────────────┐  ┌──────────────────────────┐  ┌───────────────┐
│ Cloud Trace  │  │     Cloud Monitoring     │  │ Cloud Logging │
│   (traces)   │  │    (metrics + alerts)    │  │ (structured)  │
└──────┬───────┘  └──────────┬───────────────┘  └───────┬───────┘
       │                     │                          │
       └─────────┬───────────┘                          │
                 v                                      │
       ┌─────────────────────┐                          │
       │  Looker Dashboards  │<─────────────────────────┘
       │   (unified view)    │
       └─────────────────────┘
```
Two complementary paths:
- GCP-native (Cloud Trace + Monitoring + Logging) — zero-config for Vertex AI Agent Engine, auto-correlated across GCP services, integrated with IAM and audit logging
- OSS (Langfuse + OpenLLMetry) — richer AI-specific features (evals, prompt management, session replay), self-hosted for data sovereignty
Both paths use OTel as the common transport. You can run them in parallel — GCP-native for infrastructure correlation and Langfuse for AI-specific analysis.
Vertex AI Agent Engine Telemetry
When agents are deployed on Vertex AI Agent Engine, several observability features are built in:
What You Get Out of the Box
| Feature | Service | Details |
|---|---|---|
| Distributed traces | Cloud Trace | Agent invocation as root span, LLM/tool calls as children. OTel-compatible. |
| Request metrics | Cloud Monitoring | QPS, token throughput, TTFT (time-to-first-token), error rate |
| Structured logs | Cloud Logging | Request/response logs with trace ID correlation |
| Model observability dashboard | Vertex AI console | Token usage, latency distributions, error breakdowns per model |
| Cost tracking | Cloud Billing | Per-model cost, but not per-agent or per-task (requires custom metrics) |
Enabling Telemetry
Telemetry is enabled via environment variables on the Agent Engine runtime:
| Environment Variable | Value | What It Enables |
|---|---|---|
| `GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY` | `true` | Traces and logs (without prompt/response content) |
| `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` | `true` | Full prompt and response content in traces (opt-in, security-sensitive) |
| `GOOGLE_CLOUD_PROJECT` | `your-project-id` | GCP project for trace/metric export (auto-detected in Agent Engine) |
Important: `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` logs full prompts and responses. Do not enable this in production if traces contain PII or sensitive customer data unless you have appropriate data retention and access controls.
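If you do enable content capture, consider scrubbing obvious PII before prompts are attached to span attributes. A minimal pure-Python sketch; the regexes and the `redact` helper are illustrative, not part of any GCP SDK (production PII detection should use a dedicated service such as Cloud DLP):

```python
import re

# Illustrative patterns only -- hand-rolled regexes will miss many PII forms.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace emails and phone numbers before text is recorded on a span."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

print(redact("Contact jane.doe@example.com or +1 415 555 0100"))
```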
ADK Instrumentation
The Google Agent Development Kit has built-in OTel support for Cloud Trace:
Option 1: Via AdkApp Abstraction
```python
from google.adk.app import AdkApp

app = AdkApp(
    agent=my_agent,
    enable_tracing=True,  # Sends traces to Cloud Trace
)
```
Option 2: Via Telemetry Module Directly
```python
from google.adk.telemetry import google_cloud
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Get GCP-specific OTel exporters
exporters = google_cloud.get_gcp_exporters(
    enable_cloud_tracing=True
)

# Configure the OTel SDK with these exporters
provider = TracerProvider()
for exporter in exporters:
    provider.add_span_processor(BatchSpanProcessor(exporter))
```
Option 3: Via Environment Variables (Zero-Code)
For agents deployed on Vertex AI Agent Engine or Cloud Run:
```shell
export GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=true
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
```
No code changes required. The ADK runtime auto-configures OTel exporters to Cloud Trace.
LangChain/LangGraph on Vertex AI
If using LangChain or LangGraph within Vertex AI Agent Engine:
```python
# Enable tracing via LangSmith (if available), or use OpenLLMetry
# (the Traceloop SDK) for OTel-native tracing to Cloud Trace:
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from traceloop.sdk import Traceloop

cloud_trace_exporter = CloudTraceSpanExporter(project_id="my-gcp-project")
Traceloop.init(exporter=cloud_trace_exporter)
```
Cloud Trace for AI Agents
How Agent Traces Appear
An ADK agent deployed on Vertex AI Agent Engine produces traces with this hierarchy:
```
Trace: /agent/invoke
│
├── Span: agent_engine.invoke          [root]
│     service.name: "agent-engine"
│     agent.name: "support-agent"
│
│   ├── Span: adk.agent.run            [agent execution]
│   │     gen_ai.agent.name: "support-agent"
│   │
│   │   ├── Span: gen_ai.chat          [LLM call 1]
│   │   │     gen_ai.system: "vertex_ai"
│   │   │     gen_ai.request.model: "gemini-2.0-flash"
│   │   │     gen_ai.usage.input_tokens: 950
│   │   │     gen_ai.usage.output_tokens: 180
│   │   │
│   │   ├── Span: tool.execute         [tool call]
│   │   │     tool.name: "search_kb"
│   │   │     tool.status: "ok"
│   │   │
│   │   └── Span: gen_ai.chat          [LLM call 2]
│   │         gen_ai.usage.input_tokens: 2100
│   │         gen_ai.usage.output_tokens: 420
│   │
│   └── Span: guardrail.check          [output validation]
│         guardrail.name: "content_filter"
│         guardrail.result: "pass"
```
Cross-Service Trace Propagation
Cloud Trace automatically correlates spans from different GCP services in the same trace:
- Cloud Run — agent service spans
- Cloud Functions — tool execution spans (if tools are serverless)
- Vertex AI — model inference spans
- Cloud SQL / Firestore — data access spans (if using GCP client libraries with tracing enabled)
W3C traceparent header propagation handles this automatically when using GCP client libraries.
Cloud Monitoring Metrics and Alerts
Built-in Vertex AI Metrics
| Metric | Description | Useful For |
|---|---|---|
| `aiplatform.googleapis.com/prediction/online/response_count` | Total prediction requests | Traffic volume |
| `aiplatform.googleapis.com/prediction/online/response_latencies` | Response latency distribution | P50/P95/P99 latency tracking |
| `aiplatform.googleapis.com/prediction/online/error_count` | Failed predictions | Error rate alerting |
| `aiplatform.googleapis.com/publisher/online_serving/token_count` | Token consumption | Usage and cost tracking |
| `aiplatform.googleapis.com/publisher/online_serving/first_token_latencies` | Time-to-first-token | Streaming performance |
Custom Metrics for AI Agents
Built-in metrics cover model-level observability. For agent-level insights, create custom metrics:
```python
from google.cloud import monitoring_v3

# Custom metric: cost per agent task
client = monitoring_v3.MetricServiceClient()

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/agent/task_cost_usd"
series.metric.labels["agent_name"] = "support-agent"
series.metric.labels["task_type"] = "ticket_resolution"
# ... write data point
```
Recommended custom metrics:
| Custom Metric | Labels | Purpose |
|---|---|---|
| `agent/task_cost_usd` | `agent_name`, `task_type` | Per-task cost attribution |
| `agent/task_completion_rate` | `agent_name`, `task_type` | Quality tracking |
| `agent/guardrail_triggers` | `agent_name`, `guardrail_name`, `result` | Safety monitoring |
| `agent/tool_call_duration` | `agent_name`, `tool_name`, `status` | Tool performance |
| `agent/tokens_per_task` | `agent_name`, `task_type` | Efficiency tracking |
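The value written to `agent/task_cost_usd` can be derived from the token counts already present on spans. A pure-Python sketch; the per-million-token rates below are placeholders for illustration, not official Gemini pricing:

```python
# Placeholder USD rates per 1M tokens -- check current Vertex AI pricing.
PRICING_PER_1M_TOKENS = {
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
}

def task_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one agent task, suitable for the agent/task_cost_usd metric."""
    rates = PRICING_PER_1M_TOKENS[model]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return round(cost, 6)

# Token totals from the trace hierarchy shown earlier: 950 + 2100 in, 180 + 420 out
cost = task_cost_usd("gemini-2.0-flash", 3050, 600)
```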
Alert Policies
| Alert | Condition | Notification |
|---|---|---|
| TTFT > 5s (P95) | `first_token_latencies` P95 > 5000ms for 5 min | PagerDuty + Slack |
| Error rate > 5% | `error_count` / `response_count` > 0.05 for 5 min | PagerDuty |
| Token spike | `token_count` > 3x daily average | Slack (cost alert) |
| Guardrail surge | `guardrail_triggers` > 2x hourly baseline | Slack (quality alert) |
| Task completion drop | `task_completion_rate` < 85% over 1-hour window | Slack + weekly report |
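Each baseline-relative condition above reduces to a simple comparison. A sketch of the token-spike check (function name and windowing are illustrative; in practice this logic lives in a Cloud Monitoring alert policy, not in application code):

```python
def token_spike(current_tokens: int, trailing_daily_totals: list[int],
                factor: float = 3.0) -> bool:
    """True when today's token count exceeds factor x the trailing daily average."""
    baseline = sum(trailing_daily_totals) / len(trailing_daily_totals)
    return current_tokens > factor * baseline

# 400k tokens against a ~110k/day baseline trips the 3x alert
spike = token_spike(400_000, [100_000, 120_000, 110_000])
```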
Cloud Logging Patterns
Structured Log Entries
ADK and Vertex AI Agent Engine emit structured JSON logs to Cloud Logging. Key fields:
```json
{
  "severity": "INFO",
  "message": "Agent task completed",
  "logging.googleapis.com/trace": "projects/my-project/traces/abc123",
  "logging.googleapis.com/spanId": "def456",
  "jsonPayload": {
    "agent_name": "support-agent",
    "task_type": "ticket_resolution",
    "total_tokens": 3500,
    "total_cost_usd": 0.0025,
    "tool_calls": ["search_kb", "update_ticket"],
    "guardrail_result": "pass",
    "duration_ms": 2800
  }
}
```
The `logging.googleapis.com/trace` and `logging.googleapis.com/spanId` fields enable automatic correlation with Cloud Trace spans.
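A sketch of emitting such an entry from application code. On Cloud Run, JSON written to stdout is parsed by the logging agent: `severity`, `message`, and the `logging.googleapis.com/*` keys are extracted as special fields, and the remaining keys land under `jsonPayload`. The `format_log_entry` helper is illustrative, not a GCP API:

```python
import json

def format_log_entry(project_id: str, trace_id: str, span_id: str, **fields) -> str:
    """Build one structured log line that Cloud Logging correlates with Cloud Trace."""
    entry = {
        "severity": "INFO",
        "message": "Agent task completed",
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
        "logging.googleapis.com/spanId": span_id,
        **fields,  # remaining keys end up under jsonPayload
    }
    return json.dumps(entry)

# Writing the line to stdout is enough on Cloud Run; the agent does the rest.
print(format_log_entry("my-project", "abc123", "def456",
                       agent_name="support-agent", total_tokens=3500))
```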
Log-Based Metrics
Create metrics from log patterns without code changes:
```
# Metric from guardrail trigger logs
resource.type="cloud_run_revision"
jsonPayload.guardrail_result="block"

# Count as custom metric: guardrail_blocks_total
# Label: jsonPayload.guardrail_name
```
Log-based metrics feed into Cloud Monitoring dashboards and alert policies, bridging logging and monitoring without custom instrumentation code.
Audit Logging
For compliance, ensure Cloud Audit Logs are enabled for:
- Data Access logs — who queried which agent, when
- Admin Activity logs — who deployed/updated agent configurations
- Agent Engine API calls — all API calls to the Agent Engine service
Retain audit logs per your compliance requirements (default 30 days for Data Access, 400 days for Admin Activity).
Looker Dashboards
Build four dashboards in Looker (or Looker Studio) for different audiences:
1. Real-Time Operations Dashboard
Audience: On-call engineers, SRE team
| Panel | Data Source | Visualization |
|---|---|---|
| Active agents (current) | Cloud Monitoring | Counter |
| Request rate (QPS) | Cloud Monitoring | Time series |
| P50/P95/P99 latency | Cloud Monitoring | Time series with thresholds |
| Error rate | Cloud Monitoring | Time series with alert overlay |
| Active alerts | Cloud Monitoring | Alert list |
2. Cost Tracking Dashboard
Audience: Engineering leads, FinOps team
| Panel | Data Source | Visualization |
|---|---|---|
| Daily spend by model | Cloud Billing + custom metrics | Stacked bar chart |
| Cost per task by agent | Custom metrics | Table, sortable |
| Token consumption trend | Cloud Monitoring | Time series |
| Cost anomalies | Custom metrics + ML-based alerting | Alert list |
| Budget burn rate | Cloud Billing | Gauge (% of monthly budget) |
3. Quality Metrics Dashboard
Audience: Product team, agent developers
| Panel | Data Source | Visualization |
|---|---|---|
| Task completion rate by agent | Custom metrics | Time series per agent |
| Guardrail trigger rate | Custom metrics / log-based metrics | Time series by guardrail type |
| Eval regression trend | Langfuse (or custom pipeline) | Weekly trend line |
| User feedback scores | Custom metrics | Distribution histogram |
| Drift detection signals | Langfuse / custom pipeline | Alert list |
4. Agent Deep-Dive Dashboard
Audience: Agent developers (per-agent view)
| Panel | Data Source | Visualization |
|---|---|---|
| Trace waterfall (sample) | Cloud Trace | Embedded trace view |
| Tool call success rate | Custom metrics | Bar chart by tool |
| Token distribution per step | Custom metrics | Box plot |
| Session replay link | Langfuse | Deep link |
| Recent failures (table) | Cloud Logging | Log table, filtered by error |
Integration with OSS Tools
GCP-native observability covers infrastructure and model-level metrics. For AI-specific features (evals, prompt management, session replay), integrate Langfuse alongside GCP services:
Architecture: GCP + Langfuse
```
Agent Code
├── OTel SDK ──────────> OTel Collector ──> Cloud Trace (GCP-native)
│                                      └──> Langfuse (AI-specific)
├── Langfuse SDK ──────> Langfuse directly (evals, prompts, scores)
└── Cloud Logging SDK ─> Cloud Logging (audit, structured logs)
```
OTel Collector Configuration
Deploy an OTel Collector on GKE that fans out traces to both Cloud Trace and Langfuse:
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  googlecloud:
    project: my-gcp-project
  otlphttp:
    endpoint: https://langfuse.internal.mms.com/api/public/otel
    headers:
      Authorization: "Basic ${LANGFUSE_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [googlecloud, otlphttp]
```
This gives you:
- Cloud Trace for GCP-native correlation (traces linked to Cloud Run, Vertex AI, Cloud SQL)
- Langfuse for AI-specific analysis (eval scores on traces, prompt versioning, cost per conversation, session replay)
What Lives Where
| Concern | GCP-Native | Langfuse |
|---|---|---|
| Infrastructure metrics (CPU, GPU, pods) | Cloud Monitoring | — |
| Distributed traces | Cloud Trace | Langfuse (duplicate for AI analysis) |
| Model metrics (TTFT, tokens, errors) | Cloud Monitoring | Langfuse (richer per-trace) |
| Eval scores | — | Langfuse |
| Prompt versioning | — | Langfuse |
| Session replay | — | Langfuse |
| Cost per task | Custom metrics | Langfuse (automatic) |
| Audit logs | Cloud Audit Logs | — |
| Alerts | Cloud Monitoring (PagerDuty) | Langfuse (webhooks) |