GCP AI Observability Stack
MMS builds on GCP. This file covers how to set up AI observability using GCP-native services — Vertex AI Agent Engine telemetry, ADK instrumentation, Cloud Trace, Cloud Monitoring, Cloud Logging, and Looker dashboards — and how to integrate open-source tools (Langfuse, OpenLLMetry) alongside them.
GCP AI Observability Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│              Agent Code (ADK / LangGraph / Custom)              │
│                                                                 │
│  ┌─────────────────────┐   ┌─────────────────────────────────┐  │
│  │      OTel SDK       │   │     Langfuse SDK (optional)     │  │
│  │  GenAI conventions  │   │    Evals, prompt mgmt, cost     │  │
│  └─────────┬───────────┘   └──────────────┬──────────────────┘  │
└────────────┼──────────────────────────────┼─────────────────────┘
             │                              │
             v                              v
┌────────────────────────┐   ┌─────────────────────────────────┐
│     OTel Collector     │   │  Langfuse (self-hosted on GKE)  │
│   (Cloud Run / GKE)    │   │     ClickHouse + Redis + S3     │
└──────┬─────┬───────────┘   └─────────────────────────────────┘
       │     │
       │     └──────────────────────┐
       v                            v
┌──────────────┐  ┌──────────────────────────┐  ┌───────────────┐
│ Cloud Trace  │  │     Cloud Monitoring     │  │ Cloud Logging │
│   (traces)   │  │    (metrics + alerts)    │  │ (structured)  │
└──────┬───────┘  └──────────┬───────────────┘  └───────┬───────┘
       │                     │                          │
       └─────────┬───────────┘                          │
                 v                                      │
       ┌─────────────────────┐                          │
       │  Looker Dashboards  │<─────────────────────────┘
       │   (unified view)    │
       └─────────────────────┘
```
Two complementary paths:
- GCP-native (Cloud Trace + Monitoring + Logging) — zero-config for Vertex AI Agent Engine, auto-correlated across GCP services, integrated with IAM and audit logging
- OSS (Langfuse + OpenLLMetry) — richer AI-specific features (evals, prompt management, session replay), self-hosted for data sovereignty
Both paths use OTel as the common transport. You can run them in parallel — GCP-native for infrastructure correlation and Langfuse for AI-specific analysis.
Vertex AI Agent Engine Telemetry
When agents are deployed on Vertex AI Agent Engine, several observability features are built in:
What You Get Out of the Box
| Feature | Service | Details |
|---|---|---|
| Distributed traces | Cloud Trace | Agent invocation as root span, LLM/tool calls as children. OTel-compatible. |
| Request metrics | Cloud Monitoring | QPS, token throughput, TTFT (time-to-first-token), error rate |
| Structured logs | Cloud Logging | Request/response logs with trace ID correlation |
| Model observability dashboard | Vertex AI console | Token usage, latency distributions, error breakdowns per model |
| Cost tracking | Cloud Billing | Per-model cost, but not per-agent or per-task (requires custom metrics) |
Enabling Telemetry
Telemetry is enabled via environment variables on the Agent Engine runtime:
| Environment Variable | Value | What It Enables |
|---|---|---|
| `GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY` | `true` | Traces and logs (without prompt/response content) |
| `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` | `true` | Full prompt and response content in traces (opt-in, security-sensitive) |
| `GOOGLE_CLOUD_PROJECT` | `your-project-id` | GCP project for trace/metric export (auto-detected in Agent Engine) |
Important: `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` logs full prompts and responses. Do not enable this in production if traces contain PII or sensitive customer data unless you have appropriate data retention and access controls.
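If you do enable content capture, consider scrubbing obvious PII before prompts are attached to span attributes. A minimal pure-Python sketch; the regexes and the `redact` helper are illustrative, not part of any GCP SDK (production PII detection should use a dedicated service such as Cloud DLP):

```python
import re

# Illustrative patterns only -- hand-rolled regexes will miss many PII forms.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace emails and phone numbers before text is recorded on a span."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

print(redact("Contact jane.doe@example.com or +1 415 555 0100"))
```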
ADK Instrumentation
The Google Agent Development Kit has built-in OTel support for Cloud Trace:
Option 1: Via AdkApp Abstraction
```python
from google.adk.app import AdkApp

app = AdkApp(
    agent=my_agent,
    enable_tracing=True,  # Sends traces to Cloud Trace
)
```
Option 2: Via Telemetry Module Directly
```python
from google.adk.telemetry import google_cloud
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Get GCP-specific OTel exporters
exporters = google_cloud.get_gcp_exporters(
    enable_cloud_tracing=True
)

# Configure the OTel SDK with these exporters
provider = TracerProvider()
for exporter in exporters:
    provider.add_span_processor(BatchSpanProcessor(exporter))
```
Option 3: Via Environment Variables (Zero-Code)
For agents deployed on Vertex AI Agent Engine or Cloud Run:
```shell
export GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=true
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
```
No code changes required. The ADK runtime auto-configures OTel exporters to Cloud Trace.
LangChain/LangGraph on Vertex AI
If using LangChain or LangGraph within Vertex AI Agent Engine:
```python
# Enable tracing via LangSmith (if available), or use OpenLLMetry
# (the Traceloop SDK) for OTel-native tracing to Cloud Trace:
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from traceloop.sdk import Traceloop

cloud_trace_exporter = CloudTraceSpanExporter(project_id="my-gcp-project")
Traceloop.init(exporter=cloud_trace_exporter)
```
Cloud Trace for AI Agents
How Agent Traces Appear
An ADK agent deployed on Vertex AI Agent Engine produces traces with this hierarchy:
```
Trace: /agent/invoke
│
├── Span: agent_engine.invoke          [root]
│     service.name: "agent-engine"
│     agent.name: "support-agent"
│
│   ├── Span: adk.agent.run            [agent execution]
│   │     gen_ai.agent.name: "support-agent"
│   │
│   │   ├── Span: gen_ai.chat          [LLM call 1]
│   │   │     gen_ai.system: "vertex_ai"
│   │   │     gen_ai.request.model: "gemini-2.0-flash"
│   │   │     gen_ai.usage.input_tokens: 950
│   │   │     gen_ai.usage.output_tokens: 180
│   │   │
│   │   ├── Span: tool.execute         [tool call]
│   │   │     tool.name: "search_kb"
│   │   │     tool.status: "ok"
│   │   │
│   │   └── Span: gen_ai.chat          [LLM call 2]
│   │         gen_ai.usage.input_tokens: 2100
│   │         gen_ai.usage.output_tokens: 420
│   │
│   └── Span: guardrail.check          [output validation]
│         guardrail.name: "content_filter"
│         guardrail.result: "pass"
```
Cross-Service Trace Propagation
Cloud Trace automatically correlates spans from different GCP services in the same trace:
- Cloud Run — agent service spans
- Cloud Functions — tool execution spans (if tools are serverless)
- Vertex AI — model inference spans
- Cloud SQL / Firestore — data access spans (if using GCP client libraries with tracing enabled)
W3C traceparent header propagation handles this automatically when using GCP client libraries.
Cloud Monitoring Metrics and Alerts
Built-in Vertex AI Metrics
| Metric | Description | Useful For |
|---|---|---|
| `aiplatform.googleapis.com/prediction/online/response_count` | Total prediction requests | Traffic volume |
| `aiplatform.googleapis.com/prediction/online/response_latencies` | Response latency distribution | P50/P95/P99 latency tracking |
| `aiplatform.googleapis.com/prediction/online/error_count` | Failed predictions | Error rate alerting |
| `aiplatform.googleapis.com/publisher/online_serving/token_count` | Token consumption | Usage and cost tracking |
| `aiplatform.googleapis.com/publisher/online_serving/first_token_latencies` | Time-to-first-token | Streaming performance |
Custom Metrics for AI Agents
Built-in metrics cover model-level observability. For agent-level insights, create custom metrics:
```python
from google.cloud import monitoring_v3

# Custom metric: cost per agent task
client = monitoring_v3.MetricServiceClient()

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/agent/task_cost_usd"
series.metric.labels["agent_name"] = "support-agent"
series.metric.labels["task_type"] = "ticket_resolution"
# ... write data point
```
Recommended custom metrics:
| Custom Metric | Labels | Purpose |
|---|---|---|
| `agent/task_cost_usd` | `agent_name`, `task_type` | Per-task cost attribution |
| `agent/task_completion_rate` | `agent_name`, `task_type` | Quality tracking |
| `agent/guardrail_triggers` | `agent_name`, `guardrail_name`, `result` | Safety monitoring |
| `agent/tool_call_duration` | `agent_name`, `tool_name`, `status` | Tool performance |
| `agent/tokens_per_task` | `agent_name`, `task_type` | Efficiency tracking |
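The value written to `agent/task_cost_usd` can be derived from the token counts already present on spans. A pure-Python sketch; the per-million-token rates below are placeholders for illustration, not official Gemini pricing:

```python
# Placeholder USD rates per 1M tokens -- check current Vertex AI pricing.
PRICING_PER_1M_TOKENS = {
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
}

def task_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one agent task, suitable for the agent/task_cost_usd metric."""
    rates = PRICING_PER_1M_TOKENS[model]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return round(cost, 6)

# Token totals from the trace hierarchy shown earlier: 950 + 2100 in, 180 + 420 out
cost = task_cost_usd("gemini-2.0-flash", 3050, 600)
```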
Alert Policies
| Alert | Condition | Notification |
|---|---|---|
| TTFT > 5s (P95) | `first_token_latencies` P95 > 5000ms for 5 min | PagerDuty + Slack |
| Error rate > 5% | `error_count` / `response_count` > 0.05 for 5 min | PagerDuty |
| Token spike | `token_count` > 3x daily average | Slack (cost alert) |
| Guardrail surge | `guardrail_triggers` > 2x hourly baseline | Slack (quality alert) |
| Task completion drop | `task_completion_rate` < 85% over 1-hour window | Slack + weekly report |
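Each baseline-relative condition above reduces to a simple comparison. A sketch of the token-spike check (function name and windowing are illustrative; in practice this logic lives in a Cloud Monitoring alert policy, not in application code):

```python
def token_spike(current_tokens: int, trailing_daily_totals: list[int],
                factor: float = 3.0) -> bool:
    """True when today's token count exceeds factor x the trailing daily average."""
    baseline = sum(trailing_daily_totals) / len(trailing_daily_totals)
    return current_tokens > factor * baseline

# 400k tokens against a ~110k/day baseline trips the 3x alert
spike = token_spike(400_000, [100_000, 120_000, 110_000])
```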
Cloud Logging Patterns
Structured Log Entries
ADK and Vertex AI Agent Engine emit structured JSON logs to Cloud Logging. Key fields:
```json
{
  "severity": "INFO",
  "message": "Agent task completed",
  "logging.googleapis.com/trace": "projects/my-project/traces/abc123",
  "logging.googleapis.com/spanId": "def456",
  "jsonPayload": {
    "agent_name": "support-agent",
    "task_type": "ticket_resolution",
    "total_tokens": 3500,
    "total_cost_usd": 0.0025,
    "tool_calls": ["search_kb", "update_ticket"],
    "guardrail_result": "pass",
    "duration_ms": 2800
  }
}
```
The `logging.googleapis.com/trace` and `logging.googleapis.com/spanId` fields enable automatic correlation with Cloud Trace spans.
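A sketch of emitting such an entry from application code. On Cloud Run, JSON written to stdout is parsed by the logging agent: `severity`, `message`, and the `logging.googleapis.com/*` keys are extracted as special fields, and the remaining keys land under `jsonPayload`. The `format_log_entry` helper is illustrative, not a GCP API:

```python
import json

def format_log_entry(project_id: str, trace_id: str, span_id: str, **fields) -> str:
    """Build one structured log line that Cloud Logging correlates with Cloud Trace."""
    entry = {
        "severity": "INFO",
        "message": "Agent task completed",
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
        "logging.googleapis.com/spanId": span_id,
        **fields,  # remaining keys end up under jsonPayload
    }
    return json.dumps(entry)

# Writing the line to stdout is enough on Cloud Run; the agent does the rest.
print(format_log_entry("my-project", "abc123", "def456",
                       agent_name="support-agent", total_tokens=3500))
```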
Log-Based Metrics
Create metrics from log patterns without code changes:
```
# Metric from guardrail trigger logs
resource.type="cloud_run_revision"
jsonPayload.guardrail_result="block"

# Count as custom metric: guardrail_blocks_total
# Label: jsonPayload.guardrail_name
```
Log-based metrics feed into Cloud Monitoring dashboards and alert policies, bridging logging and monitoring without custom instrumentation code.
Audit Logging
For compliance, ensure Cloud Audit Logs are enabled for:
- Data Access logs — who queried which agent, when
- Admin Activity logs — who deployed/updated agent configurations
- Agent Engine API calls — all API calls to the Agent Engine service
Retain audit logs per your compliance requirements (default 30 days for Data Access, 400 days for Admin Activity).
Looker Dashboards
Build four dashboards in Looker (or Looker Studio) for different audiences:
1. Real-Time Operations Dashboard
Audience: On-call engineers, SRE team
| Panel | Data Source | Visualization |
|---|---|---|
| Active agents (current) | Cloud Monitoring | Counter |
| Request rate (QPS) | Cloud Monitoring | Time series |
| P50/P95/P99 latency | Cloud Monitoring | Time series with thresholds |
| Error rate | Cloud Monitoring | Time series with alert overlay |
| Active alerts | Cloud Monitoring | Alert list |
2. Cost Tracking Dashboard
Audience: Engineering leads, FinOps team
| Panel | Data Source | Visualization |
|---|---|---|
| Daily spend by model | Cloud Billing + custom metrics | Stacked bar chart |
| Cost per task by agent | Custom metrics | Table, sortable |
| Token consumption trend | Cloud Monitoring | Time series |
| Cost anomalies | Custom metrics + ML-based alerting | Alert list |
| Budget burn rate | Cloud Billing | Gauge (% of monthly budget) |
3. Quality Metrics Dashboard
Audience: Product team, agent developers
| Panel | Data Source | Visualization |
|---|---|---|
| Task completion rate by agent | Custom metrics | Time series per agent |
| Guardrail trigger rate | Custom metrics / log-based metrics | Time series by guardrail type |
| Eval regression trend | Langfuse (or custom pipeline) | Weekly trend line |
| User feedback scores | Custom metrics | Distribution histogram |
| Drift detection signals | Langfuse / custom pipeline | Alert list |
4. Agent Deep-Dive Dashboard
Audience: Agent developers (per-agent view)
| Panel | Data Source | Visualization |
|---|---|---|
| Trace waterfall (sample) | Cloud Trace | Embedded trace view |
| Tool call success rate | Custom metrics | Bar chart by tool |
| Token distribution per step | Custom metrics | Box plot |
| Session replay link | Langfuse | Deep link |
| Recent failures (table) | Cloud Logging | Log table, filtered by error |
Integration with OSS Tools
GCP-native observability covers infrastructure and model-level metrics. For AI-specific features (evals, prompt management, session replay), integrate Langfuse alongside GCP services:
Architecture: GCP + Langfuse
```
Agent Code
├── OTel SDK ──────────> OTel Collector ──> Cloud Trace (GCP-native)
│                                      └──> Langfuse (AI-specific)
├── Langfuse SDK ──────> Langfuse directly (evals, prompts, scores)
└── Cloud Logging SDK ─> Cloud Logging (audit, structured logs)
```
OTel Collector Configuration
Deploy an OTel Collector on GKE that fans out traces to both Cloud Trace and Langfuse:
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  googlecloud:
    project: my-gcp-project
  otlphttp:
    endpoint: https://langfuse.internal.mms.com/api/public/otel
    headers:
      Authorization: "Basic ${LANGFUSE_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [googlecloud, otlphttp]
```
This gives you:
- Cloud Trace for GCP-native correlation (traces linked to Cloud Run, Vertex AI, Cloud SQL)
- Langfuse for AI-specific analysis (eval scores on traces, prompt versioning, cost per conversation, session replay)
What Lives Where
| Concern | GCP-Native | Langfuse |
|---|---|---|
| Infrastructure metrics (CPU, GPU, pods) | Cloud Monitoring | — |
| Distributed traces | Cloud Trace | Langfuse (duplicate for AI analysis) |
| Model metrics (TTFT, tokens, errors) | Cloud Monitoring | Langfuse (richer per-trace) |
| Eval scores | — | Langfuse |
| Prompt versioning | — | Langfuse |
| Session replay | — | Langfuse |
| Cost per task | Custom metrics | Langfuse (automatic) |
| Audit logs | Cloud Audit Logs | — |
| Alerts | Cloud Monitoring (PagerDuty) | Langfuse (webhooks) |