How to implement observability and monitoring for Google ADK agents (AgentOps, Arize, Phoenix, Cloud Trace)?

Question

Accepted Answer

## Observability & Monitoring for Google ADK Agents

Monitoring AI agents in production is critical. Google ADK integrates with **8+ observability platforms** for tracing, debugging, and performance monitoring.

---

## Observability Options

```mermaid
graph TD
    A[ADK Agent] --> B[Built-in Tracing]
    A --> C[Cloud-Native]
    A --> D[Third-Party]
    B --> B1[ADK Web UI Trace]
    C --> C1[Google Cloud Trace]
    D --> D1[AgentOps]
    D --> D2[Arize AX]
    D --> D3[Galileo]
    D --> D4[Phoenix]
    D --> D5[Monocle]
    D --> D6[MLflow]
    D --> D7[W&B Weave]
```

---

## 1. Built-in ADK Tracing

![ADK Trace UI](https://google.github.io/adk-docs/assets/adk-web-dev-ui-trace.png)

```bash
# Run with web UI to see traces
adk web my_agent
# Click on any conversation to see the trace view
```

The trace view shows:
- Agent execution timeline
- LLM calls with input/output
- Tool calls with arguments and results
- Sub-agent delegations
- Token usage per call
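
To make the structure behind that view concrete, here is a toy in-memory tracer that records the same kind of nested spans (agent, LLM call, tool call, sub-agent). It is only a sketch and not ADK's tracing code: `ToyTracer`, `Span`, `render`, and the `kind` labels are hypothetical names invented for this illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    kind: str  # "agent", "llm", or "tool" (labels chosen for this sketch)
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    duration_s: float = 0.0


class ToyTracer:
    """Hypothetical in-memory tracer; not part of ADK."""

    def __init__(self):
        self.roots = []
        self._stack = []

    @contextmanager
    def span(self, name, kind, **attributes):
        node = Span(name, kind, dict(attributes))
        # Attach to the currently open span, or start a new root
        siblings = self._stack[-1].children if self._stack else self.roots
        siblings.append(node)
        self._stack.append(node)
        start = time.perf_counter()
        try:
            yield node
        finally:
            node.duration_s = time.perf_counter() - start
            self._stack.pop()


def render(span, depth=0):
    """Return the span tree as indented 'kind:name' lines."""
    lines = [f"{'  ' * depth}{span.kind}:{span.name}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


tracer = ToyTracer()
with tracer.span("root_agent", "agent"):
    with tracer.span("generate", "llm", total_tokens=42):
        pass
    with tracer.span("get_weather", "tool", args={"city": "Paris"}):
        pass
    with tracer.span("billing_agent", "agent"):  # sub-agent delegation
        with tracer.span("generate", "llm", total_tokens=17):
            pass

tree = render(tracer.roots[0])
print("\n".join(tree))
```

Each span carries its own duration and attributes, which is the information a trace viewer needs to draw a timeline and show per-call inputs, outputs, and token counts.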

---

## 2. Google Cloud Trace

![Cloud Trace](https://google.github.io/adk-docs/assets/cloud-trace1.png)

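```python
from google.adk.runners import Runner
from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# ADK emits OpenTelemetry spans; this exports them to Cloud Trace
# (needs the opentelemetry-exporter-gcp-trace package).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(CloudTraceSpanExporter(project_id="my-gcp-project")))
trace.set_tracer_provider(provider)

# root_agent and session_service are defined elsewhere in your app
runner = Runner(agent=root_agent, app_name="my_app", session_service=session_service)
```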

---

## 3. AgentOps

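```python
import agentops
from google.adk.runners import Runner

# Initialize AgentOps before creating the runner so it can instrument ADK
agentops.init(api_key="your-agentops-key")

# Once initialized, AgentOps captures ADK traces automatically;
# the runner needs no AgentOps-specific arguments.
# root_agent and session_service are defined elsewhere in your app.
runner = Runner(
    agent=root_agent,
    app_name="my_app",
    session_service=session_service,
)
```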
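
Hard-coding the key as in the snippet above is fine for a demo, but in a deployment it usually comes from the environment. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper, and the variable name `AGENTOPS_API_KEY` is an assumption made for this example.

```python
import os


def require_env(name, environ=os.environ):
    """Return a non-empty environment value or fail with a clear message."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} before starting the agent")
    return value


# Demonstration with an explicit mapping instead of the real environment
demo_key = require_env("AGENTOPS_API_KEY", {"AGENTOPS_API_KEY": "abc123"})
print(demo_key)

# In the application (needs the agentops package and a real key):
# agentops.init(api_key=require_env("AGENTOPS_API_KEY"))
```

Failing at startup makes a missing key visible immediately, rather than as silently absent traces later.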

---

## 4. Custom Monitoring via Plugins

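```python
import logging
import time

from google.adk.plugins.base_plugin import BasePlugin
from google.adk.runners import Runner

logger = logging.getLogger("adk_monitor")


class MonitoringPlugin(BasePlugin):
    """Logs agent timing and keeps running totals of tokens and tool calls."""

    def __init__(self):
        super().__init__(name="monitoring")
        self.total_tokens = 0
        self.total_tool_calls = 0
        self.total_errors = 0

    async def before_agent_callback(self, *, agent, callback_context):
        # "temp:" state is scoped to the current invocation and not persisted
        callback_context.state["temp:start_time"] = time.time()
        logger.info(f"Agent {agent.name} started")

    async def after_agent_callback(self, *, agent, callback_context):
        start = callback_context.state.get("temp:start_time", time.time())
        logger.info(f"Agent {agent.name} completed in {time.time() - start:.2f}s")

    async def after_model_callback(self, *, callback_context, llm_response):
        # usage_metadata can be missing, e.g. on partial streaming responses
        usage = llm_response.usage_metadata
        tokens = (usage.total_token_count or 0) if usage else 0
        self.total_tokens += tokens
        logger.info(f"LLM call: {tokens} tokens (total: {self.total_tokens})")

    async def before_tool_callback(self, *, tool, tool_args, tool_context):
        self.total_tool_calls += 1
        logger.info(f"Tool call #{self.total_tool_calls}: {tool.name}")


# root_agent and session_service are defined elsewhere in your app
runner = Runner(
    agent=root_agent,
    app_name="my_app",
    session_service=session_service,
    plugins=[MonitoringPlugin()],
)
```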
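
The counters on `MonitoringPlugin` only help if they leave the process. One low-tech option is to emit them periodically as a single JSON log line that a log-based metrics backend can parse. The sketch below assumes only the three counter attributes shown above; `metrics_snapshot` and `log_metrics` are hypothetical helpers, and a `SimpleNamespace` stands in for a real plugin instance so the example runs without ADK installed.

```python
import json
import logging
from types import SimpleNamespace


def metrics_snapshot(plugin, **labels):
    """Build a JSON-serializable snapshot of the plugin counters."""
    return {
        "total_tokens": plugin.total_tokens,
        "total_tool_calls": plugin.total_tool_calls,
        "total_errors": plugin.total_errors,
        **labels,
    }


def log_metrics(plugin, logger, **labels):
    """Log the snapshot as one JSON line and return that line."""
    line = json.dumps(metrics_snapshot(plugin, **labels), sort_keys=True)
    logger.info(line)
    return line


# Stand-in for a MonitoringPlugin instance (values are made up)
fake_plugin = SimpleNamespace(total_tokens=3150, total_tool_calls=4, total_errors=1)

logging.basicConfig(level=logging.INFO)
line = log_metrics(fake_plugin, logging.getLogger("adk_monitor"), app_name="my_app")
```

Sorting the keys keeps the log line stable across runs, which makes it easier to diff and to write parsing rules against.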

---

## Key Metrics to Monitor

| Metric | Why |
|--------|-----|
| **Latency** | End-to-end response time |
| **Token usage** | Cost tracking |
| **Tool call frequency** | Identify bottleneck tools |
| **Error rate** | Detect failures early |
| **Hallucination rate** | Quality monitoring |
| **Safety violations** | Compliance |
| **Session length** | User engagement |


Learn more at [Integrations](https://google.github.io/adk-docs/integrations/) and [Callbacks](https://google.github.io/adk-docs/callbacks/).
