How to deploy Google ADK agents to production (Vertex AI Agent Engine, Cloud Run, GKE)?

Question

Accepted Answer

## Deploying ADK Agents to Production

![Deploy Agent](https://google.github.io/adk-docs/assets/deploy-agent.png)

---

## Deployment Decision Matrix

```mermaid
graph TD
    A[Production Deployment] --> B{Requirements?}
    B -->|Fully managed, auto-scale| C[Vertex AI Agent Engine]
    B -->|Serverless, cost-efficient| D[Cloud Run]
    B -->|Custom infra, max control| E[GKE]
    B -->|On-premise| F[Docker / Podman]
```
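
The diagram can also be read as a small decision function. The sketch below is illustrative only: the `Requirements` flags and the order in which they are checked are assumptions, not something ADK prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical requirement flags; the names are illustrative only."""
    on_premise: bool = False
    needs_custom_infra: bool = False
    wants_fully_managed: bool = False


def choose_target(req: Requirements) -> str:
    """Map requirements to a deployment target, mirroring the diagram."""
    # Hard constraints are checked first (assumed priority order).
    if req.on_premise:
        return "Docker / Podman"
    if req.needs_custom_infra:
        return "GKE"
    if req.wants_fully_managed:
        return "Vertex AI Agent Engine"
    # Default branch: serverless and cost-efficient.
    return "Cloud Run"
```

For example, `choose_target(Requirements(wants_fully_managed=True))` selects Vertex AI Agent Engine, while an empty `Requirements()` falls through to Cloud Run.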

---

## 1. Vertex AI Agent Engine

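Agent Engine is the recommended fully managed option. Define the agent in your agent package and expose it as a module-level `root_agent`, which is the name the ADK CLI loads. The two tools below are placeholders:

```python
from google.adk.agents import Agent

def search_kb(query: str) -> dict:
    """Placeholder tool: look up knowledge-base articles."""
    return {"status": "success", "results": []}

def create_ticket(summary: str) -> dict:
    """Placeholder tool: open a support ticket."""
    return {"status": "success", "ticket_id": "TICKET-0"}

root_agent = Agent(
    name="prod_agent",
    model="gemini-2.5-flash",
    instruction="You are a customer support agent.",
    tools=[search_kb, create_ticket],
)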
```

```bash
# Deploy with the ADK CLI (the staging bucket name is a placeholder)
adk deploy agent_engine \
    --project=my-project \
    --region=us-central1 \
    --staging_bucket=gs://my-staging-bucket \
    --display_name="Customer Support Agent" \
    ./my_agent
```

| Feature | Details |
|---------|---------|
| Auto-scaling | Scales to zero and up based on traffic |
| Managed sessions | Built-in session persistence |
| Monitoring | Cloud Logging + Cloud Trace |
| Auth | IAM-based authentication |

---

## 2. Cloud Run (Serverless)

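```dockerfile
# Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
# Bind to 0.0.0.0 so the server is reachable from outside the container.
# The final argument is the directory that contains the my_agent/ package.
CMD ["adk", "api_server", "--host", "0.0.0.0", "--port", "8080", "."]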
```

```bash
# Build and push
docker build -t gcr.io/PROJECT/my-agent .
docker push gcr.io/PROJECT/my-agent

# Deploy (the API key is read from a Secret Manager secret, here named google-api-key)
gcloud run deploy my-agent \
    --image gcr.io/PROJECT/my-agent \
    --platform managed \
    --region us-central1 \
    --memory 2Gi \
    --set-secrets GOOGLE_API_KEY=google-api-key:latest \
    --min-instances 1 \
    --max-instances 10
```

---

## 3. GKE (Kubernetes)

![GKE Deployment](https://google.github.io/adk-docs/assets/agent-gke-deployment.png)

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adk-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: adk-agent
  template:
    metadata:
      labels:
        app: adk-agent
    spec:
      containers:
      - name: agent
        image: gcr.io/PROJECT/my-agent:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        env:
        - name: GOOGLE_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-keys
              key: google-api-key
---
apiVersion: v1
kind: Service
metadata:
  name: adk-agent-svc
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: adk-agent
```
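
An `apps/v1` Deployment is rejected unless its `spec.selector` matches the labels on the pod template, so a pre-deploy check can catch this before `kubectl apply`. The helper below is a minimal sketch that works on a manifest already loaded into a dict. The function name is hypothetical, and it only handles `matchLabels`, not `matchExpressions`.

```python
def selector_matches_template(deployment: dict) -> bool:
    """Return True if every selector label is present on the pod template."""
    spec = deployment.get("spec", {})
    wanted = spec.get("selector", {}).get("matchLabels", {})
    template_meta = spec.get("template", {}).get("metadata", {})
    actual = template_meta.get("labels", {})
    # An empty selector is treated as a failure rather than a match.
    return bool(wanted) and all(actual.get(k) == v for k, v in wanted.items())


# Usage with a manifest represented as a plain dict.
manifest = {
    "spec": {
        "selector": {"matchLabels": {"app": "adk-agent"}},
        "template": {"metadata": {"labels": {"app": "adk-agent"}}},
    }
}
print(selector_matches_template(manifest))
```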

---

## Production Checklist

- [ ] Use `DatabaseSessionService` or `VertexAiSessionService` (not in-memory)
- [ ] Set `max_llm_calls` in RunConfig to prevent runaway loops
- [ ] Enable HITL for destructive tool operations
- [ ] Add observability (Cloud Trace, AgentOps, or Phoenix)
- [ ] Store secrets in Secret Manager (not plaintext env vars)
- [ ] Set up evaluation CI/CD pipeline
- [ ] Configure auto-scaling and health checks
- [ ] Enable structured logging
- [ ] Add rate limiting for API endpoints
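
For the structured logging item, one common approach on Cloud Run and GKE is to write one JSON object per line to stdout, which Cloud Logging parses into structured entries and reads the `severity` and `message` fields from. The sketch below uses only the standard library. The logger name `adk_agent` and the extra `logger` field are assumptions made for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        if record.exc_info:
            # Attach the formatted traceback when an exception is logged.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("adk_agent")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

Calling `configure_logging().info("ticket created")` then emits a single JSON line that a log aggregator can index by field.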

Learn more at [Deployment](https://google.github.io/adk-docs/deploy/).
