Explain the Gemini Live API streaming capabilities in Google ADK for real-time voice/video.

#google-adk #streaming #gemini-live #voice #video #real-time

Answer

Streaming & Gemini Live API in Google ADK

[Figure: ADK Streaming]

Google ADK integrates with the Gemini Live API for low-latency, bidirectional voice and video streaming, enabling real-time conversational AI.


Streaming Architecture


Capabilities

| Feature | Description |
|---|---|
| Audio Input | Real-time microphone streaming |
| Audio Output | Voice response generation |
| Video Input | Camera/screen sharing analysis |
| Bidirectional | Full-duplex communication |
| Interruption | User can interrupt mid-response |
| Tool Calling | Tools work during streaming |
| Transcription | Real-time audio transcription |
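
To make the Bidirectional and Interruption rows concrete, here is a toy asyncio model of a reply that is cut short when the user starts speaking. It does not call ADK or the Gemini Live API; `play_response`, `user_input`, and `demo` are hypothetical names used only for illustration, and the "audio" is a list of strings.

```python
import asyncio


async def play_response(chunks: list[str], played: list[str],
                        interrupted: asyncio.Event) -> None:
    """Pretend to play a reply chunk by chunk (hypothetical stand-in for audio output)."""
    for chunk in chunks:
        if interrupted.is_set():  # the user barged in: drop the rest of the reply
            break
        played.append(chunk)
        await asyncio.sleep(0)    # yield so the input side can run concurrently


async def user_input(after_chunks: int, played: list[str],
                     interrupted: asyncio.Event) -> None:
    """Simulate the user starting to speak once `after_chunks` chunks have played."""
    while len(played) < after_chunks:
        await asyncio.sleep(0)
    interrupted.set()


async def demo() -> list[str]:
    played: list[str] = []
    interrupted = asyncio.Event()
    # Output and input run at the same time, which is the essence of full duplex.
    await asyncio.gather(
        play_response(["Sure", "here", "is", "a", "long", "answer"], played, interrupted),
        user_input(2, played, interrupted),
    )
    return played


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real live session the same idea applies to audio: playback of the current response stops when new user speech is detected, instead of finishing the whole reply first.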

Audio Streaming Agent

[Figure: Streaming Audio Dialog]

python
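from google.adk.agents import Agent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.genai import types

# get_weather and search_web are tool functions assumed to be defined elsewhere.
agent = Agent(
    name="voice_assistant",
    # Voice/video streaming needs a model that supports the Live API;
    # substitute a Live-capable model ID if this one is rejected.
    model="gemini-2.5-flash",
    instruction="You are a voice assistant. Respond naturally and conversationally.",
    tools=[get_weather, search_web],
)

# Enable bidirectional (Live API) streaming with spoken responses
run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["AUDIO"],
    # Also return a text transcript of the generated audio
    output_audio_transcription=types.AudioTranscriptionConfig(),
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Charon")
        )
    ),
)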
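
Microphone audio in a live session is sent as a series of small raw PCM chunks, not as one large blob. The helper below is a minimal, library-independent sketch of that chunking step. `chunk_pcm` is a hypothetical name, and the 16 kHz, 16-bit mono format is an assumption that should be checked against the current Live API documentation.

```python
from collections.abc import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed input sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit PCM, mono


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each covering about chunk_ms of audio."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]  # the last chunk may be shorter


# One second of silence splits into ten 100 ms chunks of 3200 bytes each.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(chunk_pcm(one_second))
```

Smaller chunks reduce the delay before the model hears the user, at the cost of more messages per second. Sending each chunk to the agent is omitted here because that step depends on the ADK version in use.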

Running Streaming in Web UI

[Figure: Streaming Mic]

bash
# Launch the dev UI from the directory that contains the my_agent folder
adk web
# Select my_agent in the UI, then click the microphone icon to start voice streaming

Text Streaming (SSE)

python
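from google.adk.agents.run_config import RunConfig, StreamingMode
from google.genai import types

config = RunConfig(streaming_mode=StreamingMode.SSE)


# `runner` (a Runner) and `session` are assumed to have been created already.
# Call from async code, e.g. asyncio.run(stream_reply(runner, session)).
async def stream_reply(runner, session) -> None:
    async for event in runner.run_async(
        user_id="user-1",
        session_id=session.id,
        # new_message must be a Content object, not a bare string
        new_message=types.Content(
            role="user",
            parts=[types.Part(text="Explain quantum computing")],
        ),
        run_config=config,
    ):
        # In SSE mode, partial chunks are typically followed by a final
        # aggregated event; print only the chunks to avoid repeating the text.
        if not event.partial:
            continue
        if event.content and event.content.parts:
            for part in event.content.parts:
                if part.text:
                    print(part.text, end="", flush=True)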
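
The loop above can also be factored into a small helper that gathers streamed text into one string. This is a sketch that relies only on the `content.parts[].text` shape used in the example. `collect_text` and `fake_event` are hypothetical names, and the `SimpleNamespace` objects merely stand in for ADK events so the snippet runs without the library.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_text(events: Iterable[Any]) -> str:
    """Concatenate the text parts of a sequence of event-like objects."""
    pieces: list[str] = []
    for event in events:
        content = getattr(event, "content", None)
        if content is None or not getattr(content, "parts", None):
            continue  # skip events that carry no content parts
        for part in content.parts:
            text = getattr(part, "text", None)
            if text:  # ignore non-text parts such as function calls
                pieces.append(text)
    return "".join(pieces)


def fake_event(*texts: Any) -> SimpleNamespace:
    """Build a stand-in event with one part per given text value."""
    parts = [SimpleNamespace(text=t) for t in texts]
    return SimpleNamespace(content=SimpleNamespace(parts=parts))


events = [fake_event("Quantum "), SimpleNamespace(content=None), fake_event("computing", None)]
result = collect_text(events)
```

If the stream also delivers a final aggregated event containing the whole reply, filter it out before passing the events in, otherwise the text appears twice.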

Streaming with Tools


Streaming vs Non-Streaming

| Feature | Non-Streaming | Streaming (SSE) | Gemini Live |
|---|---|---|---|
| Response | Full at once | Token by token | Real-time audio/video |
| Latency | High (wait for full) | Low (incremental) | Lowest (bidirectional) |
| Modality | Text | Text | Text + Audio + Video |
| Interruption | Not possible | Not possible | Supported |
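
The Response and Latency rows can be illustrated with a toy model that counts how many generation steps pass before the user sees anything. Nothing here calls a real model: `model_steps`, `non_streaming`, and `streaming` are hypothetical names, and one "step" stands for one generated token.

```python
from collections.abc import Iterator

TOKENS = ["Streaming", " lowers", " perceived", " latency", "."]


def model_steps() -> Iterator[str]:
    """Hypothetical model that produces one token per step."""
    yield from TOKENS


def non_streaming() -> tuple[str, int]:
    """Return the reply and the number of steps waited before any output is shown."""
    reply = "".join(model_steps())  # nothing is shown until every token exists
    return reply, len(TOKENS)


def streaming() -> tuple[str, int]:
    """Return the reply and the number of steps waited before the first token is shown."""
    shown: list[str] = []
    steps_before_first_output = 0
    for step, token in enumerate(model_steps(), start=1):
        if not shown:
            steps_before_first_output = step  # the first token is displayed right away
        shown.append(token)
    return "".join(shown), steps_before_first_output
```

Both functions produce the same text. The difference is only in when the first piece becomes visible, which is why streaming feels faster even when total generation time is unchanged.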

Learn more in the ADK Streaming documentation.