Realtime
The Realtime API provisions a session ticket via HTTP, then connects the client to the gateway’s WebSocket multiplexer; the gateway opens an upstream WS to the provider and pipes frames bidirectionally. The client never holds a credential for the upstream provider.
Required capability: realtime
Provisioning flow

    1. Client → POST /v1/realtime/sessions  (HTTP, with Authorization)
           ↓
       Returns { id, ticket, gateway_ws_url, subprotocol_hint, … }
           ↓
    2. Client → WebSocket gateway_ws_url
       Sec-WebSocket-Protocol: ticket.<value>
           ↓
    3. Gateway redeems the ticket (atomic, single-use, 60s TTL)
    4. Gateway opens upstream WS via the adapter (OpenAI realtime / Gemini Live / etc.)
    5. Frames flow: client ⇄ gateway ⇄ upstream

POST /v1/realtime/sessions

    curl https://your-gateway.example.com/v1/realtime/sessions \
      -H "Authorization: Bearer aigw_sk_your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "provider": "openai",
        "model": "gpt-4o-realtime-preview-2024-10-01",
        "modalities": ["text", "audio"],
        "voice": "alloy",
        "instructions": "You are a friendly assistant.",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "idle_timeout_seconds": 600
      }'

Request fields
| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider slug. Currently only openai; other providers (Gemini, ElevenLabs Convai, etc.) follow once their adapters implement buildRealtimeUpstream. |
| model | string | Yes | Provider model id. |
| modalities | string[] | Yes | ["text"], ["audio"], or ["text", "audio"]. |
| voice | string | No | Provider voice identifier. |
| instructions | string | No | System-style instructions. |
| input_audio_format | string | No | One of pcm16, g711_ulaw, or g711_alaw. |
| output_audio_format | string | No | Same values as input_audio_format. |
| tools | array | No | Function / built-in tools available during the session. |
| idle_timeout_seconds | integer | No | Auto-terminate after N seconds of inactivity (no frames in either direction). Default: 600. Min: 30. Max: 3600. |
| metadata | object | No | Free-form tenant metadata. |
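
The constraints in the table can be checked on the client before the request is sent. The following is a minimal sketch, not the gateway's own validation: the function name validateSessionRequest and its error strings are hypothetical, and it assumes the gateway accepts the modalities in either order.

```javascript
// Client-side pre-flight check for the POST /v1/realtime/sessions body.
// The rules mirror the request-field table; the function name and error
// strings are hypothetical and not part of the gateway API.
const AUDIO_FORMATS = ['pcm16', 'g711_ulaw', 'g711_alaw'];
const MODALITIES = ['text', 'audio'];

function validateSessionRequest(body) {
  const errors = [];

  // provider and model are required strings.
  for (const field of ['provider', 'model']) {
    if (typeof body[field] !== 'string' || body[field].length === 0) {
      errors.push(`${field} must be a non-empty string`);
    }
  }

  // modalities: non-empty, no duplicates, only "text" and/or "audio".
  // Assumption: element order does not matter.
  const m = body.modalities;
  const modalitiesOk =
    Array.isArray(m) &&
    m.length > 0 &&
    new Set(m).size === m.length &&
    m.every((x) => MODALITIES.includes(x));
  if (!modalitiesOk) {
    errors.push('modalities must be ["text"], ["audio"], or ["text", "audio"]');
  }

  // Audio formats are optional but restricted to the documented set.
  for (const field of ['input_audio_format', 'output_audio_format']) {
    if (body[field] !== undefined && !AUDIO_FORMATS.includes(body[field])) {
      errors.push(`${field} must be one of ${AUDIO_FORMATS.join(', ')}`);
    }
  }

  // idle_timeout_seconds is optional; documented range is 30 to 3600.
  const idle = body.idle_timeout_seconds;
  if (idle !== undefined && (!Number.isInteger(idle) || idle < 30 || idle > 3600)) {
    errors.push('idle_timeout_seconds must be an integer between 30 and 3600');
  }

  return errors;
}
```

An empty array means the body passes these local checks; the gateway may still reject it for reasons not covered here, such as an unknown model id.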
Response

    {
      "id": "rt-7c8b9d3e-...",
      "object": "realtime.session",
      "provider": "openai",
      "model": "gpt-4o-realtime-preview-2024-10-01",
      "status": "connecting",
      "gateway_ws_url": "wss://your-gateway.example.com/v1/realtime/connect",
      "subprotocol_hint": "ticket.eyJh...",
      "ticket": "eyJh...",
      "ticket_expires_at": 1748284860,
      "created_at": 1748284800,
      "metadata": null
    }

The ticket is single-use and expires after 60 seconds. Open the WS within that window.
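
Because the ticket window is short, a client may want to check the remaining time before dialing. The helpers below are a sketch under assumptions: their names are hypothetical, ticket_expires_at is treated as Unix time in seconds (inferred from the example value), and the two-second safety margin is an arbitrary choice.

```javascript
// Hypothetical helpers; field names come from the session response.

// Seconds left before the ticket expires, clamped at 0.
// Assumption: ticket_expires_at is Unix time in seconds.
function ticketSecondsLeft(session, nowMs = Date.now()) {
  const leftMs = session.ticket_expires_at * 1000 - nowMs;
  return Math.max(0, Math.floor(leftMs / 1000));
}

// Subprotocol value that carries the ticket in Sec-WebSocket-Protocol.
function ticketSubprotocol(session) {
  return `ticket.${session.ticket}`;
}

// Refuse to dial with a ticket that is stale or about to expire,
// leaving a small margin for the handshake itself.
function canConnect(session, nowMs = Date.now(), marginSeconds = 2) {
  return ticketSecondsLeft(session, nowMs) > marginSeconds;
}
```

If canConnect returns false, the ticket cannot be refreshed; the client would need to provision a new session with another POST.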
WebSocket connection

    const ws = new WebSocket(session.gateway_ws_url, [`ticket.${session.ticket}`]);

    ws.onopen = () => {
      ws.send(JSON.stringify({
        type: 'session.update',
        session: { instructions: 'Be concise.' }
      }));
    };

    ws.onmessage = (event) => {
      // Frames come through verbatim from the provider; see the provider's
      // own realtime protocol docs (OpenAI's "Realtime API" for the openai
      // adapter) for the event taxonomy.
    };

The gateway:
- Validates the ticket atomically (rejects with 4401 close code on invalid / expired / already-redeemed tickets).
- Resets a sliding idle timer on every frame in either direction.
- Tracks audit aggregates (input/output tokens from JSON events, audio seconds from binary frames at the configured PCM rate).
- Enforces a per-tenant concurrency cap (default 10 connecting+connected sessions). New sessions over the cap fail with RATE_LIMIT_EXCEEDED.
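
To make the "audio seconds from binary frames" aggregate concrete, here is a rough sketch of how a duration can be derived from raw PCM16 byte counts. It is not the gateway's implementation: the names are hypothetical, and it assumes 16-bit mono samples at 24 kHz unless another rate is passed in.

```javascript
// Sketch of audio-duration accounting for raw PCM16 frames.
// Assumptions: 16-bit samples (2 bytes each), mono, 24 kHz by default.
// The gateway's real accounting may differ.
const BYTES_PER_PCM16_SAMPLE = 2;

function pcm16Seconds(byteLength, sampleRateHz = 24000, channels = 1) {
  return byteLength / (BYTES_PER_PCM16_SAMPLE * channels * sampleRateHz);
}

// Running total over binary frames (ArrayBuffer, typed array, or Buffer;
// anything that exposes byteLength).
class AudioSecondsCounter {
  constructor(sampleRateHz = 24000) {
    this.sampleRateHz = sampleRateHz;
    this.seconds = 0;
  }

  addFrame(frame) {
    this.seconds += pcm16Seconds(frame.byteLength, this.sampleRateHz);
    return this.seconds;
  }
}
```

Under these assumptions, 48,000 bytes of audio correspond to one second. The g711 formats use one byte per sample, so they would need a different divisor.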
Session inspection

    curl https://your-gateway.example.com/v1/realtime/sessions/rt-abc \
      -H "Authorization: Bearer aigw_sk_your_api_key"

Returns the current session record including final audit aggregates once the session is closed.
Connection close
The gateway closes both sides when:
- Either side sends a close frame
- The idle timer fires (no frames in either direction for idle_timeout_seconds)
- The upstream connection errors
- The gateway process receives SIGTERM/SIGINT (graceful drain)
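
The sliding behavior of the idle timer can be illustrated with a small deadline tracker. This is a sketch only: the class name and methods are hypothetical, and the clock is injectable so the logic can be exercised without waiting in real time.

```javascript
// Minimal sliding idle deadline with an injectable clock.
// The class name and API are hypothetical, not the gateway's code.
class IdleDeadline {
  constructor(timeoutSeconds = 600, now = () => Date.now()) {
    this.timeoutMs = timeoutSeconds * 1000;
    this.now = now;
    this.lastActivityMs = now();
  }

  // Call on every frame, in either direction, to push the deadline out.
  touch() {
    this.lastActivityMs = this.now();
  }

  // True once no frame has been seen for the full timeout.
  expired() {
    return this.now() - this.lastActivityMs >= this.timeoutMs;
  }
}
```

A client can run the same logic locally, for example to send a frame before the deadline or to warn the user that the session is about to be closed.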