Realtime

The Realtime API works in two stages. The client first provisions a session ticket over HTTP, then uses that ticket to connect to the gateway’s WebSocket multiplexer. The gateway opens an upstream WebSocket to the provider and relays frames in both directions, so the client never holds a credential for the upstream provider.

Required capability: realtime

Provisioning flow

1. Client → POST /v1/realtime/sessions (HTTP, with Authorization)
   Returns { id, ticket, gateway_ws_url, subprotocol_hint, … }
2. Client → WebSocket gateway_ws_url
   Sec-WebSocket-Protocol: ticket.<value>
3. Gateway redeems the ticket (atomic, single-use, 60s TTL)
4. Gateway opens upstream WS via the adapter (OpenAI realtime / Gemini Live / etc.)
5. Frames flow: client ⇄ gateway ⇄ upstream
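
Step 3 is the part of the flow that is easiest to get wrong, so here is a minimal sketch of what "atomic, single-use, 60s TTL" redemption could look like. This is an illustration only, not the gateway's actual implementation: `TicketStore` and `ticketFromSubprotocols` are hypothetical names, and the in-memory `Map` stands in for whatever shared store a real deployment would use.

```javascript
// Hypothetical in-memory ticket store (illustration only). A multi-process
// gateway would need a shared store with an atomic "get and delete".
const TICKET_TTL_MS = 60_000; // 60s TTL, as stated in step 3

class TicketStore {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.tickets = new Map(); // ticket value -> { sessionId, expiresAt }
  }

  issue(ticket, sessionId) {
    this.tickets.set(ticket, { sessionId, expiresAt: this.now() + TICKET_TTL_MS });
  }

  // Returns the session id on success, or null when the ticket is unknown,
  // expired, or already redeemed.
  redeem(ticket) {
    const entry = this.tickets.get(ticket);
    if (!entry) return null;
    // Delete before checking expiry so any second attempt always fails.
    this.tickets.delete(ticket);
    return this.now() <= entry.expiresAt ? entry.sessionId : null;
  }
}

// Pulls the ticket value out of a Sec-WebSocket-Protocol header value,
// which may list several comma-separated subprotocols.
function ticketFromSubprotocols(headerValue) {
  const offered = String(headerValue || '').split(',').map((s) => s.trim());
  const match = offered.find((p) => p.startsWith('ticket.'));
  return match ? match.slice('ticket.'.length) : null;
}
```

Because the lookup and delete run synchronously in a single-threaded runtime, two concurrent connection attempts with the same ticket cannot both succeed in this sketch.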

POST /v1/realtime/sessions

curl https://your-gateway.example.com/v1/realtime/sessions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o-realtime-preview-2024-10-01",
    "modalities": ["text", "audio"],
    "voice": "alloy",
    "instructions": "You are a friendly assistant.",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "idle_timeout_seconds": 600
  }'

Request fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | Yes | Provider slug. Today: openai. (Gemini, ElevenLabs Convai, etc. follow when their adapters implement buildRealtimeUpstream.) |
| model | string | Yes | Provider model id. |
| modalities | string[] | Yes | ["text"], ["audio"], or ["text", "audio"]. |
| voice | string | No | Provider voice identifier. |
| instructions | string | No | System-style instructions. |
| input_audio_format | string | No | One of pcm16, g711_ulaw, g711_alaw. |
| output_audio_format | string | No | Same set as input_audio_format. |
| tools | array | No | Function / built-in tools available during the session. |
| idle_timeout_seconds | integer | No | Auto-terminate after N seconds of silence. Default: 600. Min: 30. Max: 3600. |
| metadata | object | No | Free-form tenant metadata. |
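
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; `validateSessionRequest` is a hypothetical helper, not part of the gateway, and the gateway's own validation rules and error messages may differ (for example, whether it accepts modalities in a different order).

```javascript
const AUDIO_FORMATS = ['pcm16', 'g711_ulaw', 'g711_alaw'];
const MODALITIES = ['text', 'audio'];

// Returns a list of human-readable problems; an empty list means the body
// satisfies the documented required fields and ranges.
function validateSessionRequest(body) {
  const errors = [];

  for (const field of ['provider', 'model']) {
    if (typeof body[field] !== 'string' || body[field] === '') {
      errors.push(`${field} is required and must be a non-empty string`);
    }
  }

  const m = body.modalities;
  const modalitiesOk =
    Array.isArray(m) &&
    m.length > 0 &&
    m.every((x) => MODALITIES.includes(x)) &&
    new Set(m).size === m.length; // no duplicates
  if (!modalitiesOk) {
    errors.push('modalities must be ["text"], ["audio"], or ["text", "audio"]');
  }

  for (const field of ['input_audio_format', 'output_audio_format']) {
    if (body[field] !== undefined && !AUDIO_FORMATS.includes(body[field])) {
      errors.push(`${field} must be one of ${AUDIO_FORMATS.join(', ')}`);
    }
  }

  const idle = body.idle_timeout_seconds;
  if (idle !== undefined && (!Number.isInteger(idle) || idle < 30 || idle > 3600)) {
    errors.push('idle_timeout_seconds must be an integer between 30 and 3600');
  }

  return errors;
}
```

Optional fields such as voice, instructions, tools, and metadata are not checked here because the table gives no constraints for them beyond their types.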

Response

{
  "id": "rt-7c8b9d3e-...",
  "object": "realtime.session",
  "provider": "openai",
  "model": "gpt-4o-realtime-preview-2024-10-01",
  "status": "connecting",
  "gateway_ws_url": "wss://your-gateway.example.com/v1/realtime/connect",
  "subprotocol_hint": "ticket.eyJh...",
  "ticket": "eyJh...",
  "ticket_expires_at": 1748284860,
  "created_at": 1748284800,
  "metadata": null
}

The ticket is single-use and expires after 60 seconds. Open the WS within that window.
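
Since the ticket is short-lived, a client may want to request the session and check the remaining validity right before connecting. The sketch below assumes that `ticket_expires_at` is a Unix timestamp in seconds, which is what the sample response suggests (it is exactly 60 greater than `created_at`). `createRealtimeSession` and `ticketSecondsLeft` are hypothetical helper names, and the error handling is deliberately minimal.

```javascript
// Seconds of validity left on the ticket. Assumes ticket_expires_at is a
// Unix timestamp in seconds (inferred from the sample response).
function ticketSecondsLeft(session, nowMs = Date.now()) {
  return session.ticket_expires_at - Math.floor(nowMs / 1000);
}

// Requests a new session. `fetchImpl` is injectable so the function can be
// exercised without a network; it defaults to the global fetch.
async function createRealtimeSession(baseUrl, apiKey, body, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/v1/realtime/sessions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`session request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

If `ticketSecondsLeft(session)` is zero or negative by the time the client is ready to connect, requesting a fresh session is safer than attempting the connection with a stale ticket.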

WebSocket connection

// `session` is the parsed JSON response from POST /v1/realtime/sessions.
const ws = new WebSocket(session.gateway_ws_url, [`ticket.${session.ticket}`]);

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'session.update',
    session: { instructions: 'Be concise.' }
  }));
};

ws.onmessage = (event) => {
  // Frames come through verbatim from the provider. See the provider's
  // own realtime protocol docs (OpenAI's "Realtime API" for the openai
  // adapter) for the event taxonomy.
};

The gateway:

  • Validates the ticket atomically and closes the connection with code 4401 if the ticket is invalid, expired, or already redeemed.
  • Resets a sliding idle timer on every frame in either direction.
  • Tracks audit aggregates (input/output tokens from JSON events, audio seconds from binary frames at the configured PCM rate).
  • Enforces a per-tenant concurrency cap (default: 10 sessions in the connecting or connected state). New sessions over the cap fail with RATE_LIMIT_EXCEEDED.
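
To make the "audio seconds from binary frames" aggregate concrete, the arithmetic for raw 16-bit PCM can be sketched as follows. The 24000 Hz mono default is an assumption chosen for illustration, since the gateway's configured rate is not stated here, and `pcm16Seconds` is a hypothetical helper rather than a gateway function.

```javascript
// Duration in seconds of a raw 16-bit PCM buffer.
// ASSUMPTION: 24000 Hz, mono. Substitute the rate the gateway is configured with.
function pcm16Seconds(byteLength, sampleRateHz = 24000, channels = 1) {
  const bytesPerSample = 2; // 16 bits per sample
  return byteLength / (sampleRateHz * channels * bytesPerSample);
}

// Running total over a sequence of binary frame sizes (in bytes).
function totalAudioSeconds(frameByteLengths, sampleRateHz = 24000) {
  return frameByteLengths.reduce(
    (sum, bytes) => sum + pcm16Seconds(bytes, sampleRateHz),
    0
  );
}
```

Under these assumptions, one second of audio is 48000 bytes. This calculation applies to pcm16 only; the g711 formats use a different sample size and would need their own formula.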

Session inspection

curl https://your-gateway.example.com/v1/realtime/sessions/rt-abc \
  -H "Authorization: Bearer aigw_sk_your_api_key"

Returns the current session record. Once the session is closed, the record includes the final audit aggregates.

Connection close

The gateway closes both sides when:

  • Either side sends a close frame
  • The idle timer fires (no frames in either direction for idle_timeout_seconds)
  • The upstream connection errors
  • The gateway process receives SIGTERM / SIGINT (graceful drain)
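
The idle-timeout condition above describes a sliding timer: every frame pushes the deadline back by the full timeout. A minimal sketch of that pattern follows. It is not the gateway's code; `createIdleTimer` is a hypothetical name, and the `timers` parameter exists only so the scheduling functions can be swapped out.

```javascript
// Sliding idle timer: `onIdle` runs only if `touch()` has not been called
// for `timeoutSeconds`. Each call to `touch()` restarts the countdown.
function createIdleTimer(
  timeoutSeconds,
  onIdle,
  timers = {
    setTimeout: (fn, ms) => setTimeout(fn, ms),
    clearTimeout: (handle) => clearTimeout(handle),
  }
) {
  let handle = null;

  const arm = () => {
    if (handle !== null) timers.clearTimeout(handle);
    handle = timers.setTimeout(onIdle, timeoutSeconds * 1000);
  };

  arm(); // start counting as soon as the timer is created

  return {
    touch: arm, // call on every frame, in either direction
    stop() {
      if (handle !== null) timers.clearTimeout(handle);
      handle = null;
    },
  };
}
```

In a relay, `touch()` would be called from both the client-side and upstream-side message handlers, and `stop()` from the close handlers so the callback does not fire after the session has already ended.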