Chat Completions

Create a chat completion using any configured provider through a single OpenAI-compatible endpoint.

POST /v1/chat/completions

Required capability: chat

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier (e.g. gpt-4, claude-3-opus, gemini-pro). The gateway routes to the appropriate provider. |
| messages | array | Yes | Array of message objects (minimum 1). See Message Format. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. |
| top_p | number | No | Nucleus sampling parameter between 0 and 1. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| max_completion_tokens | integer | No | Alternative to max_tokens. Maximum completion tokens to generate. |
| stream | boolean | No | If true, responses are streamed as server-sent events. Defaults to false. |
| stop | string or string[] | No | Sequences where the model stops generating. |
| n | integer | No | Number of completions to generate (1-10). |
| tools | array | No | List of tool definitions the model may call. Each entry is either a function tool (type: "function") or a gateway built-in tool (type: "built_in"). See Built-in Tools. |
| tool_choice | string or object | No | Controls tool calling: "auto", "none", "required", or a specific function. |
| response_format | object | No | Output format: {"type": "text"}, {"type": "json_object"}, or {"type": "json_schema", "json_schema": {...}}. |
| reasoning_effort | string | No | Reasoning budget for reasoning-capable models: "minimal", "low", "medium", or "high". Passed through to the provider’s native reasoning/thinking parameter. |
| reasoning | object | No | Fine-grained reasoning controls: { "effort": <string>, "budget_tokens": <int>, "exclude_from_output": <bool> }, where effort is one of "minimal", "low", "medium", or "high". The adapter maps these to the provider’s native thinking-budget mechanism. |
| cache_control | object | No | Prompt-caching directives: { "cache_prefix": <bool>, "ttl_seconds": <int> }. The adapter selects the provider’s native caching mechanism. |
| seed | integer | No | Seed for deterministic sampling (best-effort). |
| user | string | No | End-user identifier for abuse tracking. |
| frequency_penalty | number | No | Penalty for token frequency, between -2.0 and 2.0. Positive values discourage repetition. |
| presence_penalty | number | No | Penalty for token presence, between -2.0 and 2.0. Positive values encourage new topics. |
| logprobs | boolean | No | If true, return per-token log probabilities in the response. |
| top_logprobs | integer | No | Number of top alternative tokens to include in logprobs, between 0 and 20. Requires logprobs: true. |
| logit_bias | object | No | Map of token-id (string) to bias value between -100 and 100. Adjusts the likelihood of specific tokens appearing. |
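
The ranges listed above can be checked on the client before a request is sent. The following is a minimal Python sketch of such a check; the function name validate_chat_request is hypothetical and not part of the gateway, and the gateway's own validation remains authoritative.

```python
from typing import Any


def validate_chat_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a chat completion request body.

    Hypothetical client-side helper: it only mirrors the ranges listed in
    the parameter table and is not the gateway's own validation.
    """
    errors: list[str] = []

    if not isinstance(body.get("model"), str) or not body["model"]:
        errors.append("model is required and must be a non-empty string")

    messages = body.get("messages")
    if not isinstance(messages, list) or len(messages) < 1:
        errors.append("messages is required and must contain at least 1 message")

    # (field, lower bound, upper bound) for the numeric ranges in the table.
    ranges = [
        ("temperature", 0, 2),
        ("top_p", 0, 1),
        ("n", 1, 10),
        ("frequency_penalty", -2.0, 2.0),
        ("presence_penalty", -2.0, 2.0),
        ("top_logprobs", 0, 20),
    ]
    for field, low, high in ranges:
        value = body.get(field)
        if value is None:
            continue
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            errors.append(f"{field} must be between {low} and {high}")

    # top_logprobs is only valid together with logprobs: true.
    if body.get("top_logprobs") is not None and body.get("logprobs") is not True:
        errors.append("top_logprobs requires logprobs to be true")

    # logit_bias maps token-id strings to values between -100 and 100.
    for token_id, bias in (body.get("logit_bias") or {}).items():
        in_range = isinstance(bias, (int, float)) and -100 <= bias <= 100
        if not isinstance(token_id, str) or not in_range:
            errors.append(f"logit_bias[{token_id!r}] must be between -100 and 100")

    return errors
```

An empty list means the body passed these checks; it does not guarantee that the selected provider supports every parameter.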

Message Format

Each message object has the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | One of system, user, assistant, or tool. |
| content | string, array, or null | Yes | The message content. Can be a string, an array of content parts (text or image_url), or null for assistant messages with tool calls. |
| name | string | No | Name of the message author. |
| tool_calls | array | No | Tool calls made by the assistant. Each has id, type: "function", and function: {name, arguments}. |
| tool_call_id | string | No | For tool role messages, the ID of the tool call being responded to. |

Multimodal Content

The content field can be an array of content parts for vision requests:

[
  { "type": "text", "text": "What is in this image?" },
  { "type": "image_url", "image_url": { "url": "https://example.com/image.png", "detail": "auto" } }
]

The detail field accepts "auto", "low", or "high".
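
As an illustration, a vision message with these content parts could be assembled in Python as follows. The helper name image_message is hypothetical; only the field names and the three detail values come from this page.

```python
ALLOWED_DETAIL = ("auto", "low", "high")


def image_message(text: str, image_url: str, detail: str = "auto") -> dict:
    """Build a user message with one text part and one image_url part."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


message = image_message("What is in this image?", "https://example.com/image.png")
```

The resulting dictionary can be placed directly in the messages array of a request.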

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
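
A client typically needs only a few fields from this body. The sketch below, with the hypothetical helper name summarize_completion, pulls out the first choice's text, its finish reason, and the total token count. It assumes the response has the shape shown above and treats usage as optional.

```python
import json


def summarize_completion(raw: str) -> dict:
    """Extract the first choice's text, finish reason, and total token usage."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage") or {}
    return {
        # content may be None when the assistant turn only contains tool calls.
        "text": choice["message"].get("content"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```

When n is greater than 1, the choices array holds more than one entry, so a real client would iterate over it rather than read index 0 only.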

Streaming

Set "stream": true to receive responses as server-sent events (SSE). The response uses Content-Type: text/event-stream.

Each event is a single line made up of the prefix "data: " followed by a JSON object:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]

The stream terminates with data: [DONE].
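
The framing described above can be consumed with a small line-based parser. This is a minimal Python sketch, not a full SSE client: the function name collect_stream_text is hypothetical, and it assumes each event arrives as one complete "data:" line, as in the sample.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from choice 0 until the [DONE] sentinel."""
    parts: list[str] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first completion
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints "Hello!"
```

In practice the lines would come from the HTTP response body read line by line and decoded as UTF-8, rather than from a list.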

Headers

The response includes:

| Header | Description |
| --- | --- |
| X-Request-ID | Unique identifier for the request, useful for debugging and audit trails. |
| X-Provider | The provider the request was routed to (e.g. openai, anthropic). |
| X-Model | The resolved model that served the request. |
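
Because HTTP header names are case-insensitive, client code is safer if it normalizes them before lookup. A minimal Python sketch, with the hypothetical helper name routing_info, that collects the three headers for logging:

```python
from typing import Mapping, Optional


def routing_info(headers: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Return the gateway routing headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "provider": lowered.get("x-provider"),
        "model": lowered.get("x-model"),
    }
```

Recording request_id alongside application logs makes it easier to match a client-side failure to the gateway's audit trail.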

Example

curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
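
The same request can be built from Python using only the standard library. This is a sketch under the same placeholders as the curl command (the gateway URL and API key are not real), and the helper name build_chat_request is hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request

# Placeholder URL taken from the curl example; replace with your gateway host.
GATEWAY_URL = "https://your-gateway.example.com/v1/chat/completions"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "aigw_sk_your_api_key",
    {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in one sentence."},
        ],
        "temperature": 0.7,
        "max_tokens": 100,
    },
)

# Sending requires a reachable gateway and a valid key:
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
```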

Streaming Example

curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
    "stream": true
  }'

Tool Calling Example

curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
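
When the model decides to call get_weather, the assistant message carries a tool_calls array, and the client is expected to reply with one tool role message per call. The Python sketch below shows one way to do that. The names run_tool_calls and the stub get_weather implementation are hypothetical, the call ID is made up, and it assumes function.arguments is a JSON-encoded string, following the OpenAI convention; this page does not state the encoding.

```python
import json
from typing import Callable


def run_tool_calls(assistant_message: dict, registry: dict[str, Callable[..., object]]) -> list[dict]:
    """Execute each tool call and return the matching tool-role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        handler = registry[function["name"]]
        # Assumption: arguments is a JSON-encoded string of keyword arguments.
        arguments = json.loads(function["arguments"])
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # ties the result back to the call
                "content": json.dumps(handler(**arguments)),
            }
        )
    return results


def get_weather(location: str) -> dict:
    """Stub implementation; a real one would query a weather service."""
    return {"location": location, "forecast": "unknown"}


# Illustrative assistant turn; the ID is invented for this example.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "London"}'},
        }
    ],
}

tool_messages = run_tool_calls(assistant_message, {"get_weather": get_weather})
```

The original messages, the assistant message, and tool_messages are then sent together in a follow-up request so the model can produce its final answer.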

Built-in Tools

In addition to function tools, the gateway accepts built-in tools that map to provider-native capabilities. Each built-in tool uses type: "built_in" with a built_in name and an optional config object:

| built_in | Description |
| --- | --- |
| file_search | Gateway-side retrieval over your vector stores. The gateway retrieves matching chunks and injects them as system context, then strips the tool before the request reaches the provider. Pass config: { "vectorStoreIds": [...], "topK": <int> }. |
| web_search | Maps to the provider’s native web-search tool (where supported, e.g. Anthropic, Gemini). |
| code_execution | Maps to the provider’s native code-execution tool. |
| computer_use | Maps to the provider’s native computer-use tool. |
| search_grounding | Maps to Gemini’s search-grounding capability. |

{
  "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "Summarize our onboarding policy." }],
  "tools": [
    {
      "type": "built_in",
      "built_in": "file_search",
      "config": { "vectorStoreIds": ["vs_abc123"], "topK": 5 }
    }
  ]
}

The file_search built-in is handled entirely by the gateway and works with any chat-capable provider. The remaining built-ins are forwarded to providers that support the corresponding native tool.
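
A small helper can keep built-in tool entries consistent with the names listed above. This is a Python sketch; the function name built_in_tool is hypothetical, and the check covers only the five names documented on this page.

```python
from typing import Optional

BUILT_IN_TOOLS = {
    "file_search",
    "web_search",
    "code_execution",
    "computer_use",
    "search_grounding",
}


def built_in_tool(name: str, config: Optional[dict] = None) -> dict:
    """Build a tools entry of type "built_in", with config only when given."""
    if name not in BUILT_IN_TOOLS:
        raise ValueError(f"unknown built-in tool: {name!r}")
    tool = {"type": "built_in", "built_in": name}
    if config is not None:
        tool["config"] = config
    return tool


tools = [
    built_in_tool("file_search", {"vectorStoreIds": ["vs_abc123"], "topK": 5}),
    built_in_tool("web_search"),
]
```

Whether a forwarded built-in such as web_search is accepted still depends on the provider the request is routed to.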

Gateway Features

Every request to the chat completions endpoint passes through the full gateway middleware pipeline:

  • Request normalization — requests are translated to a unified internal format, enabling routing to any provider.
  • RAG injection — if RAG is configured for the tenant, relevant documents are injected into the conversation context.
  • Prompt guards — configurable content filters for injection detection, PII, and toxicity.
  • Budget checks — requests are blocked if the tenant or API key budget has been exceeded.
  • Semantic cache — identical or semantically similar requests may be served from cache.
  • Usage tracking — token usage is recorded for billing and analytics.
  • Audit logging — requests are logged for compliance.