Chat Completions
Create a chat completion using any configured provider through a single OpenAI-compatible endpoint.
POST /v1/chat/completions
Required capability: chat
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier (e.g. gpt-4, claude-3-opus, gemini-pro). The gateway routes to the appropriate provider. |
| messages | array | Yes | Array of message objects (minimum 1). See Message Format. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. |
| top_p | number | No | Nucleus sampling parameter between 0 and 1. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| max_completion_tokens | integer | No | Alternative to max_tokens. Maximum completion tokens to generate. |
| stream | boolean | No | If true, responses are streamed as server-sent events. Defaults to false. |
| stop | string \| string[] | No | Sequences where the model stops generating. |
| n | integer | No | Number of completions to generate (1-10). |
| tools | array | No | List of tool definitions the model may call. Each entry is either a function tool (type: "function") or a gateway built-in tool (type: "built_in"). See Built-in Tools. |
| tool_choice | string \| object | No | Controls tool calling: "auto", "none", "required", or a specific function. |
| response_format | object | No | Output format: {"type": "text"}, {"type": "json_object"}, or {"type": "json_schema", "json_schema": {...}}. |
| reasoning_effort | string | No | Reasoning budget for reasoning-capable models: "minimal", "low", "medium", or "high". Passed through to the provider’s native reasoning/thinking parameter. |
| reasoning | object | No | Fine-grained reasoning controls: { "effort": "minimal\|low\|medium\|high", "budget_tokens": <int>, "exclude_from_output": <bool> }. The adapter maps these to the provider’s native thinking-budget mechanism. |
| cache_control | object | No | Prompt-caching directives: { "cache_prefix": <bool>, "ttl_seconds": <int> }. The adapter selects the provider’s native caching mechanism. |
| seed | integer | No | Seed for deterministic sampling (best-effort). |
| user | string | No | End-user identifier for abuse tracking. |
| frequency_penalty | number | No | Penalty for token frequency, between -2.0 and 2.0. Positive values discourage repetition. |
| presence_penalty | number | No | Penalty for token presence, between -2.0 and 2.0. Positive values encourage new topics. |
| logprobs | boolean | No | If true, return per-token log probabilities in the response. |
| top_logprobs | integer | No | Number of top alternative tokens to include in logprobs, between 0 and 20. Requires logprobs: true. |
| logit_bias | object | No | Map of token-id (string) to bias value between -100 and 100. Adjusts the likelihood of specific tokens appearing. |
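The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not the gateway's own validation: the function name validate_chat_request is hypothetical, and the bounds are assumed to be inclusive, which the table does not state explicitly.

```python
from typing import Any, Dict, List


def validate_chat_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the body; an empty list means none were found."""
    errors: List[str] = []

    # Required fields from the parameter table.
    model = body.get("model")
    if not isinstance(model, str) or not model:
        errors.append("model must be a non-empty string")
    messages = body.get("messages")
    if not isinstance(messages, list) or len(messages) < 1:
        errors.append("messages must be an array with at least one entry")

    # Numeric ranges from the table (inclusive bounds are an assumption).
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
        "n": (1, 10),
        "top_logprobs": (0, 20),
    }
    for name, (low, high) in ranges.items():
        value = body.get(name)
        if value is None:
            continue
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            errors.append(f"{name} must be between {low} and {high}")

    # top_logprobs is documented as requiring logprobs: true.
    if body.get("top_logprobs") is not None and body.get("logprobs") is not True:
        errors.append("top_logprobs requires logprobs: true")

    # Each logit_bias value must lie between -100 and 100.
    for token_id, bias in (body.get("logit_bias") or {}).items():
        if not isinstance(bias, (int, float)) or not (-100 <= bias <= 100):
            errors.append(f"logit_bias[{token_id}] must be between -100 and 100")

    return errors
```

A client could call this before sending and surface the returned messages instead of waiting for the gateway to reject the request.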
Message Format
Each message object has the following structure:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of system, user, assistant, or tool. |
| content | string \| array \| null | Yes | The message content. Can be a string, an array of content parts (text or image_url), or null for assistant messages with tool calls. |
| name | string | No | Name of the message author. |
| tool_calls | array | No | Tool calls made by the assistant. Each has id, type: "function", and function: {name, arguments}. |
| tool_call_id | string | No | For tool role messages, the ID of the tool call being responded to. |
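To illustrate how tool_calls and tool_call_id fit together, the sketch below appends an assistant turn and one tool message per call to a conversation. The helper name append_tool_results and the choice to JSON-encode each result are assumptions made for this example; the only behavior taken from the table is that a tool message carries the ID of the call it answers.

```python
import json
from typing import Any, Dict, List


def append_tool_results(
    messages: List[Dict[str, Any]],
    assistant_message: Dict[str, Any],
    results: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Return a new message list with the assistant turn plus one tool message per call.

    `results` maps each tool call id to the value your own code produced for it.
    """
    updated = list(messages)  # copy so the caller's history is not mutated
    updated.append(assistant_message)
    for call in assistant_message.get("tool_calls") or []:
        updated.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                # Structured results are serialized to a string here (an assumption).
                "content": json.dumps(results[call["id"]]),
            }
        )
    return updated


history = [{"role": "user", "content": "What is the weather in London?"}]
assistant_turn = {
    "role": "assistant",
    "content": None,  # null content is allowed when tool calls are present
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"location\": \"London\"}"},
        }
    ],
}
follow_up = append_tool_results(history, assistant_turn, {"call_1": {"temp_c": 14}})
```

The resulting follow_up list would be sent as messages in the next request so the model can use the tool output.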
Multimodal Content
The content field can be an array of content parts for vision requests:

    [
      { "type": "text", "text": "What is in this image?" },
      {
        "type": "image_url",
        "image_url": { "url": "https://example.com/image.png", "detail": "auto" }
      }
    ]

The detail field accepts "auto", "low", or "high".
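A small helper can assemble this content array and reject unsupported detail values early. This is a sketch only; the function name vision_message is hypothetical, and rejecting unknown detail values on the client is a design choice of this example rather than documented gateway behavior.

```python
from typing import Any, Dict

ALLOWED_DETAIL = ("auto", "low", "high")


def vision_message(text: str, image_url: str, detail: str = "auto") -> Dict[str, Any]:
    """Build a user message with one text part and one image_url part."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


message = vision_message("What is in this image?", "https://example.com/image.png")
```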
Response

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1709000000,
      "model": "gpt-4",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?"
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 9,
        "total_tokens": 21
      }
    }

Streaming
Set "stream": true to receive responses as server-sent events (SSE). The response uses Content-Type: text/event-stream.
Each event is a single line consisting of the prefix "data: " followed by a JSON chunk object:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]
The stream terminates with data: [DONE].
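The event format described above can be consumed with a small line-oriented parser. The sketch below works on any iterable of text lines, so it can be fed from whichever HTTP client you use; the function names iter_chunks and collect_text are hypothetical, and only the "data:" prefix and the [DONE] sentinel are taken from this section.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded chunk objects from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta content of choice 0 across all chunks."""
    parts: List[str] = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if choice.get("index", 0) == 0 and content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))  # prints "Hello!"
```

Keeping the parser independent of the transport makes it easy to test against recorded streams like the sample shown here.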
Headers
The response includes:
| Header | Description |
|---|---|
| X-Request-ID | Unique identifier for the request, useful for debugging and audit trails. |
| X-Provider | The provider the request was routed to (e.g. openai, anthropic). |
| X-Model | The resolved model that served the request. |
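When logging or debugging, it can help to pull these three headers into one record. The sketch below assumes the headers are available as a plain mapping and looks them up case-insensitively, since HTTP header names are not case-sensitive; the function name routing_info and the returned key names are hypothetical.

```python
from typing import Dict, Mapping, Optional


def routing_info(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Extract the gateway routing headers from a response header mapping."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "provider": lowered.get("x-provider"),
        "model": lowered.get("x-model"),
    }


info = routing_info({"X-Request-ID": "req_123", "x-provider": "openai", "X-Model": "gpt-4"})
```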
Example

    curl https://your-gateway.example.com/v1/chat/completions \
      -H "Authorization: Bearer aigw_sk_your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Explain quantum computing in one sentence."}
        ],
        "temperature": 0.7,
        "max_tokens": 100
      }'

Streaming Example

    curl https://your-gateway.example.com/v1/chat/completions \
      -H "Authorization: Bearer aigw_sk_your_api_key" \
      -H "Content-Type: application/json" \
      -N \
      -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
        "stream": true
      }'

Tool Calling Example

    curl https://your-gateway.example.com/v1/chat/completions \
      -H "Authorization: Bearer aigw_sk_your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "What is the weather in London?"}],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_weather",
              "description": "Get the current weather for a location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {"type": "string"}
                },
                "required": ["location"]
              }
            }
          }
        ],
        "tool_choice": "auto"
      }'

Built-in Tools
In addition to function tools, the gateway accepts built-in tools that map to provider-native capabilities. Each built-in tool uses type: "built_in" with a built_in name and an optional config object:
| built_in | Description |
|---|---|
| file_search | Gateway-side retrieval over your vector stores. The gateway retrieves matching chunks and injects them as system context, then strips the tool before the request reaches the provider. Pass config: { "vectorStoreIds": [...], "topK": <int> }. |
| web_search | Maps to the provider’s native web-search tool (where supported, e.g. Anthropic, Gemini). |
| code_execution | Maps to the provider’s native code-execution tool. |
| computer_use | Maps to the provider’s native computer-use tool. |
| search_grounding | Maps to Gemini’s search-grounding capability. |

    {
      "model": "gpt-4o",
      "messages": [{ "role": "user", "content": "Summarize our onboarding policy." }],
      "tools": [
        {
          "type": "built_in",
          "built_in": "file_search",
          "config": { "vectorStoreIds": ["vs_abc123"], "topK": 5 }
        }
      ]
    }

The file_search built-in is handled entirely by the gateway and works with any chat-capable provider. The remaining built-ins are forwarded to providers that support the corresponding native tool.
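Built-in tool entries share a fixed shape, so they can be produced by a small builder. The sketch below is illustrative: the helper names built_in_tool and file_search_tool are hypothetical, and the set of accepted names is copied from the table in this section rather than queried from the gateway.

```python
from typing import Any, Dict, Iterable, Optional

BUILT_IN_NAMES = {
    "file_search",
    "web_search",
    "code_execution",
    "computer_use",
    "search_grounding",
}


def built_in_tool(name: str, config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a tools entry of type "built_in"; config is omitted when not given."""
    if name not in BUILT_IN_NAMES:
        raise ValueError(f"unknown built-in tool: {name!r}")
    tool: Dict[str, Any] = {"type": "built_in", "built_in": name}
    if config is not None:
        tool["config"] = config
    return tool


def file_search_tool(vector_store_ids: Iterable[str], top_k: Optional[int] = None) -> Dict[str, Any]:
    """Build a file_search entry using the vectorStoreIds and topK config keys."""
    config: Dict[str, Any] = {"vectorStoreIds": list(vector_store_ids)}
    if top_k is not None:
        config["topK"] = top_k
    return built_in_tool("file_search", config)


request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize our onboarding policy."}],
    "tools": [file_search_tool(["vs_abc123"], top_k=5)],
}
```

The request_body built here matches the JSON example in this section and can be serialized with json.dumps before sending.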
Gateway Features
The chat completions endpoint passes through the full gateway middleware pipeline:
- Request normalization — requests are translated to a unified internal format, enabling routing to any provider.
- RAG injection — if RAG is configured for the tenant, relevant documents are injected into the conversation context.
- Prompt guards — configurable content filters for injection detection, PII, and toxicity.
- Budget checks — requests are blocked if the tenant or API key budget has been exceeded.
- Semantic cache — identical or semantically similar requests may be served from cache.
- Usage tracking — token usage is recorded for billing and analytics.
- Audit logging — requests are logged for compliance.