Chat Completions

Create a chat completion using any configured provider through a single OpenAI-compatible endpoint.

POST /v1/chat/completions

Required capability: chat

Request Body

Parameter	Type	Required	Description
`model`	`string`	Yes	Model identifier (e.g. `gpt-4`, `claude-3-opus`, `gemini-pro`). The gateway routes to the appropriate provider.
`messages`	`array`	Yes	Array of message objects (minimum 1). See Message Format.
`temperature`	`number`	No	Sampling temperature between 0 and 2. Higher values produce more random output.
`top_p`	`number`	No	Nucleus sampling parameter between 0 and 1.
`max_tokens`	`integer`	No	Maximum number of tokens to generate.
`max_completion_tokens`	`integer`	No	Alternative to `max_tokens`: the maximum number of tokens to generate in the completion.
`stream`	`boolean`	No	If `true`, responses are streamed as server-sent events. Defaults to `false`.
`stop`	`string | string[]`	No	Sequences where the model stops generating.
`n`	`integer`	No	Number of completions to generate (1-10).
`tools`	`array`	No	List of tool definitions the model may call. Each entry is either a function tool (`type: "function"`) or a gateway built-in tool (`type: "built_in"`). See Built-in Tools.
`tool_choice`	`string | object`	No	Controls tool calling: `"auto"`, `"none"`, `"required"`, or an object that selects a specific function.
`response_format`	`object`	No	Output format: `{"type": "text"}`, `{"type": "json_object"}`, or `{"type": "json_schema", "json_schema": {...}}`.
`reasoning_effort`	`string`	No	Reasoning budget for reasoning-capable models: `"minimal"`, `"low"`, `"medium"`, or `"high"`. Passed through to the provider’s native reasoning/thinking parameter.
`reasoning`	`object`	No	Fine-grained reasoning controls: `{ "effort": "minimal|low|medium|high", "budget_tokens": <int>, "exclude_from_output": <bool> }`. The adapter maps these to the provider’s native thinking-budget mechanism.
`cache_control`	`object`	No	Prompt-caching directives: `{ "cache_prefix": <bool>, "ttl_seconds": <int> }`. The adapter selects the provider’s native caching mechanism.
`seed`	`integer`	No	Seed for deterministic sampling (best-effort).
`user`	`string`	No	End-user identifier for abuse tracking.
`frequency_penalty`	`number`	No	Penalty for token frequency, between `-2.0` and `2.0`. Positive values discourage repetition.
`presence_penalty`	`number`	No	Penalty for token presence, between `-2.0` and `2.0`. Positive values encourage new topics.
`logprobs`	`boolean`	No	If `true`, return per-token log probabilities in the response.
`top_logprobs`	`integer`	No	Number of top alternative tokens to include in `logprobs`, between `0` and `20`. Requires `logprobs: true`.
`logit_bias`	`object`	No	Map of token IDs (as strings) to bias values between `-100` and `100`. Adjusts the likelihood of the specified tokens appearing.
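The ranges in the table can be checked on the client before a request is sent. The following sketch assembles a request body and rejects out-of-range values early. `build_chat_request` is a hypothetical helper, not part of any gateway SDK; it enforces only the limits listed above, and how the gateway itself responds to out-of-range values is not covered here.

```python
from typing import Any, Dict, List, Tuple

# Numeric limits copied from the Request Body table above.
RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "n": (1, 10),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "top_logprobs": (0, 20),
}


def build_chat_request(model: str, messages: List[Dict[str, Any]], **params: Any) -> Dict[str, Any]:
    """Assemble a /v1/chat/completions body; raise ValueError on documented-limit violations."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    if params.get("top_logprobs") is not None and not params.get("logprobs"):
        raise ValueError("top_logprobs requires logprobs=True")
    for token_id, bias in (params.get("logit_bias") or {}).items():
        if not -100 <= bias <= 100:
            raise ValueError(f"logit_bias[{token_id!r}]={bias} is outside [-100, 100]")
    body: Dict[str, Any] = {"model": model, "messages": messages}
    # Omit parameters left as None rather than sending explicit nulls.
    body.update({key: value for key, value in params.items() if value is not None})
    return body
```

For example, `build_chat_request("gpt-4", messages, temperature=0.7, max_tokens=100)` produces the same body as the curl example further down.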

Message Format

Each message object has the following structure:

Field	Type	Required	Description
`role`	`string`	Yes	One of `system`, `user`, `assistant`, or `tool`.
`content`	`string | array | null`	Yes	The message content: a string, an array of content parts (`text` or `image_url`), or `null` for assistant messages with tool calls.
`name`	`string`	No	Name of the message author.
`tool_calls`	`array`	No	Tool calls made by the assistant. Each has `id`, `type: "function"`, and `function: {name, arguments}`.
`tool_call_id`	`string`	No	For `tool` role messages, the ID of the tool call being responded to.
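The rules in this table, including the `null` content case, can be encoded as a client-side check. This is a minimal sketch: `validate_message` is a hypothetical helper, and it treats `tool_call_id` as mandatory on `tool` messages, which the table implies but does not mark as required.

```python
from typing import Any, Dict

ROLES = {"system", "user", "assistant", "tool"}


def validate_message(msg: Dict[str, Any]) -> None:
    """Raise ValueError if msg violates the Message Format table."""
    role = msg.get("role")
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if "content" not in msg:
        raise ValueError("content is required (use None for tool-calling assistant turns)")
    content = msg["content"]
    if content is None:
        if not (role == "assistant" and msg.get("tool_calls")):
            raise ValueError("content may be null only on assistant messages with tool_calls")
    elif not isinstance(content, (str, list)):
        raise ValueError("content must be a string, an array of content parts, or null")
    # Assumption: tool messages must reference the call they answer.
    if role == "tool" and not msg.get("tool_call_id"):
        raise ValueError("tool messages need a tool_call_id")
    for call in msg.get("tool_calls") or []:
        function = call.get("function") or {}
        if "id" not in call or call.get("type") != "function":
            raise ValueError("each tool call needs an id and type 'function'")
        if "name" not in function or "arguments" not in function:
            raise ValueError("each tool call function needs a name and arguments")
```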

Multimodal Content

For vision requests, the `content` field can be an array of content parts:

[
  { "type": "text", "text": "What is in this image?" },
  { "type": "image_url", "image_url": { "url": "https://example.com/image.png", "detail": "auto" } }
]

The `detail` field accepts `"auto"`, `"low"`, or `"high"`.
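A small helper can assemble this array and reject unsupported `detail` values. `vision_content` is a hypothetical name, and applying one `detail` level to every image is a simplification of this sketch, not a rule of the API.

```python
from typing import Any, Dict, List

DETAIL_LEVELS = {"auto", "low", "high"}


def vision_content(prompt: str, *image_urls: str, detail: str = "auto") -> List[Dict[str, Any]]:
    """Build a content-parts array: one text part, then one image_url part per URL."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {sorted(DETAIL_LEVELS)}")
    parts: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    return parts
```

The result goes directly into a message, e.g. `{"role": "user", "content": vision_content("What is in this image?", "https://example.com/image.png")}`.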

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
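Client code typically reads one choice's message and the `usage` counts. This sketch parses the response above using only the standard library; `ChatResult` and `parse_completion` are illustrative names, and `text` is `None` when the assistant replied with tool calls instead of text.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResult:
    text: Optional[str]  # None when the assistant replied with tool_calls only
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_completion(raw: str, choice_index: int = 0) -> ChatResult:
    data = json.loads(raw)
    # Match on each choice's "index" field instead of assuming list order.
    choice = next(c for c in data["choices"] if c["index"] == choice_index)
    usage = data["usage"]
    return ChatResult(
        text=choice["message"].get("content"),
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )
```

In the sample response, `total_tokens` (21) is the sum of `prompt_tokens` (12) and `completion_tokens` (9).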

Streaming

Set `"stream": true` to receive responses as server-sent events (SSE). The response uses `Content-Type: text/event-stream`.

Each event is a `data:` line carrying one JSON chunk object, followed by a blank line:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]

The stream terminates with `data: [DONE]`. This sentinel is not JSON, so stop reading at it rather than parsing it as a chunk.
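Each chunk carries only a `delta`, so a client rebuilds the full message by concatenating `delta.content` per choice `index` until `[DONE]`. The sketch below (`collect_stream` is a hypothetical helper) applies this to already-decoded lines of the event stream; skipping every line that does not start with `data:` is a simplification of full SSE parsing.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Fold chat.completion.chunk events into per-choice text and finish reasons."""
    text: Dict[int, str] = defaultdict(str)
    finish: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators carry no payload
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk["choices"]:
            idx = choice["index"]
            # A delta may omit content (e.g. a final chunk that only sets finish_reason).
            text[idx] += choice["delta"].get("content") or ""
            if choice.get("finish_reason") is not None:
                finish[idx] = choice["finish_reason"]
    return dict(text), finish
```

Applied to the three events above, this returns `{0: "Hello!"}` as the text; the finish-reason map stays empty because neither sample chunk sets `finish_reason`.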

Headers

The response includes:

Header	Description
`X-Request-ID`	Unique identifier for the request, useful for debugging and audit trails.
`X-Provider`	The provider the request was routed to (e.g. `openai`, `anthropic`).
`X-Model`	The resolved model that served the request.
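HTTP header names are case-insensitive, so it is safer to normalize them before reading these values. A minimal sketch, where `routing_info` is a hypothetical helper that accepts a plain dict or the header object on a `urllib.request.urlopen` response (anything with an `items()` method):

```python
from typing import Any, Dict, Optional


def routing_info(headers: Any) -> Dict[str, Optional[str]]:
    """Pull the gateway's routing headers, matching names case-insensitively.

    headers may be a dict or an http.client.HTTPMessage (resp.headers);
    both provide items().
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "provider": lowered.get("x-provider"),
        "model": lowered.get("x-model"),
    }
```

Recording `request_id` next to client-side errors makes a failed call easier to find when debugging or reviewing audit trails.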

Example

curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
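The same request can be made from Python with only the standard library. In this sketch, `make_chat_request` and `send` are hypothetical helpers; the base URL and key are the placeholders from the curl example, and error handling is left to the caller (`urlopen` raises `urllib.error.HTTPError` for non-2xx responses).

```python
import json
import urllib.request
from typing import Any, Dict


def make_chat_request(base_url: str, api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions, mirroring the curl call."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON body (network access required)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting construction from sending keeps the request easy to inspect or log before it leaves the client.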

Streaming Example

curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
    "stream": true
  }'
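To consume the stream incrementally from Python, read the response body line by line rather than all at once (the role `-N` plays for curl). This sketch assumes the `data:` framing described under Streaming; `iter_chunks` and `stream_text` are hypothetical helpers, and `iter_chunks` works on any binary file-like object, including the response returned by `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Any, BinaryIO, Dict, Iterator


def iter_chunks(stream: BinaryIO) -> Iterator[Dict[str, Any]]:
    """Yield parsed chat.completion.chunk objects until the data: [DONE] sentinel."""
    for raw_line in stream:  # binary file objects iterate one line at a time
        line = raw_line.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip the blank lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def stream_text(base_url: str, api_key: str, payload: Dict[str, Any]) -> Iterator[str]:
    """POST with stream=true and yield choice 0's text deltas as they arrive."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps({**payload, "stream": True}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for chunk in iter_chunks(resp):
            for choice in chunk["choices"]:
                piece = choice["delta"].get("content")
                if choice["index"] == 0 and piece:
                    yield piece
```

A caller can print each piece as it arrives, e.g. `for piece in stream_text(url, key, body): print(piece, end="", flush=True)`.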

Tool Calling Example

curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
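If the model decides to call `get_weather`, the response contains an assistant message with `tool_calls` and `content` set to `null`. To finish the turn, the client runs the function and sends back one `tool` message per call, each carrying the matching `tool_call_id`. A minimal sketch of that step follows. `tool_result_messages` and the local `get_weather` implementation are hypothetical, and the sketch assumes `function.arguments` arrives as a JSON-encoded string, as in the OpenAI format this endpoint mirrors.

```python
import json
from typing import Any, Callable, Dict, List


def tool_result_messages(
    response: Dict[str, Any], registry: Dict[str, Callable[..., Any]]
) -> List[Dict[str, Any]]:
    """Return the assistant turn plus one tool message per call, ready to append to messages."""
    assistant = response["choices"][0]["message"]
    # Echo the assistant turn first so each tool result follows the call it answers.
    followups = [assistant]
    for call in assistant.get("tool_calls") or []:
        func = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # assumed to be a JSON string
        followups.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return followups
```

Append the returned messages to the original `messages` array and send the request again with the same `tools` to get the model's final answer.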

Built-in Tools

In addition to function tools, the gateway accepts built-in tools. Most map to provider-native capabilities; `file_search` is handled by the gateway itself. Each built-in tool entry sets `"type": "built_in"`, names the tool in its `built_in` field, and may include a `config` object:

`built_in` value	Description
`file_search`	Gateway-side retrieval over your vector stores. The gateway retrieves matching chunks and injects them as system context, then strips the tool before the request reaches the provider. Pass `config: { "vectorStoreIds": [...], "topK": <int> }`.
`web_search`	Maps to the provider’s native web-search tool (where supported, e.g. Anthropic, Gemini).
`code_execution`	Maps to the provider’s native code-execution tool.
`computer_use`	Maps to the provider’s native computer-use tool.
`search_grounding`	Maps to Gemini’s search-grounding capability.

{
  "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "Summarize our onboarding policy." }],
  "tools": [
    {
      "type": "built_in",
      "built_in": "file_search",
      "config": { "vectorStoreIds": ["vs_abc123"], "topK": 5 }
    }
  ]
}

The `file_search` built-in is handled entirely by the gateway and works with any chat-capable provider. The remaining built-ins are forwarded to providers that support the corresponding native tool.
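Built-in tool entries follow a fixed shape, so they can be generated instead of written by hand. In this sketch, `built_in_tool` and `file_search_tool` are hypothetical helpers; the set of names comes from the table above, and the `config` keys (`vectorStoreIds`, `topK`) are passed in camelCase exactly as documented.

```python
from typing import Any, Dict, List, Optional

BUILT_INS = {"file_search", "web_search", "code_execution", "computer_use", "search_grounding"}


def built_in_tool(name: str, config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a tools[] entry for a gateway built-in; config is included only when given."""
    if name not in BUILT_INS:
        raise ValueError(f"unknown built-in tool: {name!r}")
    tool: Dict[str, Any] = {"type": "built_in", "built_in": name}
    if config is not None:
        tool["config"] = config
    return tool


def file_search_tool(vector_store_ids: List[str], top_k: int) -> Dict[str, Any]:
    # Note the camelCase config keys, unlike the snake_case request parameters.
    return built_in_tool("file_search", {"vectorStoreIds": list(vector_store_ids), "topK": top_k})
```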

Gateway Features

The chat completions endpoint passes through the full gateway middleware pipeline:

Request normalization — requests are translated to a unified internal format, enabling routing to any provider.
RAG injection — if RAG is configured for the tenant, relevant documents are injected into the conversation context.
Prompt guards — configurable content filters for injection detection, PII, and toxicity.
Budget checks — requests are blocked if the tenant or API key budget has been exceeded.
Semantic cache — identical or semantically similar requests may be served from cache.
Usage tracking — token usage is recorded for billing and analytics.
Audit logging — requests are logged for compliance.