Responses
The Responses API mirrors OpenAI’s /v1/responses surface and threads
continuations through gateway-issued response IDs. The gateway translates
previous_response_id from its own ID space to the underlying provider’s
ID before dispatching, then persists the full response output blocks so
later continuations work even after the provider’s own retention window
has expired.
Required capability: responses
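The ID translation described above can be pictured with a small sketch. Everything here is an illustrative assumption rather than the gateway's actual implementation: the `ResponseStore` class, the `StoredResponse` record, the `translate_request` helper, and the `resp-<uuid>` ID format are hypothetical names chosen to match the examples on this page.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredResponse:
    """Hypothetical record kept for each stored response."""
    provider: str
    provider_response_id: str
    output: list = field(default_factory=list)


class ResponseStore:
    """In-memory stand-in for the gateway's persistence layer (an assumption)."""

    def __init__(self) -> None:
        self._records: dict[str, StoredResponse] = {}

    def save(self, provider: str, provider_response_id: str, output: list) -> str:
        # Issue a gateway-side ID and keep the full output blocks next to the
        # provider's own ID so later continuations can be resolved locally.
        gateway_id = f"resp-{uuid.uuid4()}"
        self._records[gateway_id] = StoredResponse(provider, provider_response_id, output)
        return gateway_id

    def get(self, gateway_id: str) -> StoredResponse:
        return self._records[gateway_id]


def translate_request(body: dict, store: ResponseStore) -> dict:
    """Return a copy of body with previous_response_id mapped to the provider's ID."""
    outbound = dict(body)
    gateway_id = outbound.get("previous_response_id")
    if gateway_id is not None:
        outbound["previous_response_id"] = store.get(gateway_id).provider_response_id
    return outbound
```

The caller only ever sees gateway IDs; the provider ID is substituted just before dispatch and never has to leave the gateway.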
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/responses | Create a new response or continuation. |
| GET | /v1/responses | List responses. |
| GET | /v1/responses/:responseId | Fetch a stored response. |
| POST | /v1/responses/:responseId/cancel | Cancel an in-progress response. |
| DELETE | /v1/responses/:responseId | Remove a stored response. |
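As a rough illustration of how the per-response endpoints map onto HTTP calls, the sketch below builds (but does not send) requests with Python's standard library. The host and API key are placeholders, and the helper names `build_request`, `fetch_response`, `cancel_response`, and `delete_response` are hypothetical, not part of any gateway SDK.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://your-gateway.example.com"  # placeholder host
API_KEY = "aigw_sk_your_api_key"               # placeholder key


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request object without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {API_KEY}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def fetch_response(response_id: str) -> urllib.request.Request:
    # GET /v1/responses/:responseId
    return build_request("GET", f"/v1/responses/{response_id}")


def cancel_response(response_id: str) -> urllib.request.Request:
    # POST /v1/responses/:responseId/cancel
    return build_request("POST", f"/v1/responses/{response_id}/cancel")


def delete_response(response_id: str) -> urllib.request.Request:
    # DELETE /v1/responses/:responseId
    return build_request("DELETE", f"/v1/responses/{response_id}")
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a real gateway.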
Create

    curl https://your-gateway.example.com/v1/responses \
      -H "Authorization: Bearer aigw_sk_your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o",
        "input": "What did you tell me yesterday about quantum tunneling?",
        "previous_response_id": "resp-abc-..."
      }'

Request fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to call. |
| input | string \| array | Yes | Single string or an array of input items. |
| provider | string | No | Override the routing chain. Default: the first provider for the model in the chat routing config, falling back to openai. |
| previous_response_id | string | No | Gateway response ID to continue from. |
| instructions | string | No | System-style instructions. |
| tools | array | No | Function tools or built-in tools (file_search, web_search, code_execution, computer_use, search_grounding). |
| tool_choice | string \| object | No | auto / none / required / {type, function}. |
| temperature | number | No | 0–2. |
| top_p | number | No | 0–1. |
| max_output_tokens | integer | No | Maximum number of output tokens. |
| store | boolean | No | Persist for later retrieval. Default: true. |
| stream | boolean | No | SSE streaming. Default: false. |
| reasoning | object | No | {effort: "minimal" \| "low" \| "medium" \| "high"} or {budget_tokens} for thinking models. |
| metadata | object | No | Free-form tenant metadata. |
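A client might assemble the request body with a small helper such as the one below. This is a sketch under stated assumptions: `build_create_payload` is a hypothetical name, and the range checks simply mirror the table above on the client side. Whether the gateway itself rejects out-of-range values is not specified here.

```python
from typing import Any, Optional, Union


def build_create_payload(
    model: str,
    user_input: Union[str, list],
    *,
    previous_response_id: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    store: bool = True,    # documented default
    stream: bool = False,  # documented default
    **extra: Any,          # instructions, tools, tool_choice, reasoning, metadata, ...
) -> dict:
    """Assemble a POST /v1/responses body, validating the documented ranges."""
    if not model:
        raise ValueError("model is required")
    if not isinstance(user_input, (str, list)):
        raise TypeError("input must be a string or a list of input items")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    payload = {"model": model, "input": user_input, "store": store, "stream": stream}
    optional = {
        "previous_response_id": previous_response_id,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Only send optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    payload.update(extra)
    return payload
```

For example, `build_create_payload("gpt-4o", "Hello", instructions="Be brief.")` produces a body with the two required fields, the two defaulted booleans, and the extra `instructions` key.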
Response

    {
      "id": "resp-7c8b9d3e-...",
      "object": "response",
      "provider": "openai",
      "provider_response_id": "resp_OAI_xyz",
      "model": "gpt-4o",
      "previous_response_id": "resp-abc-...",
      "status": "completed",
      "output": [ /* provider-native output blocks */ ],
      "usage": {
        "input_tokens": 124,
        "output_tokens": 312,
        "total_tokens": 436
      },
      "created_at": 1748284800,
      "completed_at": 1748284805,
      "metadata": null
    }

Cross-provider continuation
Continuation across different providers is not supported and returns the
FILE_PROVIDER_MISMATCH error: providers track context differently, and
there’s no portable protocol for thread state. A continuation request must
hit the same provider that produced the previous response.
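The rule can be expressed as a simple guard that compares the `provider` field of the stored response with the provider the new request would be routed to. This is only a sketch: `check_continuation` and `ProviderMismatchError` are hypothetical names, and the only details taken from this page are the `provider` field and the `FILE_PROVIDER_MISMATCH` code.

```python
from typing import Optional


class ProviderMismatchError(Exception):
    """Hypothetical exception carrying the documented error code."""
    code = "FILE_PROVIDER_MISMATCH"


def check_continuation(previous: Optional[dict], target_provider: str) -> None:
    """Raise if a continuation would be sent to a different provider."""
    if previous is None:
        return  # not a continuation, nothing to check
    if previous["provider"] != target_provider:
        raise ProviderMismatchError(
            f"previous response came from {previous['provider']!r}, "
            f"but this request targets {target_provider!r}"
        )
```

A client can avoid the error by reading `provider` from the earlier response and passing the same value in the `provider` request field.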
Ephemeral responses
Set "store": false to skip persistence — the gateway returns the
provider’s output but doesn’t write a ResponseSession record. Useful
for high-throughput stateless workflows where you don’t need history.
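Conceptually, the `store` flag gates a single write. The sketch below uses a plain dictionary as a stand-in for wherever ResponseSession records live; the `finalize` function and the dictionary-based storage are assumptions made for illustration only.

```python
def finalize(request_body: dict, response: dict, sessions: dict) -> dict:
    """Return the response, persisting it only when store is not false."""
    # store defaults to true when the field is omitted.
    if request_body.get("store", True):
        sessions[response["id"]] = response
    return response
```

With `{"store": False}` in the request body, the caller still receives the full response, but `sessions` is left untouched.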