Responses

The Responses API mirrors OpenAI’s /v1/responses surface and threads continuations through gateway-issued response IDs. The gateway translates previous_response_id from its own ID space to the underlying provider’s ID before dispatching, then persists the full response output blocks so later continuations work even after the provider’s own retention window rotates.
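The ID translation described above can be pictured with a small in-memory sketch. It is not the gateway's actual implementation: `ResponseStore`, `StoredResponse`, and `to_provider_request` are hypothetical names, and a real gateway would use durable storage rather than a dictionary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredResponse:
    gateway_id: str            # ID handed back to the client
    provider: str              # provider that produced the response
    provider_response_id: str  # ID in the provider's own ID space
    output: list               # persisted output blocks


@dataclass
class ResponseStore:
    """Hypothetical in-memory stand-in for the gateway's persistence layer."""
    _by_gateway_id: dict = field(default_factory=dict)

    def save(self, provider: str, provider_response_id: str, output: list) -> StoredResponse:
        # Issue a fresh gateway ID and remember which provider ID it maps to.
        record = StoredResponse(
            gateway_id=f"resp-{uuid.uuid4()}",
            provider=provider,
            provider_response_id=provider_response_id,
            output=output,
        )
        self._by_gateway_id[record.gateway_id] = record
        return record

    def translate(self, gateway_id: str) -> str:
        """Map a gateway-issued ID to the provider's ID."""
        return self._by_gateway_id[gateway_id].provider_response_id


def to_provider_request(store: ResponseStore, request: dict) -> dict:
    """Return a copy of the request with previous_response_id rewritten for the provider."""
    outbound = dict(request)
    previous = outbound.get("previous_response_id")
    if previous is not None:
        outbound["previous_response_id"] = store.translate(previous)
    return outbound
```

The client only ever sees the gateway ID; the provider ID is swapped in just before dispatch.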

Required capability: responses

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/responses | Create a new response or continuation. |
| GET | /v1/responses | List responses. |
| GET | /v1/responses/:responseId | Fetch a stored response. |
| POST | /v1/responses/:responseId/cancel | Cancel an in-progress response. |
| DELETE | /v1/responses/:responseId | Remove a stored response. |
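As a rough illustration, the endpoints above could be wrapped in a small request builder using only the Python standard library. The host and API key are the placeholders used on this page, and `ENDPOINTS` and `build_request` are hypothetical helper names, not part of any official SDK. The function only constructs the request; sending it is left to the caller.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://your-gateway.example.com"  # placeholder host
API_KEY = "aigw_sk_your_api_key"               # placeholder key

# (method, path template) for each documented endpoint.
ENDPOINTS = {
    "create": ("POST", "/v1/responses"),
    "list": ("GET", "/v1/responses"),
    "fetch": ("GET", "/v1/responses/{response_id}"),
    "cancel": ("POST", "/v1/responses/{response_id}/cancel"),
    "delete": ("DELETE", "/v1/responses/{response_id}"),
}


def build_request(
    action: str,
    response_id: Optional[str] = None,
    body: Optional[bytes] = None,
) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for one endpoint."""
    method, template = ENDPOINTS[action]
    if "{response_id}" in template and not response_id:
        raise ValueError(f"{action!r} requires a response_id")
    # Percent-encode the ID so it cannot alter the path structure.
    safe_id = urllib.parse.quote(response_id or "", safe="")
    url = BASE_URL + template.format(response_id=safe_id)
    headers = {"Authorization": f"Bearer {API_KEY}"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```

A request built this way can be passed to `urllib.request.urlopen` once the placeholders are replaced with real values.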

Create

curl https://your-gateway.example.com/v1/responses \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What did you tell me yesterday about quantum tunneling?",
    "previous_response_id": "resp-abc-..."
  }'
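The same call can be sketched in Python with the standard library. This is a minimal illustration, not an official client: `ask` and `http_send` are hypothetical names, and the URL and key are the placeholders from the curl example. The transport is injectable so the continuation logic can be exercised without a live gateway.

```python
import json
import urllib.request
from typing import Callable, Optional

GATEWAY_URL = "https://your-gateway.example.com/v1/responses"  # placeholder
API_KEY = "aigw_sk_your_api_key"                               # placeholder


def http_send(payload: dict) -> dict:
    """POST the payload to the gateway and decode the JSON reply."""
    request = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def ask(
    prompt: str,
    previous_response_id: Optional[str] = None,
    model: str = "gpt-4o",
    send: Callable[[dict], dict] = http_send,
) -> dict:
    """Create a response, continuing from an earlier one when an ID is given."""
    payload = {"model": model, "input": prompt}
    if previous_response_id is not None:
        # Always pass the gateway-issued ID, never the provider's own ID.
        payload["previous_response_id"] = previous_response_id
    return send(payload)
```

To chain turns, feed the `id` of each reply into the next call: `second = ask("And the follow-up?", previous_response_id=first["id"])`.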

Request fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model id to call. |
| input | string or array | Yes | Single string or an array of input items. |
| provider | string | No | Override the routing chain. Default: chat routing config’s first provider for the model, falling back to openai. |
| previous_response_id | string | No | Gateway response ID to continue from. |
| instructions | string | No | System-style instructions. |
| tools | array | No | Function tools or built-in tools (file_search, web_search, code_execution, computer_use, search_grounding). |
| tool_choice | string or object | No | auto / none / required / {type, function}. |
| temperature | number | No | 0–2. |
| top_p | number | No | 0–1. |
| max_output_tokens | integer | No | |
| store | boolean | No | Persist for later retrieval. Default: true. |
| stream | boolean | No | SSE streaming. Default: false. |
| reasoning | object | No | {effort: "minimal", "low", "medium", or "high"} or {budget_tokens} for thinking models. |
| metadata | object | No | Free-form tenant metadata. |
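The constraints in the table can be checked on the client before a request is sent. The sketch below is an assumption about how one might do that; `build_payload` is a hypothetical helper, and the gateway's own validation may be stricter or differ in detail.

```python
from typing import Any, Union

# Optional fields listed in the request table.
KNOWN_OPTIONS = {
    "provider", "previous_response_id", "instructions", "tools", "tool_choice",
    "temperature", "top_p", "max_output_tokens", "store", "stream",
    "reasoning", "metadata",
}
TOOL_CHOICE_STRINGS = {"auto", "none", "required"}
REASONING_EFFORTS = {"minimal", "low", "medium", "high"}


def build_payload(model: str, input: Union[str, list], **options: Any) -> dict:
    """Assemble a request body, rejecting values the table rules out."""
    if not isinstance(model, str) or not model:
        raise ValueError("model is required and must be a non-empty string")
    if not isinstance(input, (str, list)):
        raise ValueError("input must be a string or an array of input items")

    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    top_p = options.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    tool_choice = options.get("tool_choice")
    if isinstance(tool_choice, str) and tool_choice not in TOOL_CHOICE_STRINGS:
        raise ValueError("tool_choice string must be auto, none, or required")

    reasoning = options.get("reasoning")
    if isinstance(reasoning, dict) and "effort" in reasoning:
        if reasoning["effort"] not in REASONING_EFFORTS:
            raise ValueError("reasoning.effort must be minimal, low, medium, or high")

    # Omit unset options so the gateway applies its documented defaults
    # (store: true, stream: false).
    payload = {"model": model, "input": input}
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload
```

For example, `build_payload("gpt-4o", "hello", temperature=0.7, store=False)` yields a body with exactly those four keys, while `temperature=2.5` raises `ValueError` locally.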

Response

{
  "id": "resp-7c8b9d3e-...",
  "object": "response",
  "provider": "openai",
  "provider_response_id": "resp_OAI_xyz",
  "model": "gpt-4o",
  "previous_response_id": "resp-abc-...",
  "status": "completed",
  "output": [ /* provider-native output blocks */ ],
  "usage": { "input_tokens": 124, "output_tokens": 312, "total_tokens": 436 },
  "created_at": 1748284800,
  "completed_at": 1748284805,
  "metadata": null
}
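A client might load this body into a typed record. The sketch below is illustrative: `GatewayResponse` and `parse_response` are hypothetical names, the sample replaces the placeholder comment in `output` with an empty array so that it is valid JSON, and it assumes `created_at` and `completed_at` are Unix timestamps in seconds, which the example values suggest but this page does not state.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewayResponse:
    id: str
    provider: str
    provider_response_id: str
    model: str
    status: str
    output: list
    usage: dict
    created_at: int
    completed_at: Optional[int] = None
    previous_response_id: Optional[str] = None
    metadata: Optional[dict] = None

    @property
    def elapsed_seconds(self) -> Optional[int]:
        """Time between creation and completion (assumes Unix seconds)."""
        if self.completed_at is None:
            return None
        return self.completed_at - self.created_at


def parse_response(raw: str) -> GatewayResponse:
    """Decode a response body, ignoring keys this sketch does not model."""
    data = json.loads(raw)
    if data.get("object") != "response":
        raise ValueError("not a response object")
    known = {f.name for f in dataclasses.fields(GatewayResponse)}
    return GatewayResponse(**{k: v for k, v in data.items() if k in known})


SAMPLE = """{
  "id": "resp-7c8b9d3e-...", "object": "response", "provider": "openai",
  "provider_response_id": "resp_OAI_xyz", "model": "gpt-4o",
  "previous_response_id": "resp-abc-...", "status": "completed",
  "output": [],
  "usage": {"input_tokens": 124, "output_tokens": 312, "total_tokens": 436},
  "created_at": 1748284800, "completed_at": 1748284805, "metadata": null
}"""

response = parse_response(SAMPLE)
```

With the sample above, `response.elapsed_seconds` is 5 and the usage counts add up: 124 input tokens plus 312 output tokens gives the reported total of 436.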

Cross-provider continuation

Continuation across providers is not supported: a continuation request must go to the same provider that produced the previous response, otherwise the gateway returns FILE_PROVIDER_MISMATCH. Providers track context differently, and there is no portable protocol for thread state.
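A client can catch this mistake before sending a request by comparing the `provider` recorded on the previous response with any provider override it is about to send. The sketch below is a client-side pre-flight check only; `ProviderMismatch` and `provider_for_continuation` are hypothetical names, and the shape of the gateway's actual error body is not documented here.

```python
from typing import Optional


class ProviderMismatch(Exception):
    """Local stand-in for the gateway's FILE_PROVIDER_MISMATCH error."""
    code = "FILE_PROVIDER_MISMATCH"


def provider_for_continuation(previous: dict, requested: Optional[str] = None) -> str:
    """Return the provider a continuation must use, or raise on a mismatch.

    `previous` is a stored response as returned by the gateway; its
    `provider` field names the provider that produced it.
    """
    origin = previous["provider"]
    if requested is not None and requested != origin:
        raise ProviderMismatch(
            f"response {previous['id']} came from {origin!r}, "
            f"cannot continue on {requested!r}"
        )
    return origin
```

Calling it with a response from `openai` and no override returns `"openai"`; asking for a different provider raises locally instead of costing a round trip.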

Ephemeral responses

Set "store": false to skip persistence. The gateway returns the provider’s output but does not write a ResponseSession record. This is useful for high-throughput, stateless workflows that don’t need history.