Messages (Anthropic Format)
Create a message using the Anthropic Messages API format. The gateway normalizes and routes the request to any configured provider.
POST /v1/messages
Required capability: chat
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier (e.g. claude-3-opus-20240229, gpt-4). |
| messages | array | Yes | Array of message objects (minimum 1). See Message Format. |
| max_tokens | integer | Yes | Maximum number of tokens to generate. Required in the Anthropic format. |
| system | string \| array | No | System prompt. Can be a string or an array of {"type": "text", "text": "..."} objects. |
| temperature | number | No | Sampling temperature between 0 and 1. |
| top_p | number | No | Nucleus sampling between 0 and 1. |
| stop_sequences | string[] | No | Custom stop sequences. |
| stream | boolean | No | If true, responses are streamed as server-sent events. Defaults to false. |
| tools | array | No | Tool definitions. Each has name, optional description, and input_schema. |
| tool_choice | object | No | Tool selection: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}. |
| metadata | object | No | Optional metadata. Supports user_id for end-user tracking. |
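
As a rough illustration of the table above, the following sketch assembles a request body and checks the documented constraints before sending. The helper name `build_messages_request` is hypothetical, and the client-side checks are assumptions made for illustration (in particular, requiring `max_tokens` to be positive); the gateway's own validation may be stricter or looser.

```python
from typing import Any, Dict, List, Optional


def build_messages_request(
    model: str,
    messages: List[Dict[str, Any]],
    max_tokens: int,
    *,
    system: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Build a /v1/messages request body (hypothetical helper, not part of the gateway)."""
    # The three required parameters from the table: model, messages, max_tokens.
    if not messages:
        raise ValueError("messages must contain at least one message")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        # Assumption: a positive integer is expected.
        raise ValueError("max_tokens must be a positive integer")
    # temperature and top_p are documented as values between 0 and 1.
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")

    body: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    # Optional parameters are only included when explicitly set.
    if system is not None:
        body["system"] = system
    if temperature is not None:
        body["temperature"] = temperature
    if top_p is not None:
        body["top_p"] = top_p
    if stream:
        body["stream"] = True
    return body
```

The returned dictionary can be serialized with `json.dumps` and sent as the POST body.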
Message Format
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Either user or assistant. |
| content | string \| array | Yes | A string or an array of content blocks. |
Content blocks use a discriminated union on the type field:
| Type | Fields | Description |
|---|---|---|
| text | text | Plain text content. |
| image | source: {type: "base64", media_type, data} | Base64-encoded image. |
| tool_use | id, name, input | A tool call from the assistant. |
| tool_result | tool_use_id, content | The result of a tool call. |
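
The four block types can be sketched as small constructor functions. The helper names below are hypothetical and exist only for illustration; the field names follow the table above. The example also assumes the usual Anthropic convention that a `tool_result` block is sent back in a `user` message.

```python
import base64
from typing import Any, Dict


def text_block(text: str) -> Dict[str, Any]:
    return {"type": "text", "text": text}


def image_block(raw: bytes, media_type: str) -> Dict[str, Any]:
    # The image bytes are base64-encoded into the source.data field.
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(raw).decode("ascii"),
        },
    }


def tool_use_block(tool_use_id: str, name: str, tool_input: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": "tool_use", "id": tool_use_id, "name": name, "input": tool_input}


def tool_result_block(tool_use_id: str, content: str) -> Dict[str, Any]:
    # tool_use_id links the result back to the assistant's tool_use block.
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}


# A two-message exchange: the assistant requests a tool call, and the
# result is returned in the next user message (IDs and tool name are made up).
messages = [
    {"role": "assistant", "content": [tool_use_block("toolu_1", "get_weather", {"city": "Paris"})]},
    {"role": "user", "content": [tool_result_block("toolu_1", "18 C, cloudy")]},
]
```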
Response

    {
      "id": "msg_abc123",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Hello! How can I help you today?"
        }
      ],
      "model": "claude-3-opus-20240229",
      "stop_reason": "end_turn",
      "usage": {
        "input_tokens": 12,
        "output_tokens": 9
      }
    }

Streaming
Set "stream": true to receive Anthropic-format server-sent events. The response uses Content-Type: text/event-stream.
Events are emitted in this order:
1. message_start: contains the message metadata.
2. content_block_start: signals the beginning of a content block.
3. content_block_delta: incremental text deltas.
4. content_block_stop: signals the end of the content block.
5. message_delta: contains stop_reason and output usage.
6. message_stop: signals the end of the message.
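
A client can consume this sequence with a small server-sent-events parser. The sketch below is a minimal illustration, not the gateway's reference client: the function names `iter_sse_events` and `collect_text` are hypothetical, and it assumes each event arrives as an `event:` line and a `data:` line followed by a blank line, with a JSON payload in `data:`.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event_name: Optional[str] = None
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name or "message", json.loads("\n".join(data_parts))
            event_name, data_parts = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip(" "))
    if data_parts:
        # Flush a final event that was not followed by a blank line.
        yield event_name or "message", json.loads("\n".join(data_parts))


def collect_text(events: Iterable[Tuple[str, Dict[str, Any]]]) -> Tuple[str, Optional[int]]:
    """Concatenate text deltas and return (text, output_tokens)."""
    parts = []
    output_tokens: Optional[int] = None
    for name, payload in events:
        if name == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif name == "message_delta":
            output_tokens = payload.get("usage", {}).get("output_tokens")
        elif name == "message_stop":
            break
    return "".join(parts), output_tokens
```

In practice the `lines` argument would come from the decoded lines of the HTTP response body; any iterable of strings works, which keeps the parser independent of the HTTP library used.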

    event: message_start
    data: {"type":"message_start","message":{"id":"msg_abc123","type":"message","role":"assistant","content":[],"model":"claude-3-opus-20240229","usage":{"input_tokens":0}}}

    event: content_block_start
    data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

    event: content_block_delta
    data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

    event: content_block_stop
    data: {"type":"content_block_stop","index":0}

    event: message_delta
    data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":9}}

    event: message_stop
    data: {"type":"message_stop"}

Streaming Limitations (Known)
The current streaming implementation has several known limitations. Clients should not rely on the values described below being meaningful:
- usage.input_tokens on message_start is always 0. The real input token count is not known until the upstream provider returns final usage data, which arrives near the end of the stream. The message_delta event carries the final usage.output_tokens, and accurate usage is recorded server-side for billing, but the input_tokens value emitted at message_start is a hardcoded placeholder.
- stop_reason is always end_turn in message_delta. The route emits stop_reason: 'end_turn' regardless of the actual finish reason reported by the upstream provider. If you need the real finish reason, use the non-streaming endpoint.
- Tool-call streaming is not supported. The streaming path does not emit content_block_start events of type tool_use and does not emit input_json_delta events. Tool calls are silently dropped from the streamed output. If your use case depends on streamed tool calls, use the non-streaming endpoint or the OpenAI-format /v1/chat/completions endpoint.
- Validation failures use the OpenAI error envelope, not the Anthropic envelope. HTTP 400 responses for malformed /v1/messages requests are emitted by the global error handler as {error: {code, message, details}} (OpenAI shape), not as {type: "error", error: {type, message}} (Anthropic shape). Runtime errors (timeouts, upstream failures, internal errors) do use the Anthropic shape, so the inconsistency is between the validation and runtime error paths. This is a known issue.
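
Because two error envelopes can come back from the same endpoint, a client may want to normalize them before handling. The following is a minimal sketch under that assumption; `normalize_error` and the keys of the dictionary it returns are hypothetical names chosen for illustration, and the two shapes are taken from the description above.

```python
from typing import Any, Dict


def normalize_error(body: Dict[str, Any]) -> Dict[str, Any]:
    """Map either error envelope onto one flat dictionary (illustrative helper)."""
    err = body.get("error")
    if not isinstance(err, dict):
        raise ValueError("not a recognized error envelope")
    if body.get("type") == "error":
        # Anthropic shape: {"type": "error", "error": {"type", "message"}}
        return {
            "envelope": "anthropic",
            "kind": err.get("type"),
            "message": err.get("message"),
            "details": None,
        }
    # OpenAI shape: {"error": {"code", "message", "details"}}
    return {
        "envelope": "openai",
        "kind": err.get("code"),
        "message": err.get("message"),
        "details": err.get("details"),
    }
```

Distinguishing the shapes by the top-level `type` field avoids relying on the HTTP status code alone.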
Headers
The response includes:
| Header | Description |
|---|---|
| X-Request-ID | Unique identifier for the request, useful for debugging and audit trails. |
| X-Provider | The provider the request was routed to. |
| X-Model | The resolved model that served the request. |
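
For logging or debugging, these headers can be pulled out of a response in one step. This is a small sketch; `routing_info` is a hypothetical helper, and it assumes the headers are available as a plain string-to-string mapping (HTTP header names are case-insensitive, so the lookup lowercases them first).

```python
from typing import Dict, Mapping, Optional


def routing_info(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Extract the gateway routing headers from a response header mapping."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "provider": lowered.get("x-provider"),
        "model": lowered.get("x-model"),
    }
```

Missing headers map to `None` rather than raising, so the helper can also be used on error responses.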
Example

    curl https://your-gateway.example.com/v1/messages \
      -H "Authorization: Bearer aigw_sk_your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "claude-3-opus-20240229",
        "max_tokens": 1024,
        "system": "You are a helpful assistant.",
        "messages": [
          {"role": "user", "content": "What is the capital of France?"}
        ]
      }'

Streaming Example

    curl https://your-gateway.example.com/v1/messages \
      -H "Authorization: Bearer aigw_sk_your_api_key" \
      -H "Content-Type: application/json" \
      -N \
      -d '{
        "model": "claude-3-opus-20240229",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Tell me a joke."}],
        "stream": true
      }'

Error Format
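Runtime errors from the Messages endpoint (timeouts, upstream failures, internal errors) use the Anthropic error format. Validation failures (HTTP 400) currently use the OpenAI envelope instead, as noted under Streaming Limitations. The Anthropic format looks like this: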

    {
      "type": "error",
      "error": {
        "type": "server_error",
        "message": "Detailed error message"
      }
    }

Gateway Features
This endpoint passes through the same gateway middleware pipeline as /v1/chat/completions, including request normalization, RAG injection, prompt guards, budget checks, semantic cache, usage tracking, and audit logging. The gateway automatically translates between the Anthropic message format and its internal unified format, enabling routing to any provider regardless of the request format used.