
Messages (Anthropic Format)

Create a message using the Anthropic Messages API format. The gateway normalizes and routes the request to any configured provider.

POST /v1/messages

Required capability: chat

Request Body

Parameter       Type            Required  Description
model           string          Yes       Model identifier (e.g. claude-3-opus-20240229, gpt-4).
messages        array           Yes       Array of message objects (minimum 1). See Message Format.
max_tokens      integer         Yes       Maximum number of tokens to generate. Required in the Anthropic format.
system          string | array  No        System prompt. Can be a string or an array of {"type": "text", "text": "..."} objects.
temperature     number          No        Sampling temperature between 0 and 1.
top_p           number          No        Nucleus sampling between 0 and 1.
stop_sequences  string[]        No        Custom stop sequences.
stream          boolean         No        If true, responses are streamed as server-sent events. Defaults to false.
tools           array           No        Tool definitions. Each has name, optional description, and input_schema.
tool_choice     object          No        Tool selection: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}.
metadata        object          No        Optional metadata. Supports user_id for end-user tracking.
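
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of the gateway: the helper name build_messages_request is hypothetical, rejecting unknown fields is an assumption made here to catch typos, and treating max_tokens as at least 1 is also an assumption (the table only says it is a required integer). The gateway remains the authority on validation.

```python
from typing import Any

# Optional fields listed in the Request Body table.
OPTIONAL_FIELDS = {
    "system", "temperature", "top_p", "stop_sequences",
    "stream", "tools", "tool_choice", "metadata",
}


def build_messages_request(
    model: str,
    messages: list[dict[str, Any]],
    max_tokens: int,
    **optional: Any,
) -> dict[str, Any]:
    """Assemble a /v1/messages body and check the documented constraints locally."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    # Assumption: max_tokens must be a positive integer (bool is excluded explicitly).
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    # Assumption: unknown fields are rejected here to catch typos early.
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # temperature and top_p are documented as being between 0 and 1.
    for name in ("temperature", "top_p"):
        value = optional.get(name)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    return {"model": model, "messages": messages, "max_tokens": max_tokens, **optional}


body = build_messages_request(
    "claude-3-opus-20240229",
    [{"role": "user", "content": "What is the capital of France?"}],
    1024,
    temperature=0.2,
    metadata={"user_id": "user-123"},  # hypothetical end-user identifier
)
print(body)
```

Failing fast on the client avoids a round trip for requests that would be rejected anyway.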

Message Format

Field    Type            Required  Description
role     string          Yes       Either user or assistant.
content  string | array  Yes       A string or an array of content blocks.

Content blocks use a discriminated union on the type field:

Type         Fields                                       Description
text         text                                         Plain text content.
image        source: {type: "base64", media_type, data}   Base64-encoded image.
tool_use     id, name, input                              A tool call from the assistant.
tool_result  tool_use_id, content                         The result of a tool call.
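
To illustrate how tool_use and tool_result blocks pair up, here is a small Python sketch that builds the follow-up message for an assistant turn containing tool calls. It assumes the usual Anthropic convention that tool results are sent back in a user message. The helper name tool_result_turn, the tool name get_weather, and the id toolu_01 are all hypothetical.

```python
from typing import Any


def tool_result_turn(
    assistant_content: list[dict[str, Any]],
    results: dict[str, str],
) -> dict[str, Any]:
    """Build a user message that answers every tool_use block in an assistant turn."""
    blocks = []
    for block in assistant_content:
        # The type field discriminates between content block variants.
        if block.get("type") == "tool_use":
            blocks.append({
                "type": "tool_result",
                "tool_use_id": block["id"],      # links the result to its call
                "content": results[block["id"]],
            })
    return {"role": "user", "content": blocks}


# Hypothetical assistant turn: one text block followed by one tool call.
assistant_content = [
    {"type": "text", "text": "Let me check the weather."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
]

follow_up = tool_result_turn(assistant_content, {"toolu_01": "18 degrees C, cloudy"})
print(follow_up)
```

The follow-up message is appended to messages after the assistant turn, so the next request carries both the call and its result.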

Response

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-3-opus-20240229",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 9
  }
}
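
Because content is an array of blocks rather than a single string, clients usually need a small step to recover the reply text. A sketch in Python, using the sample response above as input; the helper name response_text is hypothetical.

```python
import json

# The sample response from this page.
SAMPLE_RESPONSE = json.loads("""
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
  "model": "claude-3-opus-20240229",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 12, "output_tokens": 9}
}
""")


def response_text(message: dict) -> str:
    """Concatenate the text of every text block, skipping other block types."""
    return "".join(
        block["text"]
        for block in message.get("content", [])
        if block.get("type") == "text"
    )


text = response_text(SAMPLE_RESPONSE)
total_tokens = SAMPLE_RESPONSE["usage"]["input_tokens"] + SAMPLE_RESPONSE["usage"]["output_tokens"]
print(text)
print(total_tokens)
```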

Streaming

Set "stream": true to receive Anthropic-format server-sent events. The response uses Content-Type: text/event-stream.

Events are emitted in this order:

  1. message_start — contains the message metadata.
  2. content_block_start — signals the beginning of a content block.
  3. content_block_delta — incremental text deltas.
  4. content_block_stop — signals the end of the content block.
  5. message_delta — contains stop_reason and output usage.
  6. message_stop — signals the end of the message.

Example stream:

event: message_start
data: {"type":"message_start","message":{"id":"msg_abc123","type":"message","role":"assistant","content":[],"model":"claude-3-opus-20240229","usage":{"input_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":9}}

event: message_stop
data: {"type":"message_stop"}
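
A client consumes this stream by pairing each event line with its data line and accumulating the text deltas. The following Python sketch parses the sample stream from this page offline, with no network access. The function names iter_sse_events and collect_stream are hypothetical, and the parser handles only the event and data fields used here, not the full server-sent events format.

```python
import json
from typing import Iterable, Iterator

# The sample stream from this page.
SAMPLE_STREAM = """\
event: message_start
data: {"type":"message_start","message":{"id":"msg_abc123","type":"message","role":"assistant","content":[],"model":"claude-3-opus-20240229","usage":{"input_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":9}}

event: message_stop
data: {"type":"message_stop"}
"""


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event name, decoded JSON payload) pairs from SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # A blank line or the next "event:" line ends the pending event.
        if (line == "" or line.startswith("event:")) and data:
            yield event or "message", json.loads("\n".join(data))
            event, data = None, []
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # flush a final event that has no trailing blank line
        yield event or "message", json.loads("\n".join(data))


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate text deltas and the final stop_reason and output token count."""
    parts = []
    result = {"text": "", "stop_reason": None, "output_tokens": None}
    for name, payload in iter_sse_events(lines):
        if name == "content_block_delta" and payload["delta"].get("type") == "text_delta":
            parts.append(payload["delta"]["text"])
        elif name == "message_delta":
            result["stop_reason"] = payload["delta"].get("stop_reason")
            result["output_tokens"] = payload.get("usage", {}).get("output_tokens")
        elif name == "message_stop":
            break
    result["text"] = "".join(parts)
    return result


summary = collect_stream(SAMPLE_STREAM.splitlines())
print(summary)
```

In a real client the lines would come from the HTTP response body instead of a string; the parsing logic stays the same.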

Known Streaming Limitations

The current streaming implementation has several known limitations. Clients should not rely on the affected values described below:

  • usage.input_tokens on message_start is always 0. The real input token count is not known until the upstream provider returns final usage data near the end of the stream, so the value emitted at message_start is a hardcoded placeholder. The message_delta event reports the final output count in usage.output_tokens, and the final usage is recorded server-side for billing.
  • stop_reason is always end_turn in message_delta. The route emits stop_reason: 'end_turn' regardless of the actual finish reason reported by the upstream provider. If you need the real finish reason, use the non-streaming endpoint.
  • Tool-call streaming is not supported. The streaming path emits neither content_block_start events of type tool_use nor input_json_delta events, so tool calls are silently dropped from the streamed output. If your use case depends on streamed tool calls, use the non-streaming endpoint or the OpenAI-format /v1/chat/completions endpoint.
  • Validation failures use the OpenAI error envelope, not the Anthropic envelope. HTTP 400 responses for malformed /v1/messages requests are emitted by the global error handler as {error: {code, message, details}} (OpenAI shape), not as {type: "error", error: {type, message}} (Anthropic shape). Runtime errors (timeouts, upstream failures, internal errors) do use the Anthropic shape, so the inconsistency is between the validation and runtime error paths. This is a known issue.
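
Until the error paths are unified, a client can accept both envelopes and map them to one structure. A minimal Python sketch follows. The helper name normalize_error and the output keys are hypothetical, and the sample code and details values are illustrative only, not actual gateway output.

```python
from typing import Any


def normalize_error(body: dict[str, Any]) -> dict[str, Any]:
    """Map either error envelope to one client-side shape."""
    err = body.get("error") or {}
    if body.get("type") == "error":
        # Anthropic shape: {type: "error", error: {type, message}}
        return {
            "shape": "anthropic",
            "kind": err.get("type"),
            "message": err.get("message"),
            "details": None,
        }
    # OpenAI shape: {error: {code, message, details}}
    return {
        "shape": "openai",
        "kind": err.get("code"),
        "message": err.get("message"),
        "details": err.get("details"),
    }


runtime_error = normalize_error(
    {"type": "error", "error": {"type": "server_error", "message": "Detailed error message"}}
)
validation_error = normalize_error(
    # Illustrative values: the actual code and details strings are not documented here.
    {"error": {"code": "invalid_request", "message": "max_tokens is required", "details": []}}
)
print(runtime_error)
print(validation_error)
```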

Headers

The response includes:

Header        Description
X-Request-ID  Unique identifier for the request, useful for debugging and audit trails.
X-Provider    The provider the request was routed to.
X-Model       The resolved model that served the request.
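
These headers are worth logging alongside each response. A small Python sketch that extracts them from any header mapping; the helper name routing_info and the sample header values are hypothetical. HTTP header names are case-insensitive, so the lookup lowercases them first.

```python
from typing import Mapping, Optional


def routing_info(headers: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Pull the gateway routing headers out of a response header mapping."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "provider": lowered.get("x-provider"),
        "model": lowered.get("x-model"),
    }


# Hypothetical header values for illustration.
info = routing_info({
    "Content-Type": "application/json",
    "X-Request-ID": "req_123",
    "x-provider": "anthropic",
    "X-Model": "claude-3-opus-20240229",
})
print(info)
```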

Example

curl https://your-gateway.example.com/v1/messages \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
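
The same request can be prepared from Python with only the standard library. This is a sketch rather than an official client: the URL and API key are the placeholders from the curl example, the helper name make_request is hypothetical, and the call that actually sends the request is left commented out so the block runs without a gateway.

```python
import json
import urllib.request

# Placeholders taken from the curl example; replace with real values.
GATEWAY_URL = "https://your-gateway.example.com/v1/messages"
API_KEY = "aigw_sk_your_api_key"


def make_request(body: dict) -> urllib.request.Request:
    """Build a POST request with the same headers as the curl example."""
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_request({
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
})

# Uncomment to send the request against a real gateway:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```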

Streaming Example

curl https://your-gateway.example.com/v1/messages \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "stream": true
  }'

Error Format

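Runtime errors from the Messages endpoint (timeouts, upstream failures, internal errors) use the Anthropic error format shown below. Validation failures (HTTP 400) currently use the OpenAI-style envelope {error: {code, message, details}} instead; this is a known issue.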

{
  "type": "error",
  "error": {
    "type": "server_error",
    "message": "Detailed error message"
  }
}

Gateway Features

This endpoint passes through the same gateway middleware pipeline as /v1/chat/completions, including request normalization, RAG injection, prompt guards, budget checks, semantic cache, usage tracking, and audit logging. The gateway automatically translates between the Anthropic message format and its internal unified format, enabling routing to any provider regardless of the request format used.