Create Routing Rules

Routing rules determine which providers handle incoming requests and in what order. This tutorial walks you through creating a routing configuration using the priority strategy, then testing it with an API call.

Prerequisites

Step 1 — Navigate to the Routing Page

  1. In the sidebar, click Routing.
  2. The routing list shows all routing configurations for your tenant as a flat list of cards.
  3. Click Create Route.

Step 2 — Set Basic Configuration

  1. Name — Enter a descriptive name, for example Chat Primary Route.
  2. Slug (optional) — A lowercase-hyphenated identifier. If set, clients can target this config directly with the model string routing:<slug>.
  3. Capabilities — Select one or more request types this config handles (multi-select). Choose chat for chat completion requests.
  4. Strategy — Select priority to start. This strategy orders routes by their priority number. See Routing Strategies for all ten options.
  5. Enabled — Toggle on.

(A config can also be marked as the tenant default via the API field isDefault; there is no toggle for it in the create form.)
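To illustrate the slug format, the sketch below checks that a string is lowercase-hyphenated and builds the routing:<slug> model string described above. The exact validation rule the gateway applies is not stated in this tutorial, so the regular expression and both function names are assumptions made for illustration.

```python
import re

# Assumed pattern (not confirmed by the gateway docs): groups of lowercase
# letters or digits, separated by single hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug is lowercase-hyphenated under the assumed pattern."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def model_string_for(slug: str) -> str:
    """Build the model string a client sends to target a config by slug."""
    if not is_valid_slug(slug):
        raise ValueError(f"not a lowercase-hyphenated slug: {slug!r}")
    return f"routing:{slug}"
```

A client would then pass the returned string as the model field of its request instead of a concrete model ID.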

Step 3 — Add Provider Entries

Each route entry maps a provider to this routing configuration.

  1. Click Add Route.
  2. Provider — Select your provider from the dropdown (e.g., openai-prod).
  3. Model ID — Enter the default model for this route, for example gpt-4o.
  4. Priority — Enter 1 (lower number = higher priority). This provider will be tried first.
  5. Weight — Enter 1. Weight is used by the weighted strategy; under the priority strategy it has no effect.
  6. Enabled — Toggle on.

To add a fallback provider:

  1. Click Add Route again.
  2. Select a second provider (e.g., anthropic-prod).
  3. Set Priority to 2. This provider is tried only if the first fails.
  4. Set Model ID to the equivalent model on this provider, for example claude-sonnet-4-20250514.
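The effect of the two entries above can be sketched in a few lines: enabled routes are sorted by priority number, and each is tried in turn until one succeeds. This is a simplified model, not the gateway's implementation; the RouteEntry class, its field names, and the call function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RouteEntry:
    # Hypothetical shape mirroring the form fields in this step.
    provider: str
    model_id: str
    priority: int
    enabled: bool = True


def order_by_priority(routes: List[RouteEntry]) -> List[RouteEntry]:
    """Keep enabled routes and sort them so the lowest priority number comes first."""
    return sorted((r for r in routes if r.enabled), key=lambda r: r.priority)


def first_success(
    routes: List[RouteEntry], call: Callable[[RouteEntry], str]
) -> Tuple[RouteEntry, str]:
    """Try each route in priority order and return the first one that does not raise."""
    last_error: Optional[Exception] = None
    for route in order_by_priority(routes):
        try:
            return route, call(route)
        except Exception as exc:  # any failure moves on to the next route
            last_error = exc
    raise RuntimeError("all routes failed") from last_error
```

With openai-prod at priority 1 and anthropic-prod at priority 2, the second entry is only reached when the call to the first raises.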

Step 4 — Configure the Fallback Chain (Optional)

The fallback chain provides a last-resort option after all route entries have been exhausted.

  1. Scroll to the Fallback Chain section.
  2. Add a provider entry with a model ID. This provider is appended after all strategy-ordered routes.
  3. You can also enable a Local Fallback (e.g., an Ollama instance) for complete offline resilience.

The routing service detects circular fallback chains and breaks them automatically.
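This tutorial does not describe how fallback links are stored or how the service detects a cycle. As one possible illustration, assume each entry names its next fallback by key; a walk that remembers visited keys then stops as soon as it would revisit one. The function name and the dictionary layout below are assumptions.

```python
from typing import Dict, List, Optional


def resolve_fallback_chain(
    start: str, next_fallback: Dict[str, Optional[str]]
) -> List[str]:
    """Follow fallback links from start, stopping at the end or at the first repeat."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = start
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        # A missing key is treated the same as "no further fallback".
        current = next_fallback.get(current)
    return chain
```

Because a key is never appended twice, a loop such as a -> b -> a yields a finite chain rather than an endless one.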

Step 5 — Ensure Models Are Registered

Model filtering happens automatically — there is no separate UI step. When a client requests a specific model (via the model field), the routing service only considers routes whose provider has a matching, enabled ModelConfig. Providers without the requested model are skipped.

To ensure correct filtering:

  1. Verify each provider in your routes has the relevant models registered in the Models tab.
  2. Models must be marked as enabled to be eligible.
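The filtering rule can be pictured as follows. ModelConfig is the name used above, but its fields here (model_id, enabled), the Route class, and the lookup dictionary are hypothetical stand-ins chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelConfig:
    # Assumed fields; the real ModelConfig schema is not shown in this tutorial.
    model_id: str
    enabled: bool


@dataclass
class Route:
    provider: str
    model_id: str


def routes_for_model(
    routes: List[Route],
    models_by_provider: Dict[str, List[ModelConfig]],
    requested_model: str,
) -> List[Route]:
    """Keep routes whose provider has an enabled ModelConfig for the requested model."""
    eligible = []
    for route in routes:
        registered = models_by_provider.get(route.provider, [])
        if any(m.model_id == requested_model and m.enabled for m in registered):
            eligible.append(route)
    return eligible
```

Under this model, a provider with the model registered but disabled is skipped in the same way as a provider that never registered it.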

Step 6 — Test with an API Call

Send a test request to verify routing works:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

The response sets X-Provider and X-Model headers indicating which provider and model actually served the request:

X-Provider: openai
X-Model: gpt-4o

If the highest-priority route fails, the gateway transparently falls through to the next route (and then the fallback chain); the X-Provider/X-Model headers reflect whichever route ultimately succeeded.
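The same check can be scripted. The sketch below uses only the Python standard library to send the request shown above and read the two response headers. The URL, payload, and header names come from this tutorial; the function names and the 30-second timeout are choices made for this example.

```python
import json
import urllib.request
from typing import Optional, Tuple

GATEWAY_URL = "http://localhost:3000/v1/chat/completions"


def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above, without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def served_by(request: urllib.request.Request) -> Tuple[Optional[str], Optional[str]]:
    """Send the request and return the (X-Provider, X-Model) response headers."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.headers.get("X-Provider"), response.headers.get("X-Model")
```

Against a running gateway, served_by(build_request("YOUR_API_KEY", "gpt-4o", "Hello")) returns the provider and model that actually served the request, which makes a fallthrough easy to spot in a script.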

How the Routing Service Works

  1. Load config — Finds the routing config matching the tenant, capability, and enabled state.
  2. Load providers — Fetches all ProviderConfig documents referenced in the routes.
  3. Filter — Removes providers that are disabled or not in the tenant’s allowed-providers list. Unhealthy providers are not removed — they are demoted to a last-resort pool.
  4. Apply strategy — Orders the remaining candidates using the selected strategy.
  5. Append fallbacks — Adds fallback chain entries after the strategy-ordered list.
  6. Cache — Results are cached in memory for 60 seconds, with a jittered TTL so that entries do not all expire at once (the thundering-herd problem).
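Steps 3 to 6 can be condensed into a short sketch using the priority strategy. Several details are assumptions, since the tutorial does not specify them: the Candidate class and its fields, the position of the unhealthy pool (placed here after healthy routes and before the fallback chain), and the 10% jitter range.

```python
import random
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Candidate:
    # Hypothetical view of a route joined with its provider's state.
    provider: str
    priority: int
    enabled: bool = True
    healthy: bool = True


def order_candidates(
    candidates: List[Candidate], allowed_providers: Set[str], fallbacks: List[str]
) -> List[str]:
    """Filter, order by priority, demote unhealthy providers, then append fallbacks."""
    # Filter: drop disabled providers and those outside the tenant's allowed list.
    eligible = [c for c in candidates if c.enabled and c.provider in allowed_providers]
    # Apply strategy: priority strategy, lowest number first.
    ordered = sorted(eligible, key=lambda c: c.priority)
    healthy = [c.provider for c in ordered if c.healthy]
    # Unhealthy providers are kept but moved behind every healthy one (assumed position).
    last_resort = [c.provider for c in ordered if not c.healthy]
    # Append fallbacks: fallback chain entries come after the ordered routes.
    return healthy + last_resort + fallbacks


def jittered_ttl(base_seconds: float = 60.0, jitter_fraction: float = 0.1) -> float:
    """Return a TTL near base_seconds; the 10% jitter range is an assumed value."""
    spread = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-spread, spread)
```

Spreading expiry times this way means cached results for different keys do not all become stale in the same instant, so recomputation is staggered rather than bunched.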

Next Steps