Overview
The Router Settings screen controls how requests are routed to model deployments and how failures are handled. It allows platform administrators to configure load balancing strategies, fallback models, and retry behavior to improve reliability and performance. Changes made here apply tenant-wide and affect how requests are resolved at runtime.
Tabs
The Router Settings screen is organized into three tabs:
- Loadbalancing
- Fallbacks
- General
Loadbalancing
The Loadbalancing tab defines how requests are distributed across multiple deployments.
Routing Strategy
Select the strategy used to balance traffic. Available options include:
- simple-shuffle – Randomly selects a deployment from the available list. Simple and fast.
- least-busy – Routes requests to the deployment with the lowest number of ongoing requests.
- usage-based-routing (deprecated) – Routes requests to the deployment with the lowest token usage.
- usage-based-routing-v2 – Improved version of usage-based routing with better usage tracking.
- latency-based-routing – Routes requests to the deployment with the lowest observed latency over a sliding window.
- cost-based-routing – Routes requests to the deployment with the lowest cost per token.
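The selection rules above can be sketched in a few lines. This is a minimal illustration, not LLMGrid's implementation: the `Deployment` fields and the `pick_deployment` function are hypothetical, and the usage-based strategies are left out because they depend on token-usage tracking that the router maintains internally.

```python
import random
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical view of one deployment's live routing metrics."""
    name: str
    active_requests: int = 0     # in-flight requests (least-busy)
    avg_latency_ms: float = 0.0  # recent average latency (latency-based-routing)
    cost_per_token: float = 0.0  # price per token (cost-based-routing)


def pick_deployment(strategy, deployments):
    """Return one deployment chosen according to the named strategy."""
    if not deployments:
        raise ValueError("no deployments available")
    if strategy == "simple-shuffle":
        return random.choice(deployments)
    if strategy == "least-busy":
        return min(deployments, key=lambda d: d.active_requests)
    if strategy == "latency-based-routing":
        return min(deployments, key=lambda d: d.avg_latency_ms)
    if strategy == "cost-based-routing":
        return min(deployments, key=lambda d: d.cost_per_token)
    # usage-based strategies need token-usage tracking and are not modeled here.
    raise ValueError(f"unsupported strategy: {strategy}")


pool = [
    Deployment("east", active_requests=4, avg_latency_ms=220, cost_per_token=0.002),
    Deployment("west", active_requests=1, avg_latency_ms=340, cost_per_token=0.001),
]
print(pick_deployment("least-busy", pool).name)             # west
print(pick_deployment("latency-based-routing", pool).name)  # east
print(pick_deployment("cost-based-routing", pool).name)     # west
```

Because `min()` returns the first minimal element, ties in this sketch always go to the earliest deployment in the list; a production router would more likely break ties randomly.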
Enable Tag Filtering
When enabled, routing decisions can consider tags associated with requests.
- Useful for environment-aware or workload-aware routing
- Common use cases include separating production, staging, or experimental traffic
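As a rough illustration of tag-aware routing, the sketch below narrows the candidate deployments before a routing strategy runs. The matching rule is an assumption, not documented LLMGrid behavior: a deployment qualifies if it shares at least one tag with the request, and untagged requests may use any deployment. `TaggedDeployment` and `filter_by_tags` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedDeployment:
    name: str
    tags: set = field(default_factory=set)  # hypothetical deployment tags


def filter_by_tags(deployments, request_tags):
    """Keep deployments sharing at least one tag with the request.

    Assumed rule: untagged requests may use any deployment.
    """
    wanted = set(request_tags or [])
    if not wanted:
        return list(deployments)
    return [d for d in deployments if wanted & d.tags]


pool = [
    TaggedDeployment("prod-deployment", {"production"}),
    TaggedDeployment("staging-deployment", {"staging"}),
]
print([d.name for d in filter_by_tags(pool, ["staging"])])  # ['staging-deployment']
print(len(filter_by_tags(pool, [])))                          # 2
```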
Reliability & Retries
This section defines how failures are handled during request execution.
Allowed Fails
Number of times a deployment can fail before it is placed into cooldown.
Cooldown Time
Length of time a failed deployment is excluded before it becomes eligible again.
Number of Retries
Maximum number of retry attempts for a failed request.
Timeout
Maximum time allowed for a request before it is considered failed.
Retry After
Minimum waiting period before retrying a failed request.
Retry Policy
Optional custom retry behavior for different error types.
Model Group Alias
Defines aliases for model groups that can be referenced in routing and configuration. This allows:
- Abstracting underlying model sets
- Safer model migrations
- Stable references across applications
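The Allowed Fails, Cooldown Time, Number of Retries, and Retry After settings in this section interact roughly as sketched below. This is a simplified model under stated assumptions, not the router's actual code: `CooldownTracker` and `call_with_retries` are hypothetical, cooldown time is assumed to be in seconds, and Timeout and Retry Policy are not modeled.

```python
import time


class CooldownTracker:
    """Benches a deployment once it exceeds its allowed failures (hypothetical helper)."""

    def __init__(self, allowed_fails, cooldown_seconds):
        self.allowed_fails = allowed_fails        # "Allowed Fails"
        self.cooldown_seconds = cooldown_seconds  # "Cooldown Time" (seconds assumed)
        self.fail_counts = {}
        self.cooldown_until = {}

    def is_available(self, name, now):
        until = self.cooldown_until.get(name)
        return until is None or now >= until

    def record_failure(self, name, now):
        count = self.fail_counts.get(name, 0) + 1
        if count > self.allowed_fails:
            # Too many failures: exclude the deployment until the cooldown expires.
            self.cooldown_until[name] = now + self.cooldown_seconds
            count = 0
        self.fail_counts[name] = count


def call_with_retries(call, deployments, tracker, num_retries, retry_after,
                      clock=time.monotonic, sleep=time.sleep):
    """Make up to 1 + num_retries attempts ("Number of Retries") on healthy deployments."""
    last_error = None
    for attempt in range(num_retries + 1):
        healthy = [d for d in deployments if tracker.is_available(d, clock())]
        if not healthy:
            raise RuntimeError("all deployments are in cooldown")
        target = healthy[0]
        try:
            return call(target)
        except Exception as exc:
            last_error = exc
            tracker.record_failure(target, clock())
            if attempt < num_retries:
                sleep(retry_after)  # "Retry After": wait before the next attempt
    raise last_error


tracker = CooldownTracker(allowed_fails=0, cooldown_seconds=60)


def flaky(name):
    # Simulated provider call: "east" is down, "west" is healthy.
    if name == "east":
        raise ConnectionError("east is down")
    return f"ok from {name}"


result = call_with_retries(flaky, ["east", "west"], tracker, num_retries=2, retry_after=0)
print(result)  # ok from west
```

With `allowed_fails=0`, the first failure on `east` puts it into cooldown, so the retry skips it. A higher Allowed Fails value would let the sketch retry the same deployment several times before benching it.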
Fallbacks
The Fallbacks tab allows you to define fallback models for improved availability.
Add Fallbacks
Select Add Fallbacks to configure fallback behavior.
Primary Model
The model that users initially request.
Fallback Models
One or more models used when the primary model is unavailable or fails.
Order matters: fallback models are tried sequentially (first, second, third, and so on).
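The ordering rule can be shown with a small loop: the primary model is tried first, then each fallback in the order it was configured. This is a minimal sketch; `call_with_fallbacks` and the model names are hypothetical, and it assumes any error triggers the next fallback, whereas the real router may fall back only on specific failure types.

```python
def call_with_fallbacks(call, primary, fallbacks):
    """Try the primary model first, then each fallback in its configured order."""
    errors = []
    for model in [primary, *fallbacks]:
        try:
            return model, call(model)
        except Exception as exc:  # assumption: any error triggers the next fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("primary and all fallbacks failed: " + "; ".join(errors))


def fake_call(model):
    # Simulated outage affecting the first two models.
    if model in ("model-a", "model-b"):
        raise TimeoutError(f"{model} unavailable")
    return f"answer from {model}"


print(call_with_fallbacks(fake_call, "model-a", ["model-b", "model-c"]))
# ('model-c', 'answer from model-c')
```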
General Settings
The General tab under Router Settings defines global constraints and limits that apply to all routed requests. These settings help control concurrency, payload size, and proxy-level protection across the tenant.
max_parallel_requests
Maximum number of concurrent requests allowed per API key.
Purpose
- Prevents a single key or client from overwhelming the system
- Enforces fair usage at the key level
global_max_parallel_requests
Maximum number of concurrent requests allowed across the entire proxy instance.
Purpose
- Acts as a hard upper bound for total concurrency
- Protects the proxy from overload during traffic spikes
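The two concurrency limits can be pictured as two counters that must both have room before a request is admitted. The sketch below assumes that a request over either limit is rejected immediately rather than queued, which this page does not state; `ConcurrencyLimiter` and `LimitExceeded` are hypothetical names.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class LimitExceeded(Exception):
    """Raised when a request would exceed a concurrency limit."""


class ConcurrencyLimiter:
    """Admits a request only if both the per-key and proxy-wide limits allow it."""

    def __init__(self, max_parallel_requests, global_max_parallel_requests):
        self.per_key_limit = max_parallel_requests
        self.global_limit = global_max_parallel_requests
        self.in_flight = defaultdict(int)  # active requests per API key
        self.total = 0                     # active requests across the proxy
        self.lock = threading.Lock()

    @contextmanager
    def slot(self, api_key):
        with self.lock:
            if self.total >= self.global_limit:
                raise LimitExceeded("global_max_parallel_requests reached")
            if self.in_flight[api_key] >= self.per_key_limit:
                raise LimitExceeded(f"max_parallel_requests reached for {api_key}")
            self.in_flight[api_key] += 1
            self.total += 1
        try:
            yield
        finally:
            # Release the slot even if the request handler raised.
            with self.lock:
                self.in_flight[api_key] -= 1
                self.total -= 1


limiter = ConcurrencyLimiter(max_parallel_requests=1, global_max_parallel_requests=2)
with limiter.slot("key-a"):
    try:
        with limiter.slot("key-a"):  # second concurrent request on the same key
            pass
    except LimitExceeded as exc:
        print(exc)  # max_parallel_requests reached for key-a
```

Checking the global limit first mirrors its role as a hard upper bound: once the proxy is full, no key is admitted regardless of its own headroom.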
max_request_size_mb
Maximum allowed size of a request payload.
Behavior
- Requests exceeding this size are rejected immediately
- Applies to the full request body
max_response_size_mb
Maximum allowed size of a response payload.
Behavior
- Responses larger than this limit are rejected
- Prevents excessive output from impacting stability
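Both size limits reduce to comparing a body's byte length against the configured megabyte value. In this sketch, `enforce_size_limit` and `PayloadTooLarge` are hypothetical, `None` stands for a setting shown as Not Set, and treating 1 MB as 1,048,576 bytes is an assumption; the product may use 1,000,000.

```python
MB = 1024 * 1024  # assumption: limits are measured in mebibytes


class PayloadTooLarge(Exception):
    """Raised when a request or response body exceeds its configured limit."""


def enforce_size_limit(body, limit_mb, kind):
    """Reject `body` (bytes) if it exceeds `limit_mb`; None means the setting is Not Set."""
    if limit_mb is None:
        return
    if len(body) > limit_mb * MB:
        raise PayloadTooLarge(f"{kind} body is {len(body)} bytes; limit is {limit_mb} MB")


enforce_size_limit(b'{"prompt": "hi"}', 1, "request")  # within limit: no error
try:
    enforce_size_limit(b"x" * (2 * MB), 1, "response")
except PayloadTooLarge as exc:
    print(exc)  # response body is 2097152 bytes; limit is 1 MB
```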
pass_through_endpoints
Defines provider-specific pass-through endpoints that bypass standard routing.
Use cases
- Access provider-native APIs
- Support custom or non-standard endpoints
- Enable advanced integrations not covered by default routes
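Conceptually, a pass-through endpoint is checked before model routing: if the incoming path matches a configured entry, the request is forwarded to the provider as-is. The sketch below is illustrative only; the prefix-to-URL mapping format, the `example.com` hosts, and `resolve_request` are all hypothetical and do not reflect how LLMGrid stores this setting.

```python
# Hypothetical pass-through configuration: proxy path prefix -> provider base URL.
PASS_THROUGH_ENDPOINTS = {
    "/provider-a/": "https://provider-a.example.com",
    "/provider-a/batch/": "https://batch.provider-a.example.com",
}


def resolve_request(path):
    """Return ("pass-through", upstream_url) or ("router", path) for a request path."""
    # Check longer prefixes first so the most specific mapping wins.
    for prefix in sorted(PASS_THROUGH_ENDPOINTS, key=len, reverse=True):
        if path.startswith(prefix):
            upstream = PASS_THROUGH_ENDPOINTS[prefix]
            return "pass-through", f"{upstream}/{path[len(prefix):]}"
    # Everything else goes through normal model routing.
    return "router", path


print(resolve_request("/provider-a/v1/files"))
# ('pass-through', 'https://provider-a.example.com/v1/files')
print(resolve_request("/provider-a/batch/jobs"))
# ('pass-through', 'https://batch.provider-a.example.com/jobs')
print(resolve_request("/chat/completions"))
# ('router', '/chat/completions')
```

Because matched requests skip routing entirely, load balancing, retries, and fallbacks do not apply to them in this model, which is one reason to review each pass-through entry carefully.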
Managing Settings
For each setting:
- Enter a value in the Value column
- Select Update to apply the change
- Use the delete action to remove a previously set value
Status Indicator
- Not Set – The setting is currently not enforced
- Set values become active immediately after saving
Best Practices
- Set conservative global limits to protect the proxy
- Use per-key parallel limits for multi-tenant environments
- Restrict request and response sizes for production workloads
- Review and document any pass-through endpoints carefully
- Test changes in non-production environments before rollout
Related Sections
- Loadbalancing – Control request distribution
- Fallbacks – Improve reliability with backup models
- Logs – Observe request rejection and limit enforcement
- Virtual Keys – Apply limits at the key level
Save or Reset Changes
- Save Changes – Applies all configuration updates.
- Reset – Reverts unsaved changes.
Common Use Cases
- Improve reliability with fallback models
- Reduce latency using intelligent routing
- Balance load across multiple deployments
- Protect against flapping or unstable deployments
- Standardize routing behavior across teams
Best Practices
- Start with simple-shuffle unless you need latency-, cost-, or usage-aware routing
- Add fallback models for critical workloads
- Keep retry limits conservative to avoid cascading failures
- Monitor routing behavior using logs and usage dashboards
- Test routing changes in non-production environments first
Related Sections
- Models – Manage available deployments
- Usage & Logs – Observe routing outcomes
- Virtual Keys – Apply access and constraints
- Guardrails – Enforce safe and compliant responses

