Overview
The Caching section provides tools to configure and operate response caching backed by Redis. Caching helps reduce repeated model calls, improve response latency, and optimize resource utilization for repeated or deterministic requests. Caching configuration and observability are managed entirely through the UI and apply at the platform level.
Caching Tabs
The Caching screen is organized into three tabs:
- Cache Analytics
- Cache Health
- Cache Settings
Cache Analytics
The Cache Analytics tab provides visibility into cache efficiency and usage.
Filters
Use the filters at the top to analyze cache behavior:
- Virtual Keys – Analyze cache activity per key
- Models – Filter by model
- Time Range – Select a custom time window
Key Metrics
Cache Hit Ratio
Percentage of requests served from cache instead of invoking the model.
Cache Hits
Total number of requests fulfilled from cache.
Cached Tokens
Number of tokens returned from cached responses.
Charts
Cache Hits vs API Requests
Compares total LLM requests against cached responses to show cache effectiveness.
Cached Completion Tokens vs Generated Completion Tokens
Shows how many completion tokens were served from cache versus generated by models.
Use Cases
- Identify opportunities to improve cache usage
- Validate that caching is functioning as expected
- Measure performance gains from caching
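The Cache Hit Ratio metric above can be expressed as a simple calculation. A minimal sketch (the function and field names here are illustrative, not part of LLMGrid's API):

```python
def cache_hit_ratio(cache_hits: int, total_requests: int) -> float:
    """Percentage of requests served from cache rather than by invoking a model."""
    if total_requests == 0:
        return 0.0  # no traffic in the selected time range
    return 100.0 * cache_hits / total_requests

# Example: 250 of 1,000 requests served from cache -> 25.0% hit ratio.
print(cache_hit_ratio(250, 1000))  # 25.0
```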
Cache Health
The Cache Health tab validates connectivity and readiness of the cache backend.
Run Health Check
Select Run Health Check to test the cache connection.
Health Check Results
Results are displayed in two formats:
- Summary – High-level status (success or failure)
- Raw Response – Detailed diagnostic output
Cache Settings
The Cache Settings tab is where Redis caching is configured.
Redis Type
Select the Redis deployment type based on scalability, availability, and workload requirements.
- Node (Single Instance) – Standard Redis single-node deployment. Suitable for development, testing, and low-to-moderate workloads.
- Cluster – Redis cluster deployment with data sharding across multiple nodes. Designed for higher throughput and horizontal scaling.
- Sentinel – Redis Sentinel–managed deployment providing automatic failover and high availability.
- Semantic – Specialized cache mode intended for semantic-aware caching scenarios, where cache keys may be derived from embeddings or similarity matching.
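To illustrate the idea behind semantic caching, the following is a simplified sketch of similarity-based lookup. The threshold and in-memory store are assumptions for illustration only, not LLMGrid's implementation, and a real deployment would compute embeddings with a model rather than receive them directly:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Return a cached response when a new prompt's embedding is close enough."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, embedding):
        for cached_embedding, response in self.entries:
            if cosine_similarity(embedding, cached_embedding) >= self.threshold:
                return response  # semantic hit: similar enough to a prior request
        return None  # miss

    def store(self, embedding, response):
        self.entries.append((embedding, response))
```

This linear scan is only for clarity; production semantic caches typically use an approximate nearest-neighbor index instead.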
Connection Settings
Configure how LLMGrid connects to the Redis instance.
Host
Redis server hostname or IP address.
Port
Redis server port (default: 6379).
Password
Redis authentication password, if required.
Username
Redis username, if required (for environments using Redis ACLs).
Advanced Settings
Expand Advanced Settings to configure additional Redis options required by your environment.
Actions
- Test Connection – Validates Redis connectivity using the current configuration.
- Save Changes – Saves and applies the cache configuration.
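A connectivity test like Test Connection amounts to reaching the Redis server and receiving a reply to PING. As a rough, stdlib-only illustration of what such a check involves (this is not LLMGrid's internal code, and the host/port are placeholders):

```python
import socket

def encode_resp_command(*parts: str) -> bytes:
    """Encode a command (e.g. PING) in the Redis RESP wire protocol."""
    msg = f"*{len(parts)}\r\n"
    for part in parts:
        msg += f"${len(part)}\r\n{part}\r\n"
    return msg.encode()

def ping_redis(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the server answers PING with +PONG, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_resp_command("PING"))
            return sock.recv(16).startswith(b"+PONG")
    except OSError:
        return False
```

In practice a Redis client library (such as redis-py) would handle authentication, TLS, and cluster topology on top of this.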
Operational Notes
- Redis connectivity must be healthy for caching to function
- Cache Analytics reflects only the selected time range
- Cache behavior depends on request determinism, routing, and guardrails
- Health checks do not generate or proxy model traffic
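Because cache behavior depends on request determinism, identical requests must map to identical cache keys. A minimal sketch of deterministic key derivation (the chosen fields and model name are illustrative assumptions, not LLMGrid's scheme):

```python
import hashlib
import json

def cache_key(model: str, prompt: str, params: dict) -> str:
    """Derive a deterministic cache key from the request's defining fields."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,  # key ordering must not change the hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# Identical requests yield identical keys; changing any field yields a new key.
```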
Best Practices
- Verify Redis connectivity before enabling caching broadly
- Monitor Cache Hit Ratio regularly
- Start with Node (Single Instance) before scaling
- Use Cluster or Sentinel for production reliability
- Evaluate Semantic caching only for similarity-based workloads
- Use cache analytics to validate latency and efficiency gains
Related Sections
- Router Settings – Understand how routing interacts with caching
- Usage – Observe overall request patterns
- Virtual Keys – Analyze caching behavior per key
- Models – Identify models benefiting from caching
- Logs – Debug cached versus non-cached requests

