Overview

The Caching section provides tools to configure and operate response caching backed by Redis. Caching reduces repeated model calls, lowers response latency, and improves resource utilization for repeated or deterministic requests. Caching configuration and observability are managed entirely through the UI and apply at the platform level.

Caching Tabs

The Caching screen is organized into three tabs:
  • Cache Analytics
  • Cache Health
  • Cache Settings
Each tab focuses on a different aspect of cache operation.

Cache Analytics

The Cache Analytics tab provides visibility into cache efficiency and usage.

Filters

Use the filters at the top to analyze cache behavior:
  • Virtual Keys – Analyze cache activity per key
  • Models – Filter by model
  • Time Range – Select a custom time window

Key Metrics

Cache Hit Ratio

Percentage of requests served from cache instead of invoking the model.

Cache Hits

Total number of requests fulfilled from cache.

Cached Tokens

Number of tokens returned from cached responses.
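To make the relationship between these metrics concrete, the small sketch below computes a hit ratio from hit and request counts. It is illustrative only; the Cache Analytics tab calculates these values for you, and the numbers shown are made up.

    # Sketch of how Cache Hit Ratio relates to Cache Hits and total requests.
    # Illustrative only; the Cache Analytics tab computes these values in the UI.
    def cache_hit_ratio(cache_hits, total_requests):
        """Percentage of requests served from cache instead of invoking the model."""
        if total_requests == 0:
            return 0.0
        return 100.0 * cache_hits / total_requests

    # Example: 4,200 of 10,000 requests answered from cache -> 42.0% hit ratio.
    print(cache_hit_ratio(4_200, 10_000))  # 42.0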

Charts

Cache Hits vs API Requests

Compares total LLM requests against cached responses to show cache effectiveness.

Cached Completion Tokens vs Generated Completion Tokens

Shows how many completion tokens were served from cache versus generated by models.

Use Cases

  • Identify opportunities to improve cache usage
  • Validate that caching is functioning as expected
  • Measure performance gains from caching

Cache Health

The Cache Health tab validates connectivity and readiness of the cache backend.

Run Health Check

Select Run Health Check to test the cache connection.

Health Check Results

Results are displayed in two formats:
  • Summary – High-level status (success or failure)
  • Raw Response – Detailed diagnostic output
Use this view during onboarding, troubleshooting, or after configuration changes.
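Conceptually, the check resembles a Redis PING. The sketch below, assuming the redis-py client, shows how a Summary-style status and a Raw Response-style detail could be produced; it is not how LLMGrid implements its check, which runs entirely from the Cache Health tab.

    # Minimal sketch of a ping-style health check, assuming the redis-py client.
    # LLMGrid runs its own check from the Cache Health tab; this only mirrors the idea.
    import redis

    def check_cache_health(host, port=6379, password=None):
        client = redis.Redis(host=host, port=port, password=password, socket_timeout=2)
        try:
            client.ping()                      # raises on connection or auth failure
            return {"summary": "success"}
        except redis.RedisError as exc:
            return {"summary": "failure", "raw_response": str(exc)}

    print(check_cache_health("localhost"))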

Cache Settings

The Cache Settings tab is where Redis caching is configured.

Redis Type

Select the Redis deployment type based on scalability, availability, and workload requirements (an illustrative connection sketch for each type follows the list).
  • Node (Single Instance)
    Standard Redis single-node deployment. Suitable for development, testing, and low-to-moderate workloads.
  • Cluster
    Redis cluster deployment with data sharding across multiple nodes. Designed for higher throughput and horizontal scaling.
  • Sentinel
    Redis Sentinel–managed deployment providing automatic failover and high availability.
  • Semantic
    Specialized cache mode intended for semantic-aware caching scenarios, where cache keys may be derived from embeddings or similarity matching.
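As a rough illustration of how these deployment types differ from a client's point of view, the sketch below uses the redis-py constructors typically associated with each type. The hostnames, ports, and the Sentinel service name "mymaster" are placeholders, and Semantic mode (similarity matching layered on top of the cache) is outside the scope of this sketch.

    # Illustrative redis-py constructors for the three standard deployment types.
    # Hostnames, ports, and the Sentinel service name "mymaster" are placeholders.
    from redis import Redis
    from redis.cluster import RedisCluster
    from redis.sentinel import Sentinel

    # Node (Single Instance): one host, one port.
    node = Redis(host="redis.internal", port=6379)

    # Cluster: the client discovers the remaining shards from any reachable node.
    cluster = RedisCluster(host="redis-cluster.internal", port=6379)

    # Sentinel: connect to the sentinels, then resolve the current master for a service.
    sentinel = Sentinel([("sentinel-1.internal", 26379), ("sentinel-2.internal", 26379)])
    master = sentinel.master_for("mymaster")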

Connection Settings

Configure how LLMGrid connects to the Redis instance.

Host

Redis server hostname or IP address.

Port

Redis server port (default: 6379).

Password

Redis authentication password, if required.

Username

Redis username, if required (for environments using Redis ACLs).
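To show how these four fields fit together, the sketch below maps them onto a redis-py client and stores a single TTL-bound entry the way a response cache conceptually would. The values, key layout, and TTL are placeholders for illustration, not LLMGrid's internal schema.

    # How Host, Port, Username, and Password map onto a Redis client (redis-py shown
    # for illustration). Values are placeholders; enter your own in Cache Settings.
    import redis

    client = redis.Redis(
        host="redis.internal",    # Host
        port=6379,                # Port (6379 is the Redis default)
        username="llmgrid",       # Username, only for Redis ACL deployments (hypothetical)
        password="s3cr3t",        # Password, if authentication is required
        decode_responses=True,
    )

    # Conceptually, a response cache stores model output under a request-derived key
    # with a TTL, then serves repeated requests from Redis instead of calling the model.
    client.setex("cache:example-request-key", 3600, '{"completion": "..."}')
    print(client.get("cache:example-request-key"))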

Advanced Settings

Expand Advanced Settings to configure additional Redis options required by your environment.

Actions

  • Test Connection
    Validates Redis connectivity using the current configuration.
  • Save Changes
    Saves and applies the cache configuration.
Changes take effect immediately after saving.

Operational Notes

  • Redis connectivity must be healthy for caching to function
  • Cache Analytics reflects only the selected time range
  • Cache behavior depends on request determinism, routing, and guardrails (see the sketch after this list)
  • Health checks do not generate or proxy model traffic
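To illustrate the determinism point above: two requests can only share a cache entry if they reduce to the same key, so any varying parameter turns a potential hit into a miss. The hashing scheme below is an assumption made for this example, not LLMGrid's actual key derivation.

    # Illustration of request determinism: identical requests can share a cache key,
    # while changing any parameter produces a different key. This hashing scheme is
    # an assumption for the example, not LLMGrid's actual key derivation.
    import hashlib
    import json

    def request_cache_key(model, messages, **params):
        payload = json.dumps({"model": model, "messages": messages, "params": params},
                             sort_keys=True)
        return "cache:" + hashlib.sha256(payload.encode()).hexdigest()

    a = request_cache_key("example-model", [{"role": "user", "content": "Hi"}], temperature=0)
    b = request_cache_key("example-model", [{"role": "user", "content": "Hi"}], temperature=0)
    c = request_cache_key("example-model", [{"role": "user", "content": "Hi"}], temperature=0.9)
    print(a == b)  # True  - identical requests can be served from cache
    print(a == c)  # False - a changed parameter defeats the cache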

Best Practices

  • Verify Redis connectivity before enabling caching broadly
  • Monitor Cache Hit Ratio regularly
  • Start with Node (Single Instance) before scaling
  • Use Cluster or Sentinel for production reliability
  • Evaluate Semantic caching only for similarity-based workloads
  • Use cache analytics to validate latency and efficiency gains

Related Pages

  • Router Settings – Understand how routing interacts with caching
  • Usage – Observe overall request patterns
  • Virtual Keys – Analyze caching behavior per key
  • Models – Identify models benefiting from caching
  • Logs – Debug cached versus non-cached requests