Overview

The Caching section provides tools to configure and operate response caching backed by Redis. Caching reduces repeated model calls, lowers response latency, and improves resource utilization for repeated or deterministic requests. Caching configuration and observability are managed entirely through the UI and apply at the platform level.

Caching Tabs

The Caching screen is organized into three tabs:
  • Cache Analytics
  • Cache Health
  • Cache Settings
Each tab focuses on a different aspect of cache operation.

Cache Analytics

The Cache Analytics tab provides visibility into cache efficiency and usage.

Filters

Use the filters at the top to analyze cache behavior:
  • Virtual Keys – Analyze cache activity per key
  • Models – Filter by model
  • Time Range – Select a custom time window

Key Metrics

Cache Hit Ratio

Percentage of requests served from cache instead of invoking the model.

Cache Hits

Total number of requests fulfilled from cache.

Cached Tokens

Number of tokens returned from cached responses.
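To make the relationship between these metrics concrete, the small sketch below computes a hit ratio from hit and request counts. It is illustrative only; the Cache Analytics tab calculates these values for you, and the numbers shown are made up.

    # Sketch of how Cache Hit Ratio relates to Cache Hits and total requests.
    # Illustrative only; the Cache Analytics tab computes these values in the UI.
    def cache_hit_ratio(cache_hits, total_requests):
        """Percentage of requests served from cache instead of invoking the model."""
        if total_requests == 0:
            return 0.0
        return 100.0 * cache_hits / total_requests

    # Example: 4,200 of 10,000 requests answered from cache -> 42.0% hit ratio.
    print(cache_hit_ratio(4_200, 10_000))  # 42.0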

Charts

Cache Hits vs API Requests

Compares total LLM requests against cached responses to show cache effectiveness.

Cached Completion Tokens vs Generated Completion Tokens

Shows how many completion tokens were served from cache versus generated by models.

Use Cases

  • Identify opportunities to improve cache usage
  • Validate that caching is functioning as expected
  • Measure performance gains from caching

Cache Health

The Cache Health tab validates connectivity and readiness of the cache backend.

Run Health Check

Select Run Health Check to test the cache connection.

Health Check Results

Results are displayed in two formats:
  • Summary – High-level status (success or failure)
  • Raw Response – Detailed diagnostic output
Use this view during onboarding, troubleshooting, or after configuration changes.
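Conceptually, the check resembles a Redis PING. The sketch below, assuming the redis-py client, shows how a Summary-style status and a Raw Response-style detail could be produced; it is not how LLMGrid implements its check, which runs entirely from the Cache Health tab.

    # Minimal sketch of a ping-style health check, assuming the redis-py client.
    # LLMGrid runs its own check from the Cache Health tab; this only mirrors the idea.
    import redis

    def check_cache_health(host, port=6379, password=None):
        client = redis.Redis(host=host, port=port, password=password, socket_timeout=2)
        try:
            client.ping()                      # raises on connection or auth failure
            return {"summary": "success"}
        except redis.RedisError as exc:
            return {"summary": "failure", "raw_response": str(exc)}

    print(check_cache_health("localhost"))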

Cache Settings

The Cache Settings tab is where Redis caching is configured.

Redis Type

Select the Redis deployment type based on scalability, availability, and workload requirements (an illustrative connection sketch for each type follows the list).
  • Node (Single Instance)
    Standard Redis single-node deployment. Suitable for development, testing, and low-to-moderate workloads.
  • Cluster
    Redis cluster deployment with data sharding across multiple nodes. Designed for higher throughput and horizontal scaling.
  • Sentinel
    Redis Sentinel–managed deployment providing automatic failover and high availability.
  • Semantic
    Specialized cache mode intended for semantic-aware caching scenarios, where cache keys may be derived from embeddings or similarity matching.
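As a rough illustration of how these deployment types differ from a client's point of view, the sketch below uses the redis-py constructors typically associated with each type. The hostnames, ports, and the Sentinel service name "mymaster" are placeholders, and Semantic mode (similarity matching layered on top of the cache) is outside the scope of this sketch.

    # Illustrative redis-py constructors for the three standard deployment types.
    # Hostnames, ports, and the Sentinel service name "mymaster" are placeholders.
    from redis import Redis
    from redis.cluster import RedisCluster
    from redis.sentinel import Sentinel

    # Node (Single Instance): one host, one port.
    node = Redis(host="redis.internal", port=6379)

    # Cluster: the client discovers the remaining shards from any reachable node.
    cluster = RedisCluster(host="redis-cluster.internal", port=6379)

    # Sentinel: connect to the sentinels, then resolve the current master for a service.
    sentinel = Sentinel([("sentinel-1.internal", 26379), ("sentinel-2.internal", 26379)])
    master = sentinel.master_for("mymaster")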

Connection Settings

Configure how LLMGrid connects to the Redis instance.

Host

Redis server hostname or IP address.

Port

Redis server port (default: 6379).

Password

Redis authentication password, if required.

Username

Redis username, if required (for environments using Redis ACLs).
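To show how these four fields fit together, the sketch below maps them onto a redis-py client and stores a single TTL-bound entry the way a response cache conceptually would. The values, key layout, and TTL are placeholders for illustration, not LLMGrid's internal schema.

    # How Host, Port, Username, and Password map onto a Redis client (redis-py shown
    # for illustration). Values are placeholders; enter your own in Cache Settings.
    import redis

    client = redis.Redis(
        host="redis.internal",    # Host
        port=6379,                # Port (6379 is the Redis default)
        username="llmgrid",       # Username, only for Redis ACL deployments (hypothetical)
        password="s3cr3t",        # Password, if authentication is required
        decode_responses=True,
    )

    # Conceptually, a response cache stores model output under a request-derived key
    # with a TTL, then serves repeated requests from Redis instead of calling the model.
    client.setex("cache:example-request-key", 3600, '{"completion": "..."}')
    print(client.get("cache:example-request-key"))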

Advanced Settings

Expand Advanced Settings to configure additional Redis options required by your environment.

Actions

  • Test Connection
    Validates Redis connectivity using the current configuration.
  • Save Changes
    Saves and applies the cache configuration.
Changes take effect immediately after saving.

Operational Notes

  • Redis connectivity must be healthy for caching to function
  • Cache Analytics reflects only the selected time range
  • Cache behavior depends on request determinism, routing, and guardrails (see the sketch after this list)
  • Health checks do not generate or proxy model traffic
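To illustrate the determinism point above: two requests can only share a cache entry if they reduce to the same key, so any varying parameter turns a potential hit into a miss. The hashing scheme below is an assumption made for this example, not LLMGrid's actual key derivation.

    # Illustration of request determinism: identical requests can share a cache key,
    # while changing any parameter produces a different key. This hashing scheme is
    # an assumption for the example, not LLMGrid's actual key derivation.
    import hashlib
    import json

    def request_cache_key(model, messages, **params):
        payload = json.dumps({"model": model, "messages": messages, "params": params},
                             sort_keys=True)
        return "cache:" + hashlib.sha256(payload.encode()).hexdigest()

    a = request_cache_key("example-model", [{"role": "user", "content": "Hi"}], temperature=0)
    b = request_cache_key("example-model", [{"role": "user", "content": "Hi"}], temperature=0)
    c = request_cache_key("example-model", [{"role": "user", "content": "Hi"}], temperature=0.9)
    print(a == b)  # True  - identical requests can be served from cache
    print(a == c)  # False - a changed parameter defeats the cache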

Best Practices

  • Verify Redis connectivity before enabling caching broadly
  • Monitor Cache Hit Ratio regularly
  • Start with Node (Single Instance) before scaling
  • Use Cluster or Sentinel for production reliability
  • Evaluate Semantic caching only for similarity-based workloads
  • Use cache analytics to validate latency and efficiency gains

Related Pages

  • Router Settings – Understand how routing interacts with caching
  • Usage – Observe overall request patterns
  • Virtual Keys – Analyze caching behavior per key
  • Models – Identify models benefiting from caching
  • Logs – Debug cached versus non-cached requests