Documentation Index

Fetch the complete documentation index at: https://docs.llmgrid.ai/llms.txt

Use this file to discover all available pages before exploring further.

General

What is LLMGrid?

LLMGrid is an enterprise AI gateway and orchestration platform that provides centralized access, governance, routing, and observability for large language models, tools, and agents.

Is LLMGrid tied to a single model or provider?

No. LLMGrid is provider‑agnostic and designed to support multiple model providers, search tools, vector stores, and integrations behind a single, consistent API surface.

Does LLMGrid require changes to existing OpenAI‑based applications?

Minimal changes are required. Applications using OpenAI‑compatible SDKs typically only need to update the base_url to point to the LLMGrid proxy and use an LLMGrid API key.
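As a minimal sketch of what that change amounts to (the proxy URL and key value below are hypothetical placeholders, not documented LLMGrid values), the client-side difference is only the base URL and the key used in the Authorization header:

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> tuple:
    """Build an OpenAI-compatible chat completion request aimed at a gateway proxy."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        # An LLMGrid API key is sent where a provider key would normally go.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Only base_url and the key differ from a direct provider call.
url, headers, body = build_chat_request(
    "https://llmgrid.example.com/v1",        # hypothetical proxy endpoint
    "llmgrid-key-123",                       # hypothetical LLMGrid key
    "gpt-4o",
    [{"role": "user", "content": "hello"}],
)
```

The request body and response shape stay OpenAI-compatible, which is why existing SDK code continues to work unchanged.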

Access & Authentication

What is a Virtual Key?

A Virtual Key is an API key managed by LLMGrid that controls authentication, access scope, budgets, rate limits, routing behavior, and observability for requests.

Can Virtual Keys be rotated or revoked?

Yes. Virtual Keys can be rotated or revoked at any time without requiring application redeployment.

How is access controlled?

Access is controlled using a combination of:
  • Virtual Keys
  • Teams and organizations
  • Budgets and limits
  • Guardrails
  • Routing policies
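These layers compose: a request is admitted only if every layer allows it. A simplified sketch of that evaluation order, with illustrative field names that are not LLMGrid's actual schema:

```python
def is_allowed(key: dict, request: dict) -> tuple:
    """Evaluate access-control layers in order; the first failing layer wins."""
    if request["model"] not in key["allowed_models"]:        # Virtual Key scope
        return False, "model not in key scope"
    if key["spend"] >= key["budget"]:                        # budget limit
        return False, "budget exceeded"
    if any(g(request) for g in key.get("guardrails", [])):   # guardrail veto
        return False, "blocked by guardrail"
    return True, "ok"

key = {
    "allowed_models": {"gpt-4o"},
    "spend": 3.0,
    "budget": 10.0,
    "guardrails": [lambda r: "ssn" in r["prompt"].lower()],  # toy PII check
}

ok, reason = is_allowed(key, {"model": "gpt-4o", "prompt": "hello"})
blocked, why = is_allowed(key, {"model": "gpt-4o", "prompt": "my SSN is 123"})
```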

Models & Routing

How does model routing work?

Requests are routed based on configured routing strategies, access rules, and fallback policies. Routing can consider availability, performance, and governance constraints.

Can I use model aliases?

Yes. Model aliases allow applications to reference stable identifiers while underlying models or providers change.
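Conceptually, an alias is a stable name resolved through a lookup table at request time, so the table can be repointed without touching application code. A sketch with made-up alias and model names:

```python
# Alias table mapping stable application-facing names to concrete provider models.
# Operators can repoint an alias without any application change.
ALIASES = {
    "chat-default": "openai/gpt-4o",
    "chat-cheap": "anthropic/claude-3-haiku",
}

def resolve_model(name: str) -> str:
    """Return the underlying model for an alias; concrete names pass through."""
    return ALIASES.get(name, name)
```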

What happens if a model is unavailable?

If fallback models are configured, LLMGrid automatically routes requests to the next available option. Otherwise, an error is returned.
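The fallback behavior described above can be sketched as an ordered retry loop: each configured model is tried in turn, and an error surfaces only after every option has failed. The error type and call signature here are stand-ins, not LLMGrid internals:

```python
def route_with_fallback(models, call):
    """Try each model in order; return the first successful response.
    Raise only after every configured fallback has failed."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except RuntimeError as exc:  # stand-in for a provider/availability error
            errors.append((model, str(exc)))
    raise RuntimeError(f"all models failed: {errors}")

def fake_call(model):
    """Simulate the primary provider being down."""
    if model == "primary":
        raise RuntimeError("provider unavailable")
    return "response from " + model

used, reply = route_with_fallback(["primary", "backup"], fake_call)
```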

Guardrails & Safety

What are Guardrails?

Guardrails are policy enforcement mechanisms that inspect and control inputs, outputs, and tool execution to ensure safety, compliance, and governance.

When do Guardrails run?

Guardrails can run:
  • Before a model call
  • During execution
  • After a response is generated
  • Before tool or MCP execution
  • In logging‑only (audit) mode
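The phases above can be pictured as a pipeline wrapping the model call, where logging-only mode records violations without blocking. A simplified sketch (the check functions and return shapes are illustrative assumptions):

```python
def run_with_guardrails(request, model_call, pre=(), post=(), audit_only=False):
    """Apply pre-call guardrails, invoke the model, then apply post-call guardrails.
    In audit_only mode, violations are recorded but never block the request."""
    violations = []
    for check in pre:
        if (v := check(request)) is not None:
            violations.append(("pre", v))
            if not audit_only:
                return None, violations
    response = model_call(request)
    for check in post:
        if (v := check(response)) is not None:
            violations.append(("post", v))
            if not audit_only:
                return None, violations
    return response, violations

# Toy PII check; returns a violation message or None.
no_pii = lambda text: "blocked: pii" if "ssn" in text.lower() else None

resp, viol = run_with_guardrails(
    "my SSN is 123", lambda r: r.upper(), pre=[no_pii], audit_only=True
)
```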

Can Guardrails be scoped?

Yes. Guardrails can be enforced tenant‑wide or scoped to specific keys, teams, routes, or test scenarios.

Usage, Cost & Budgets

How is usage tracked?

Usage is tracked at the request level, including tokens, execution time, routing outcomes, and metadata like keys, tags, and agents.

How do Budgets work?

Budgets define usage limits and rate limits that are enforced automatically. When a budget is exceeded, requests may be throttled or rejected.
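The throttle-or-reject distinction can be sketched as a small decision function: hard budget exhaustion rejects outright, while exceeding a per-window rate limit only throttles. The thresholds and field names here are illustrative, not LLMGrid's enforcement logic:

```python
def budget_decision(spend: float, budget: float,
                    requests_in_window: int, rate_limit: int) -> str:
    """Decide how to treat a request under budget and rate-limit enforcement."""
    if spend >= budget:
        return "reject"     # hard budget exhausted: refuse the request
    if requests_in_window >= rate_limit:
        return "throttle"   # over the per-window rate limit: slow down
    return "allow"
```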

Can costs be adjusted or discounted?

Yes. Cost Tracking supports applying percentage‑based discounts to provider costs, which are reflected in usage metrics and headers.

Caching & Performance

What does caching do?

Caching stores responses for repeat or deterministic requests, reducing latency and avoiding repeated model calls.
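"Repeat or deterministic requests" typically means requests that hash to the same cache key. A sketch of one plausible keying scheme (not LLMGrid's actual key derivation): any change to the model, messages, or sampling parameters produces a different key and therefore a cache miss.

```python
import hashlib
import json

def cache_key(model: str, messages: list, **params) -> str:
    """Derive a deterministic cache key from the request contents.
    sort_keys makes the serialization order-independent."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("gpt-4o", [{"role": "user", "content": "hi"}], temperature=0)
k2 = cache_key("gpt-4o", [{"role": "user", "content": "hi"}], temperature=0)
k3 = cache_key("gpt-4o", [{"role": "user", "content": "hi"}], temperature=1)
```

This also explains a common source of low hit ratios: a timestamp or random ID embedded in the prompt makes every key unique.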

What cache backends are supported?

LLMGrid supports Redis‑based caching across multiple deployment modes, including single‑node, cluster, and sentinel configurations, as well as semantic‑aware caching.

Does caching affect model behavior?

Caching short‑circuits model execution for cache hits but does not alter model output semantics.

Search Tools & Vector Stores

What are Search Tools?

Search Tools allow models and agents to retrieve external or live information to ground responses and enable retrieval‑augmented generation (RAG).

What are Vector Stores used for?

Vector Stores store and retrieve embeddings for semantic search and RAG workflows. They are referenced by ID and managed centrally.
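At its core, the retrieval step is a nearest-neighbor search over embeddings. A toy sketch of cosine-similarity ranking (real vector stores use approximate indexes and far higher-dimensional embeddings; the document IDs and vectors here are made up):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=1):
    """Rank stored (doc_id, embedding) pairs by similarity to the query."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

store = [("doc-a", [1.0, 0.0]), ("doc-b", [0.0, 1.0])]
```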

Can vector stores be tested?

Yes. Vector Stores include a test mode to validate connectivity and availability without impacting production traffic.

Observability & Logging

What types of logs are available?

LLMGrid provides:
  • Request logs
  • Execution and model logs
  • Guardrail enforcement logs
  • Audit logs for administrative actions

Can usage and logs be filtered?

Yes. Logs and metrics can be filtered by time, model, key, team, organization, tag, agent, and status.

Can observability data be exported?

Yes. Observability data can be accessed programmatically and integrated with external analytics or monitoring systems.

Security & Compliance

How is data protected?

LLMGrid enforces secure transport, masked credentials, scoped access, and policy‑based controls through Guardrails and access management.

Is LLMGrid suitable for regulated environments?

LLMGrid supports common enterprise compliance requirements through configuration, observability, and governance controls rather than hard‑coded logic.

Who is responsible for compliance?

LLMGrid provides security and governance tooling, while customers remain responsible for application‑level data handling and regulatory obligations.

Troubleshooting

My requests are failing—where should I look first?

Start with:
  • Request Logs
  • Guardrail enforcement events
  • Budget or rate‑limit violations
  • Model availability and routing status

Cache hit ratio is low—why?

Common reasons include:
  • Non‑deterministic prompts
  • Missing or inconsistent routing keys
  • Semantic caching not enabled where appropriate

A Guardrail is blocking traffic unexpectedly—what should I do?

Review Guardrail logs in logging‑only mode first, adjust scope or thresholds, and validate changes using the Test Playground.

Getting Help

Where can I find more documentation?

Refer to:
  • API Reference
  • Guardrails
  • Routing Settings
  • Observability
  • Security & Compliance

How do I request a feature?

Use the provided support or issue‑tracking links in the UI to submit feature requests or feedback.
If you need help beyond these FAQs, consult the relevant feature documentation or contact your platform administrator.