Documentation Index

Fetch the complete documentation index at: https://docs.llmgrid.ai/llms.txt

Use this file to discover all available pages before exploring further.

General

What is LLMGrid?

LLMGrid is an enterprise AI gateway and orchestration platform that provides centralized access, governance, routing, and observability for large language models, tools, and agents.

Is LLMGrid tied to a single model or provider?

No. LLMGrid is provider‑agnostic and designed to support multiple model providers, search tools, vector stores, and integrations behind a single, consistent API surface.

Does LLMGrid require changes to existing OpenAI‑based applications?

Minimal changes are required. Applications using OpenAI‑compatible SDKs typically only need to update the base_url to point to the LLMGrid proxy and use an LLMGrid API key.
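As a minimal sketch of what that change amounts to (the proxy URL and key value below are hypothetical placeholders, not documented LLMGrid values), the client-side difference is only the base URL and the key used in the Authorization header:

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> tuple:
    """Build an OpenAI-compatible chat completion request aimed at a gateway proxy."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        # An LLMGrid API key is sent where a provider key would normally go.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Only base_url and the key differ from a direct provider call.
url, headers, body = build_chat_request(
    "https://llmgrid.example.com/v1",        # hypothetical proxy endpoint
    "llmgrid-key-123",                       # hypothetical LLMGrid key
    "gpt-4o",
    [{"role": "user", "content": "hello"}],
)
```

The request body and response shape stay OpenAI-compatible, which is why existing SDK code continues to work unchanged.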

Access & Authentication

What is a Virtual Key?

A Virtual Key is an API key managed by LLMGrid that controls authentication, access scope, budgets, rate limits, routing behavior, and observability for requests.

Can Virtual Keys be rotated or revoked?

Yes. Virtual Keys can be rotated or revoked at any time without requiring application redeployment.

How is access controlled?

Access is controlled using a combination of:
  • Virtual Keys
  • Teams and organizations
  • Budgets and limits
  • Guardrails
  • Routing policies
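These layers compose: a request is admitted only if every layer allows it. A simplified sketch of that evaluation order, with illustrative field names that are not LLMGrid's actual schema:

```python
def is_allowed(key: dict, request: dict) -> tuple:
    """Evaluate access-control layers in order; the first failing layer wins."""
    if request["model"] not in key["allowed_models"]:        # Virtual Key scope
        return False, "model not in key scope"
    if key["spend"] >= key["budget"]:                        # budget limit
        return False, "budget exceeded"
    if any(g(request) for g in key.get("guardrails", [])):   # guardrail veto
        return False, "blocked by guardrail"
    return True, "ok"

key = {
    "allowed_models": {"gpt-4o"},
    "spend": 3.0,
    "budget": 10.0,
    "guardrails": [lambda r: "ssn" in r["prompt"].lower()],  # toy PII check
}

ok, reason = is_allowed(key, {"model": "gpt-4o", "prompt": "hello"})
blocked, why = is_allowed(key, {"model": "gpt-4o", "prompt": "my SSN is 123"})
```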

Models & Routing

How does model routing work?

Requests are routed based on configured routing strategies, access rules, and fallback policies. Routing can consider availability, performance, and governance constraints.

Can I use model aliases?

Yes. Model aliases allow applications to reference stable identifiers while underlying models or providers change.
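Conceptually, an alias is a stable name resolved through a lookup table at request time, so the table can be repointed without touching application code. A sketch with made-up alias and model names:

```python
# Alias table mapping stable application-facing names to concrete provider models.
# Operators can repoint an alias without any application change.
ALIASES = {
    "chat-default": "openai/gpt-4o",
    "chat-cheap": "anthropic/claude-3-haiku",
}

def resolve_model(name: str) -> str:
    """Return the underlying model for an alias; concrete names pass through."""
    return ALIASES.get(name, name)
```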

What happens if a model is unavailable?

If fallback models are configured, LLMGrid automatically routes requests to the next available option. Otherwise, an error is returned.
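The fallback behavior described above can be sketched as an ordered retry loop: each configured model is tried in turn, and an error surfaces only after every option has failed. The error type and call signature here are stand-ins, not LLMGrid internals:

```python
def route_with_fallback(models, call):
    """Try each model in order; return the first successful response.
    Raise only after every configured fallback has failed."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except RuntimeError as exc:  # stand-in for a provider/availability error
            errors.append((model, str(exc)))
    raise RuntimeError(f"all models failed: {errors}")

def fake_call(model):
    """Simulate the primary provider being down."""
    if model == "primary":
        raise RuntimeError("provider unavailable")
    return "response from " + model

used, reply = route_with_fallback(["primary", "backup"], fake_call)
```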

Guardrails & Safety

What are Guardrails?

Guardrails are policy enforcement mechanisms that inspect and control inputs, outputs, and tool execution to ensure safety, compliance, and governance.

When do Guardrails run?

Guardrails can run:
  • Before a model call
  • During execution
  • After a response is generated
  • Before tool or MCP execution
  • In logging‑only (audit) mode
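The phases above can be pictured as a pipeline wrapping the model call, where logging-only mode records violations without blocking. A simplified sketch (the check functions and return shapes are illustrative assumptions):

```python
def run_with_guardrails(request, model_call, pre=(), post=(), audit_only=False):
    """Apply pre-call guardrails, invoke the model, then apply post-call guardrails.
    In audit_only mode, violations are recorded but never block the request."""
    violations = []
    for check in pre:
        if (v := check(request)) is not None:
            violations.append(("pre", v))
            if not audit_only:
                return None, violations
    response = model_call(request)
    for check in post:
        if (v := check(response)) is not None:
            violations.append(("post", v))
            if not audit_only:
                return None, violations
    return response, violations

# Toy PII check; returns a violation message or None.
no_pii = lambda text: "blocked: pii" if "ssn" in text.lower() else None

resp, viol = run_with_guardrails(
    "my SSN is 123", lambda r: r.upper(), pre=[no_pii], audit_only=True
)
```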

Can Guardrails be scoped?

Yes. Guardrails can be enforced tenant‑wide or scoped to specific keys, teams, routes, or test scenarios.

Usage, Cost & Budgets

How is usage tracked?

Usage is tracked at the request level, including tokens, execution time, routing outcomes, and metadata like keys, tags, and agents.

How do Budgets work?

Budgets define usage limits and rate limits that are enforced automatically. When a budget is exceeded, requests may be throttled or rejected.
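The throttle-or-reject distinction can be sketched as a small decision function: hard budget exhaustion rejects outright, while exceeding a per-window rate limit only throttles. The thresholds and field names here are illustrative, not LLMGrid's enforcement logic:

```python
def budget_decision(spend: float, budget: float,
                    requests_in_window: int, rate_limit: int) -> str:
    """Decide how to treat a request under budget and rate-limit enforcement."""
    if spend >= budget:
        return "reject"     # hard budget exhausted: refuse the request
    if requests_in_window >= rate_limit:
        return "throttle"   # over the per-window rate limit: slow down
    return "allow"
```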

Can costs be adjusted or discounted?

Yes. Cost Tracking supports applying percentage‑based discounts to provider costs, which are reflected in usage metrics and headers.

Caching & Performance

What does caching do?

Caching stores responses for repeat or deterministic requests, reducing latency and avoiding repeated model calls.
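"Repeat or deterministic requests" typically means requests that hash to the same cache key. A sketch of one plausible keying scheme (not LLMGrid's actual key derivation): any change to the model, messages, or sampling parameters produces a different key and therefore a cache miss.

```python
import hashlib
import json

def cache_key(model: str, messages: list, **params) -> str:
    """Derive a deterministic cache key from the request contents.
    sort_keys makes the serialization order-independent."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("gpt-4o", [{"role": "user", "content": "hi"}], temperature=0)
k2 = cache_key("gpt-4o", [{"role": "user", "content": "hi"}], temperature=0)
k3 = cache_key("gpt-4o", [{"role": "user", "content": "hi"}], temperature=1)
```

This also explains a common source of low hit ratios: a timestamp or random ID embedded in the prompt makes every key unique.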

What cache backends are supported?

LLMGrid supports Redis‑based caching across multiple deployment modes, including single‑node, cluster, and sentinel configurations, as well as semantic‑aware caching.

Does caching affect model behavior?

Caching short‑circuits model execution for cache hits but does not alter model output semantics.

Search Tools & Vector Stores

What are Search Tools?

Search Tools allow models and agents to retrieve external or live information to ground responses and enable retrieval‑augmented generation (RAG).

What are Vector Stores used for?

Vector Stores store and retrieve embeddings for semantic search and RAG workflows. They are referenced by ID and managed centrally.
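At its core, the retrieval step is a nearest-neighbor search over embeddings. A toy sketch of cosine-similarity ranking (real vector stores use approximate indexes and far higher-dimensional embeddings; the document IDs and vectors here are made up):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=1):
    """Rank stored (doc_id, embedding) pairs by similarity to the query."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

store = [("doc-a", [1.0, 0.0]), ("doc-b", [0.0, 1.0])]
```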

Can vector stores be tested?

Yes. Vector Stores include a test mode to validate connectivity and availability without impacting production traffic.

Observability & Logging

What types of logs are available?

LLMGrid provides:
  • Request logs
  • Execution and model logs
  • Guardrail enforcement logs
  • Audit logs for administrative actions

Can usage and logs be filtered?

Yes. Logs and metrics can be filtered by time, model, key, team, organization, tag, agent, and status.

Can observability data be exported?

Yes. Observability data can be accessed programmatically and integrated with external analytics or monitoring systems.

Security & Compliance

How is data protected?

LLMGrid enforces secure transport, masked credentials, scoped access, and policy‑based controls through Guardrails and access management.

Is LLMGrid suitable for regulated environments?

LLMGrid supports common enterprise compliance requirements through configuration, observability, and governance controls rather than hard‑coded logic.

Who is responsible for compliance?

LLMGrid provides security and governance tooling, while customers remain responsible for application‑level data handling and regulatory obligations.

Troubleshooting

My requests are failing—where should I look first?

Start with:
  • Request Logs
  • Guardrail enforcement events
  • Budget or rate‑limit violations
  • Model availability and routing status

Cache hit ratio is low—why?

Common reasons include:
  • Non‑deterministic prompts
  • Missing or inconsistent routing keys
  • Semantic caching not enabled where appropriate

A Guardrail is blocking traffic unexpectedly—what should I do?

Review Guardrail logs in logging‑only mode first, adjust scope or thresholds, and validate changes using the Test Playground.

Getting Help

Where can I find more documentation?

Refer to:
  • API Reference
  • Guardrails
  • Routing Settings
  • Observability
  • Security & Compliance

How do I request a feature?

Use the provided support or issue‑tracking links in the UI to submit feature requests or feedback.
If you need help beyond these FAQs, consult the relevant feature documentation or contact your platform administrator.