Overview

LLMGrid provides built‑in observability features that help administrators and operators understand how the platform is used, detect issues, investigate incidents, and maintain reliability. Observability in LLMGrid centers on logs, usage metrics, cost analytics, and health checks, all managed through the UI and applied consistently across models, agents, tools, and workflows.

Core Observability Capabilities

LLMGrid observability is designed to answer four key questions:
  • What requests are being made?
  • How are models, tools, and agents behaving?
  • Where are usage and cost coming from?
  • Are systems healthy and compliant?
Each question maps to one or more of the observability surfaces described in the sections that follow.

Request & Execution Logs

Request Logs

The Request Logs page provides structured, request‑level visibility across the platform. Each logged request includes:
  • Timestamp
  • Request ID and session ID
  • Model or route used
  • Success or failure status
  • Execution duration
  • Token usage
  • Cost metadata
  • Associated virtual key, team, agent, or tag
Logs can be filtered by time range, model, key, team, tag, or status.
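
The fields and filters listed above can be pictured as a small record type plus a filter function. The sketch below is illustrative only: the `RequestLog` class, its field names, and `filter_logs` are hypothetical and do not reflect LLMGrid's actual log schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical shape of one request log entry (not LLMGrid's real schema)."""
    timestamp: datetime
    request_id: str
    session_id: str
    model: str
    success: bool
    duration_ms: float
    tokens: int
    cost: float
    virtual_key: Optional[str] = None
    team: Optional[str] = None
    tags: Tuple[str, ...] = ()


def filter_logs(
    logs: Iterable[RequestLog],
    *,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    model: Optional[str] = None,
    virtual_key: Optional[str] = None,
    team: Optional[str] = None,
    tag: Optional[str] = None,
    success: Optional[bool] = None,
) -> Iterator[RequestLog]:
    """Yield entries that match every filter that is not None."""
    for log in logs:
        if start is not None and log.timestamp < start:
            continue
        if end is not None and log.timestamp > end:
            continue
        if model is not None and log.model != model:
            continue
        if virtual_key is not None and log.virtual_key != virtual_key:
            continue
        if team is not None and log.team != team:
            continue
        if tag is not None and tag not in log.tags:
            continue
        if success is not None and log.success != success:
            continue
        yield log


# Made-up sample data for illustration.
sample_logs = [
    RequestLog(datetime(2024, 1, 1, 9, 0), "req-1", "sess-1", "model-a",
               True, 120.0, 350, 0.002, "key-1", "search", ("prod",)),
    RequestLog(datetime(2024, 1, 1, 9, 5), "req-2", "sess-1", "model-b",
               False, 900.0, 0, 0.0, "key-2", "search", ("staging",)),
]

failed_for_search = list(filter_logs(sample_logs, team="search", success=False))
print([log.request_id for log in failed_for_search])
```

Combining filters this way mirrors narrowing the Request Logs page by team and status at the same time.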

Audit Logs

Audit logs capture administrative and configuration actions, such as:
  • Creating or updating models
  • Modifying routing, guardrails, or budgets
  • Managing keys, users, or credentials
This supports:
  • Compliance audits
  • Change tracking
  • Incident investigation
Audit logs are read‑only and immutable.

Usage & Metrics

Usage Dashboard

The Usage section provides aggregated visibility into platform consumption. Metrics include:
  • Total requests
  • Successful vs failed requests
  • Token usage
  • Average request cost
  • Daily usage trends
Usage can be broken down by:
  • Tenant (global)
  • Organization
  • Team
  • User or agent
  • Tag
  • Virtual key
  • Model
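
To make the breakdown concrete, the following sketch aggregates a handful of request records along one dimension. It assumes records are plain dictionaries with hypothetical keys (`team`, `model`, `success`, `tokens`, `cost`); the dashboard's real data model is not documented here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize_usage(records: Iterable[Mapping], dimension: str) -> Dict[str, dict]:
    """Aggregate request count, failures, tokens, and cost per value of `dimension`."""
    summary = defaultdict(
        lambda: {"requests": 0, "failed": 0, "tokens": 0, "cost": 0.0}
    )
    for record in records:
        bucket = summary[record[dimension]]
        bucket["requests"] += 1
        bucket["failed"] += 0 if record["success"] else 1
        bucket["tokens"] += record["tokens"]
        bucket["cost"] += record["cost"]
    # Average request cost is derived once all records are counted.
    for bucket in summary.values():
        bucket["avg_cost"] = bucket["cost"] / bucket["requests"]
    return dict(summary)


# Made-up sample data for illustration.
records = [
    {"team": "search", "model": "model-a", "success": True, "tokens": 100, "cost": 0.5},
    {"team": "search", "model": "model-b", "success": False, "tokens": 0, "cost": 0.0},
    {"team": "support", "model": "model-a", "success": True, "tokens": 300, "cost": 0.25},
]

by_team = summarize_usage(records, "team")
by_model = summarize_usage(records, "model")
print(by_team["search"])
```

The same function serves every breakdown because only the grouping key changes, which is the idea behind offering one set of metrics across many dimensions.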

Model & Key Activity

Dedicated views allow you to analyze:
  • Which models are most used
  • Which keys drive the highest activity
  • Agent‑initiated vs user‑initiated traffic
This helps with capacity planning and optimization.

Cost Observability

Cost Tracking

Cost tracking surfaces how usage translates into cost, including:
  • Base cost
  • Applied discounts
  • Final effective cost
Cost data is visible alongside usage metrics and can be filtered by the same dimensions (team, key, tag, model).
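
As a worked example of how the three figures relate, the sketch below assumes a discount expressed as a fractional rate applied to the base cost. That is an assumption for illustration; this page does not specify how LLMGrid computes discounts, and `effective_cost` is a hypothetical helper.

```python
from decimal import Decimal
from typing import Dict


def effective_cost(base_cost: Decimal, discount_rate: Decimal) -> Dict[str, Decimal]:
    """Return base cost, discount amount, and final cost.

    Assumes `discount_rate` is a fraction between 0 and 1 (for example 0.10 for 10%).
    """
    if not (Decimal("0") <= discount_rate <= Decimal("1")):
        raise ValueError("discount_rate must be between 0 and 1")
    discount = base_cost * discount_rate
    return {"base": base_cost, "discount": discount, "final": base_cost - discount}


# A request with a base cost of 2.00 and a 10% discount.
breakdown = effective_cost(Decimal("2.00"), Decimal("0.10"))
print(breakdown["final"])
```

`Decimal` is used instead of floats so that small per-request amounts sum without binary rounding drift.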

Budgets & Limits

Observability integrates with Budgets and Rate Limits to:
  • Detect approaching limits
  • Identify throttled requests
  • Prevent unexpected overuse
Budget enforcement events are visible in logs and metrics.
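
Detecting an approaching limit amounts to comparing spend against a warning threshold. The sketch below is a minimal illustration; the `budget_status` function, the 80% default threshold, and the status names are assumptions rather than LLMGrid settings.

```python
def budget_status(spend: float, limit: float, warn_ratio: float = 0.8) -> str:
    """Classify spend against a budget limit.

    Returns "exceeded" at or above the limit, "approaching" at or above
    `warn_ratio` of the limit, and "ok" otherwise.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = spend / limit
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= warn_ratio:
        return "approaching"
    return "ok"


for spend in (50.0, 85.0, 100.0):
    print(spend, budget_status(spend, limit=100.0))
```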

Guardrails Observability

When guardrails are enabled, observability includes:
  • Guardrail enforcement decisions
  • Blocked or modified requests
  • Logging‑only detections
  • Tool and MCP validation outcomes
This lets safety and compliance teams measure policy impact; logging‑only detections in particular surface violations without disrupting traffic.

Health & System Monitoring

Cache Health

The Cache Health view validates connectivity and readiness for caching backends. Health checks provide:
  • High‑level status
  • Detailed diagnostic output

Model & Tool Health

Health checks and execution logs help identify:
  • Unavailable models
  • Tool connection failures
  • Integration‑specific issues
These signals support reliability engineering and operational response.

Tags & Attribution

Tags play a key role in observability by enabling attribution and segmentation. You can observe usage, cost, and behavior by:
  • Client
  • Environment (prod, staging)
  • Integration source
  • Request context
Tags flow through logs, metrics, and analytics consistently.

Data Export & Analysis

Observability data can be exported or integrated with external systems via:
  • API access
  • Programmatic log ingestion
  • Analytics and reporting pipelines
This enables long‑term analysis and integration with SIEM or monitoring platforms.
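
Many SIEM and log pipelines accept newline-delimited JSON, so one plausible hand-off format is JSON Lines. The sketch below only shows the serialization step; it assumes log records have already been retrieved as dictionaries with hypothetical field names, and it does not call any LLMGrid API.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Mapping


def to_jsonl(records: Iterable[Mapping]) -> str:
    """Serialize records as JSON Lines, one JSON object per line."""
    lines = []
    for record in records:
        row = dict(record)
        # datetime is not JSON-serializable, so emit ISO 8601 strings.
        for key, value in row.items():
            if isinstance(value, datetime):
                row[key] = value.isoformat()
        lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines) + "\n" if lines else ""


# Made-up sample data for illustration.
records = [
    {
        "request_id": "req-1",
        "timestamp": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
        "model": "model-a",
        "success": True,
    },
]

payload = to_jsonl(records)
print(payload, end="")
```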

Operational Best Practices

  • Set a regular cadence for reviewing usage and logs
  • Use tags to maintain attribution clarity
  • Monitor failure rates and latency trends
  • Validate changes using logs after configuration updates
  • Keep guardrail logs enabled during policy rollout
  • Use budgets and alerts as early‑warning signals
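
Monitoring failure rates and latency trends usually comes down to two numbers per time window: the share of failed requests and a high latency percentile. The following is a minimal sketch using the nearest-rank percentile method; the function names and sample values are illustrative and not part of LLMGrid.

```python
import math
from typing import Sequence


def failure_rate(successes: Sequence[bool]) -> float:
    """Fraction of requests that failed (0.0 when there are no requests)."""
    if not successes:
        return 0.0
    return sum(1 for ok in successes if not ok) / len(successes)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up sample data for one time window.
statuses = [True, True, False, True]
durations_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

print(failure_rate(statuses))
print(percentile(durations_ms, 95))
```

Tracking these per day or per deployment makes it easier to tell whether a configuration change moved either number.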

Observability & Governance Alignment

Observability features are tightly integrated with governance controls:
  • Virtual Keys define access boundaries
  • Guardrails enforce and log policy decisions
  • Budgets define operational limits
  • Routing affects execution paths
  • Logs record all outcomes
This ensures observability reflects real enforcement behavior, not just metrics.

Related Sections

  • Usage – High‑level metrics and trends
  • Logs – Request and audit logs
  • Cost Tracking – Cost attribution and discounts
  • Budgets – Usage thresholds and enforcement
  • Guardrails – Safety and compliance visibility
  • Router Settings – Routing behavior and outcomes