Observability - LLMGrid.ai

Overview

LLMGrid provides built‑in observability features that help administrators and operators understand how the platform is being used, detect issues, investigate incidents, and maintain reliability. Observability in LLMGrid centers on logs, usage metrics, cost analytics, and health checks, all of which are managed through the UI and applied consistently across models, agents, tools, and workflows.

Core Observability Capabilities

LLMGrid observability is designed to answer four key questions:

What requests are being made?
How are models, tools, and agents behaving?
Where is usage and cost coming from?
Are systems healthy and compliant?

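Each question maps to one or more of the observability surfaces described below: request and audit logs, usage metrics, cost tracking, guardrail events, and health checks.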

Request & Execution Logs

Request Logs

The Request Logs page provides structured, request‑level visibility across the platform. Each logged request includes:

Timestamp
Request ID and session ID
Model or route used
Success or failure status
Execution duration
Token usage
Cost metadata
Associated virtual key, team, agent, or tag

Logs can be filtered by time range, model, key, team, tag, or status.
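The same filtering can be reproduced offline on exported records. The following is a minimal sketch, not LLMGrid's API: the `RequestLog` class, its field names, and the `"success"`/`"failure"` status values are assumptions chosen to mirror the fields listed above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class RequestLog:
    # Hypothetical field names mirroring the logged fields listed above.
    timestamp: datetime
    request_id: str
    model: str
    status: str  # assumed values: "success" or "failure"
    duration_ms: float
    total_tokens: int
    cost: float
    virtual_key: str
    team: str
    tag: str


def filter_logs(
    logs: Iterable[RequestLog],
    *,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    **equals: str,
) -> Iterator[RequestLog]:
    """Yield logs inside [start, end) whose fields match every keyword filter."""
    for log in logs:
        if start is not None and log.timestamp < start:
            continue
        if end is not None and log.timestamp >= end:
            continue
        if all(getattr(log, name) == value for name, value in equals.items()):
            yield log


logs = [
    RequestLog(datetime(2024, 5, 1, 9, 0), "req-1", "model-a", "success",
               420.0, 900, 0.012, "key-1", "search", "prod"),
    RequestLog(datetime(2024, 5, 1, 9, 5), "req-2", "model-a", "failure",
               3100.0, 0, 0.0, "key-1", "search", "prod"),
    RequestLog(datetime(2024, 5, 2, 14, 0), "req-3", "model-b", "failure",
               2800.0, 0, 0.0, "key-2", "support", "staging"),
]

# Failed production requests on 1 May.
failed_prod = list(
    filter_logs(
        logs,
        start=datetime(2024, 5, 1),
        end=datetime(2024, 5, 2),
        status="failure",
        tag="prod",
    )
)
print([log.request_id for log in failed_prod])
```

Combining a time range with one or two equality filters, as here, is usually enough to narrow an incident down to a handful of request IDs.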

Audit Logs

Audit logs capture administrative and configuration actions, such as:

Creating or updating models
Modifying routing, guardrails, or budgets
Managing keys, users, or credentials

Audit logs support:

Compliance audits
Change tracking
Incident investigation

Audit logs are read‑only and immutable.

Usage & Metrics

Usage Dashboard

The Usage section provides aggregated visibility into platform consumption. Metrics include:

Total requests
Successful vs failed requests
Token usage
Average request cost
Daily usage trends

Usage can be broken down by:

Global tenant usage
Organization
Team
User or agent
Tag
Virtual key
Model
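To illustrate how these metrics relate to the underlying request records, here is a small sketch that groups records by a single dimension and derives the same figures. The record keys (`team`, `model`, `status`, `tokens`, `cost`) are hypothetical, and the dashboard may compute its aggregates differently.

```python
from collections import defaultdict


def summarize_usage(records, dimension):
    """Group records by one dimension and compute dashboard-style metrics."""
    groups = defaultdict(
        lambda: {"requests": 0, "failed": 0, "tokens": 0, "cost": 0.0}
    )
    for record in records:
        group = groups[record[dimension]]
        group["requests"] += 1
        if record["status"] != "success":
            group["failed"] += 1
        group["tokens"] += record["tokens"]
        group["cost"] += record["cost"]

    summary = {}
    for key, group in groups.items():
        summary[key] = {
            **group,
            "successful": group["requests"] - group["failed"],
            "avg_cost": group["cost"] / group["requests"],
        }
    return summary


# Hypothetical records; the keys are assumptions, not an LLMGrid schema.
records = [
    {"team": "search", "model": "model-a", "status": "success", "tokens": 1200, "cost": 0.50},
    {"team": "search", "model": "model-b", "status": "failure", "tokens": 0, "cost": 0.0},
    {"team": "support", "model": "model-a", "status": "success", "tokens": 800, "cost": 0.25},
]

by_team = summarize_usage(records, "team")
by_model = summarize_usage(records, "model")
print(by_team["search"])
print(by_model["model-a"])
```

Passing a different `dimension` (for example `"model"` instead of `"team"`) gives the other breakdowns without changing the aggregation logic.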

Model & Key Activity

Dedicated views allow you to analyze:

Which models are most used
Which keys drive the highest activity
Agent‑initiated vs user‑initiated traffic

This helps with capacity planning and optimization.

Cost Observability

Cost Tracking

Cost tracking surfaces how usage translates into cost, including:

Base cost
Applied discounts
Final effective cost

Cost data is visible alongside usage metrics and can be filtered by the same dimensions (team, key, tag, model).
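The relationship between the three cost figures can be sketched as follows. This assumes the discount is a fractional rate applied to the base cost; whether LLMGrid uses percentage discounts, fixed amounts, or tiered pricing is not stated here, and the function name and rounding precision are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP


def effective_cost(base_cost: Decimal, discount_rate: Decimal) -> Decimal:
    """Return base cost minus a fractional discount, rounded to 4 decimal places.

    Assumption: discount_rate is a fraction between 0 and 1 (0.15 means 15%).
    """
    if not Decimal("0") <= discount_rate <= Decimal("1"):
        raise ValueError("discount_rate must be between 0 and 1")
    discount = base_cost * discount_rate
    return (base_cost - discount).quantize(
        Decimal("0.0001"), rounding=ROUND_HALF_UP
    )


base = Decimal("12.50")
rate = Decimal("0.15")
print(base, base * rate, effective_cost(base, rate))
```

`Decimal` is used instead of `float` so that small per-request costs do not accumulate binary rounding error when summed over many requests.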

Budgets & Limits

Observability integrates with Budgets and Rate Limits to:

Detect approaching limits
Identify throttled requests
Prevent unexpected overuse

Budget enforcement events are visible in logs and metrics.
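As an illustration of detecting an approaching limit, the sketch below classifies current spend against a budget. The 80% warning threshold and the status labels are assumptions for the example, not LLMGrid defaults.

```python
def budget_status(spend: float, limit: float, warn_ratio: float = 0.8) -> str:
    """Classify spend against a budget limit.

    warn_ratio is a hypothetical early-warning threshold (80% by default).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = spend / limit
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= warn_ratio:
        return "approaching"
    return "ok"


for spend in (45.0, 85.0, 100.0):
    print(spend, budget_status(spend, limit=100.0))
```

A check like this can run on exported usage data to flag teams or keys before enforcement starts throttling their requests.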

Guardrails Observability

When guardrails are enabled, observability includes:

Guardrail enforcement decisions
Blocked or modified requests
Logging‑only detections
Tool and MCP validation outcomes

This allows safety and compliance teams to observe policy impact without disrupting traffic.

Health & System Monitoring

Cache Health

The Caching Health view validates the connectivity and readiness of caching backends. Health checks provide:

High‑level status
Detailed diagnostic output

Model & Tool Health

Health checks and execution logs help identify:

Unavailable models
Tool connection failures
Integration‑specific issues

These signals support reliability engineering and operational response.

Tags & Attribution

Tags play a key role in observability by enabling attribution and segmentation. You can observe usage, cost, and behavior by:

Client
Environment (prod, staging)
Integration source
Request context

Tags flow through logs, metrics, and analytics consistently.

Data Export & Analysis

Observability data can be exported or integrated with external systems via:

API access
Programmatic log ingestion
Analytics and reporting pipelines

This enables long‑term analysis and integration with SIEM or monitoring platforms.
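Many log pipelines accept newline-delimited JSON (JSON Lines), so one way to prepare records for external ingestion is sketched below. It starts from records that have already been retrieved; the retrieval call is omitted because the export API is not described here, and the record fields are hypothetical.

```python
import io
import json
from datetime import datetime, timezone


def _encode(value):
    """Serialize datetimes as ISO 8601 strings; reject other unknown types."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def write_jsonl(records, stream) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, default=_encode, sort_keys=True) + "\n")
        count += 1
    return count


# Hypothetical records standing in for exported request logs.
export_records = [
    {"timestamp": datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
     "request_id": "req-1", "status": "success", "tag": "prod"},
    {"timestamp": datetime(2024, 5, 1, 9, 5, tzinfo=timezone.utc),
     "request_id": "req-2", "status": "failure", "tag": "prod"},
]

buffer = io.StringIO()  # swap for an open file handle in practice
written = write_jsonl(export_records, buffer)
print(written)
print(buffer.getvalue(), end="")
```

Writing timestamps as timezone-aware ISO 8601 strings avoids ambiguity when the receiving system correlates these records with events from other sources.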

Operational Best Practices

Set a regular cadence for reviewing usage and logs
Use tags to maintain attribution clarity
Monitor failure rates and latency trends
Validate changes using logs after configuration updates
Keep guardrail logs enabled during policy rollout
Use budgets and alerts as early‑warning signals
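For the failure-rate and latency review, a per-day summary is often sufficient to spot a trend. The sketch below computes a daily failure rate and a nearest-rank 95th-percentile latency from exported records; the record keys (`day`, `status`, `duration_ms`) and the choice of p95 are assumptions for illustration.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def daily_trends(records):
    """Return failure rate and p95 latency per day, ordered by day."""
    by_day = defaultdict(list)
    for record in records:
        by_day[record["day"]].append(record)

    trends = {}
    for day, rows in sorted(by_day.items()):
        failures = sum(1 for row in rows if row["status"] != "success")
        trends[day] = {
            "failure_rate": failures / len(rows),
            "p95_ms": percentile([row["duration_ms"] for row in rows], 95),
        }
    return trends


# Hypothetical records; keys are assumptions, not an LLMGrid schema.
trend_records = [
    {"day": "2024-05-01", "status": "success", "duration_ms": 400},
    {"day": "2024-05-01", "status": "success", "duration_ms": 450},
    {"day": "2024-05-01", "status": "failure", "duration_ms": 3100},
    {"day": "2024-05-01", "status": "success", "duration_ms": 500},
    {"day": "2024-05-02", "status": "success", "duration_ms": 420},
    {"day": "2024-05-02", "status": "success", "duration_ms": 430},
]

trends = daily_trends(trend_records)
for day, stats in trends.items():
    print(day, stats)
```

A tail percentile is used rather than the mean because a few slow or timed-out requests can be hidden by an average while still affecting users.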

Observability & Governance Alignment

Observability features are tightly integrated with governance controls:

Virtual Keys define access boundaries
Guardrails enforce and log policy decisions
Budgets define operational limits
Routing affects execution paths
Logs record all outcomes

This ensures observability reflects real enforcement behavior, not just metrics.

Related Sections

Usage – High‑level metrics and trends
Logs – Request and audit logs
Cost Tracking – Cost attribution and discounts
Budgets – Usage thresholds and enforcement
Guardrails – Safety and compliance visibility
Router Settings – Routing behavior and outcomes
