
Observability Overview

The AiCore observability system provides comprehensive monitoring, analytics, and visualization for LLM operations across all supported providers. This documentation covers the key components and features of the observability system.

Key Features

  • Operation Tracking: Records all LLM operations (completions, embeddings) with full context
  • Performance Metrics: Tracks latency, token usage, success rates, and error patterns
  • Cost Analysis: Calculates and tracks API costs by provider/model/operation
  • Agent Monitoring: Correlates operations with agents, sessions, and workspaces
  • Multi-storage Support: JSON files for local development and SQL databases for production
  • High-performance Analytics: Polars integration for efficient data analysis

Core Components

1. Operation Collector

The LlmOperationCollector is the central component that captures and stores all LLM operation data. It provides:

  • Both synchronous and asynchronous recording methods
  • Automatic schema management for database storage
  • Flexible storage backends (JSON files or SQL databases)
  • Built-in cost calculation based on provider pricing
```python
from aicore.observability import LlmOperationCollector

# Initialize with a custom storage path
collector = LlmOperationCollector(storage_path="/custom/path/operations.json")
```
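
As a rough sketch of how recording might look once a request completes (the method and field names below are illustrative assumptions, not the library's documented signature; see the dedicated documentation pages for the exact API):

```python
# Continuing from the collector initialized above.
# NOTE: `record_completion` and its arguments are hypothetical, shown only to
# illustrate the kind of data the collector captures.
collector.record_completion(
    provider="openai",
    model="gpt-4o-mini",
    input_tokens=512,
    output_tokens=128,
    latency_ms=840,
    success=True,
)

# An awaitable counterpart covers the asynchronous code path, and cost is
# derived from provider pricing rather than supplied by the caller.
```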

2. Dashboard

The interactive Observability Dashboard provides:

  • Real-time monitoring of LLM operations
  • Historical trend analysis
  • Cost breakdowns by team/project/model
  • Customizable views and filters
```python
from aicore.observability import ObservabilityDashboard

# Launch the dashboard on a custom port
dashboard = ObservabilityDashboard(port=8080)
dashboard.run_server()
```
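
Once the server is running, the dashboard is available in a browser at the configured port (http://localhost:8080 in the example above).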

3. Data Analysis Tools

  • SQL Integration: Query operation data using standard SQL
  • Polars Integration: High-performance DataFrame operations
  • Custom Export: Export data to CSV, Parquet, or other formats
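
For example, a quick analysis with Polars might look like this (the column names are assumptions about the stored schema; adjust them to the fields your collector actually writes):

```python
import polars as pl

# Load recorded operations from the JSON storage file
# (use pl.read_ndjson if your records are newline-delimited)
df = pl.read_json("/custom/path/operations.json")

# Token usage, cost, and latency per provider/model
summary = (
    df.group_by(["provider", "model"])
    .agg(
        pl.col("total_tokens").sum().alias("tokens"),
        pl.col("cost").sum().alias("total_cost"),
        pl.col("latency_ms").mean().alias("avg_latency_ms"),
    )
    .sort("total_cost", descending=True)
)
print(summary)

# Export for downstream analysis
summary.write_parquet("operation_summary.parquet")
```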

Data Model

The observability system tracks:

  1. Operation Metadata:
    • Provider, model, and endpoint used
    • Timestamps and duration
    • Status (success/failure)
  2. Performance Metrics:
    • Latency at various stages
    • Token counts (input/output/total)
    • Retry attempts
  3. Contextual Information:
    • Session and workspace identifiers
    • Agent/action context
    • Custom tags and metadata
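
To make the schema concrete, a recorded operation might look roughly like the record below (the field names and values are illustrative assumptions, not the exact stored schema):

```python
sample_operation = {
    # Operation metadata
    "provider": "openai",
    "model": "gpt-4o-mini",
    "operation": "completion",
    "timestamp": "2024-06-01T12:00:00Z",
    "duration_ms": 840,
    "status": "success",
    # Performance metrics
    "input_tokens": 512,
    "output_tokens": 128,
    "total_tokens": 640,
    "retries": 0,
    "cost": 0.00042,
    # Contextual information
    "session_id": "sess-123",
    "workspace": "research",
    "agent_id": "planner",
    "tags": {"team": "ml-platform"},
}
```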

Getting Started

Include a connection string or async connection string in your .env file to enable SQL database storage.
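
A minimal example, assuming Postgres and hypothetical variable names (check the dedicated documentation pages for the exact keys your aicore version expects):

```
CONNECTION_STRING=postgresql://user:password@localhost:5432/observability
ASYNC_CONNECTION_STRING=postgresql+asyncpg://user:password@localhost:5432/observability
```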

For detailed information about each component, see the dedicated documentation pages.
