Chapter 17: Monitoring and Analytics for Chatbots
“If you can’t measure it, you can’t improve it. And if you can’t see it, you can’t fix it.”
Behind every successful chatbot is a dashboard full of charts, logs, and metrics. Why? Because observability is the only way to scale reliably.
When users complain about “slowness,” you need to know whether the delay is in the LLM, the vector DB, or the frontend. When your token costs spike, you need to trace the endpoint and user responsible. And when your chatbot’s accuracy dips, analytics may show a pattern—wrong prompt, bad input, or drift in the document index.
In this chapter, you'll learn how to monitor the health, performance, and behavior of your chatbot using modern tools—so you can act fast, optimize smartly, and keep your users happy.
What Should You Monitor?
| Metric Category | Examples | Tools |
|---|---|---|
| Latency | Response time per endpoint/model | Prometheus, Grafana, PostHog |
| Throughput | Requests per minute / concurrent sessions | Cloud metrics, FastAPI middleware (sketched below) |
| Failures | 4xx/5xx errors, timeouts, exceptions | Sentry, OpenTelemetry, Rollbar |
| Token Usage | Total tokens used per user/model | Custom logging + DB |
| Vector Search | Similarity scores, chunk retrieval latency | Supabase logs, Redis traces |
| Frontend Behavior | Clicks, message sends, bounce rate | PostHog, Mixpanel, Google Analytics |
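The "FastAPI middleware" in the Throughput row can be as simple as a request-timing middleware. Here is a minimal sketch using FastAPI's standard middleware hook; the response header name is just a convention:

```python
import time

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_timing_header(request: Request, call_next):
    # Measure wall-clock time for each request and report it to the client;
    # the same measurement can be fed into your metrics backend.
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Process-Time"] = f"{time.perf_counter() - start:.4f}"
    return response
```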
1. Prometheus + Grafana (Infra + Backend Metrics)
Use When:
- You’re running FastAPI/Docker on your own servers
- You want deep insights into CPU, memory, response time, endpoint load
Setup Steps:
- Install `prometheus_fastapi_instrumentator`:

```bash
pip install prometheus-fastapi-instrumentator
```
- Add it to your FastAPI app:

```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
Instrumentator().instrument(app).expose(app)  # serves metrics at /metrics
```
- Run Prometheus and scrape the app's `/metrics` endpoint:

```yaml
scrape_configs:
  - job_name: 'chatbot-backend'
    static_configs:
      - targets: ['localhost:8000']
```
- Visualize in Grafana (build custom dashboards)
Key Backend Metrics to Track

| Metric | Description |
|---|---|
| `http_request_duration` | API response time |
| `inference_latency` | Model generation speed (especially with large prompts) |
| `embedding_lookup_latency` | Time spent in vector DB retrieval |
| `token_usage_total` | Per-user and per-model token consumption |
| `queue_length_celery_tasks` | Queue depth, if using async task queues |
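Of these, only `http_request_duration` comes out of the box from the instrumentator; the rest are names you define yourself. A minimal sketch with `prometheus_client` (which the instrumentator itself builds on); `generate_answer` is a hypothetical LLM helper:

```python
import time

from prometheus_client import Counter, Histogram

# Metric names mirror the table above; they are illustrative, not built-in.
INFERENCE_LATENCY = Histogram(
    "inference_latency_seconds", "Time spent generating a model response"
)
TOKEN_USAGE = Counter(
    "token_usage_total", "Tokens consumed per user and model",
    ["user_id", "model"],
)

def tracked_generate(user_id: str, model: str, prompt: str) -> str:
    start = time.perf_counter()
    answer, tokens = generate_answer(model, prompt)  # hypothetical LLM call
    INFERENCE_LATENCY.observe(time.perf_counter() - start)
    TOKEN_USAGE.labels(user_id=user_id, model=model).inc(tokens)
    return answer
```

Because these metrics register with the default registry in the same process, they appear on the same `/metrics` endpoint the instrumentator already exposes.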
2. PostHog or Mixpanel (User Analytics)
Product analytics tools like PostHog or Mixpanel track:
- User flows and friction points
- Retention and active users
- Message frequency and drop-offs
- Feature usage (e.g., summarize, upload, dark mode)
Example: PostHog for React Chat UI

```bash
npm install posthog-js
```

```javascript
import posthog from 'posthog-js';

posthog.init('phc_xxx', { api_host: 'https://app.posthog.com' });

posthog.capture('chat_started', { model: 'Mistral' });
posthog.capture('doc_uploaded', { fileSize: 13200 });
```
You can segment events by auth ID, tenant, or model for powerful filtering.
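The same events can also be captured server-side with PostHog's Python SDK, which helps when the frontend doesn't see everything (e.g., model fallbacks). A minimal sketch; the project key, IDs, and properties are placeholders:

```python
# pip install posthog
from posthog import Posthog

posthog = Posthog("phc_xxx", host="https://app.posthog.com")

# distinct_id ties events to a user; reusing the auth ID keeps frontend
# and backend events attributable to the same person.
posthog.capture(
    distinct_id="user_123",
    event="chat_started",
    properties={"model": "Mistral", "tenant": "acme"},
)
```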
3. Error Logging with Sentry
- Instantly catch exceptions and API failures
- View stack traces, environment data, and user metadata
- Track frequency of specific error types
Setup in FastAPI

```bash
pip install sentry-sdk
```

```python
import sentry_sdk

sentry_sdk.init(dsn="your_sentry_dsn")
```
Sentry will now log:
- Python exceptions
- HTTP errors
- Custom events (via `sentry_sdk.capture_message()`)
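Beyond automatic capture, you can attach chatbot-specific context before reporting an error, which makes events searchable by user or model in Sentry. A minimal sketch; `call_llm` is a hypothetical helper:

```python
import sentry_sdk

sentry_sdk.init(dsn="your_sentry_dsn")

def answer_question(user_id: str, model: str, prompt: str) -> str:
    try:
        return call_llm(model, prompt)  # hypothetical LLM call
    except Exception as exc:
        # Attach context so the event is searchable in Sentry.
        sentry_sdk.set_user({"id": user_id})
        sentry_sdk.set_tag("model", model)
        sentry_sdk.capture_exception(exc)
        raise
```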
4. Custom Token Usage & Cost Tracker
Monitoring tokens is critical for:
- Cost estimation (OpenAI, Anthropic, Mistral-hosted)
- Rate limiting and billing users fairly
Track token usage per request:

```python
# Assumes an OpenAI-style response with a "usage" field
usage = response["usage"]
tokens = usage["total_tokens"]
store_token_log(user_id, tokens, endpoint, timestamp)  # sketched below
```
Save logs in a PostgreSQL table or use Supabase functions to aggregate usage.
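A minimal sketch of what `store_token_log` might look like with `psycopg2`; the connection string and the `token_logs` schema are assumptions:

```python
from datetime import datetime

import psycopg2  # pip install psycopg2-binary

def store_token_log(user_id: str, tokens: int, endpoint: str,
                    timestamp: datetime) -> None:
    # Assumed schema:
    #   token_logs(user_id TEXT, tokens INT, endpoint TEXT, ts TIMESTAMPTZ)
    with psycopg2.connect("dbname=chatbot") as conn:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO token_logs (user_id, tokens, endpoint, ts) "
                "VALUES (%s, %s, %s, %s)",
                (user_id, tokens, endpoint, timestamp),
            )
```

A simple `GROUP BY user_id` over this table then gives per-user daily burn, which feeds directly into rate limiting and billing.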
5. Log Everything That Matters
| Log Type | Use Case |
|---|---|
| Input prompts | Debug unexpected behavior |
| Response time + source | Distinguish LLM vs. retrieval delays |
| Retrieved docs | Debug RAG mismatches or hallucinations |
| User metadata | Multi-tenant auditing |
Use Python's built-in `logging` module, or ship logs to:
- Logstash + Kibana (ELK stack)
- Google Cloud Logging
- AWS CloudWatch
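Whatever the destination, structured log lines are far easier to query than free-form text. A minimal sketch using only the standard library; the field names are illustrative:

```python
import json
import logging

logger = logging.getLogger("chatbot")
logging.basicConfig(level=logging.INFO)

def log_turn(user_id: str, prompt: str, latency_ms: float,
             doc_ids: list) -> None:
    # One JSON object per line: trivial to parse in ELK or CloudWatch.
    logger.info(json.dumps({
        "event": "chat_turn",
        "user_id": user_id,
        "prompt": prompt,
        "latency_ms": latency_ms,
        "retrieved_docs": doc_ids,
    }))
```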
Real-World Dashboard Example
A production chatbot dashboard might include:
- Avg response time (by endpoint, model)
- Live users online
- Token burn rate (hourly/daily)
- Top queries / intents
- Error rate trend (last 24h)
- Model response latency histogram
- Most active tenants/users
These dashboards are not just for engineers—they’re valuable for product teams, business leaders, and support agents.
Summary

| Tool | Focus | Ideal Use Case |
|---|---|---|
| Prometheus | Infra metrics, response latency | Self-hosted backend observability |
| Grafana | Dashboards for metrics | Visual monitoring for DevOps teams |
| PostHog | User behavior analytics | Understanding feature adoption & UX |
| Sentry | Error and exception tracking | Debugging and log tracing |
| Custom logs | Token usage, prompt traceability | Cost optimization and audit trails |
You can’t improve what you don’t monitor. And you can’t scale what you don’t understand.