Chapter 17: Monitoring and Analytics for Chatbots

“If you can’t measure it, you can’t improve it. And if you can’t see it, you can’t fix it.”

Behind every successful chatbot is a dashboard full of charts, logs, and metrics. Why? Because without observability, you can't scale reliably.

When users complain about “slowness,” you need to know whether the delay is in the LLM, the vector DB, or the frontend. When your token costs spike, you need to trace the endpoint and user responsible. And when your chatbot’s accuracy dips, analytics may show a pattern—wrong prompt, bad input, or drift in the document index.

In this chapter, you'll learn how to monitor the health, performance, and behavior of your chatbot using modern tools—so you can act fast, optimize smartly, and keep your users happy.

What Should You Monitor?

Metric Category	Examples	Tools
Latency	Response time per endpoint/model	Prometheus, Grafana, PostHog
Throughput	Requests per minute / concurrent sessions	Cloud metrics, FastAPI middleware
Failures	4xx, 5xx errors, timeouts, exceptions	Sentry, OpenTelemetry, Rollbar
Token Usage	Total tokens used per user/model	Custom logging + DB
Vector Search	Similarity scores, chunk retrieval latency	Supabase logs, Redis traces
Frontend Behavior	Clicks, message sends, bounce rate	PostHog, Mixpanel, Google Analytics

1. Prometheus + Grafana (Infra + Backend Metrics)

Use When:

You’re running FastAPI/Docker on your own servers
You want deep insights into CPU, memory, response time, endpoint load

Setup Steps:

Install the prometheus-fastapi-instrumentator package:

pip install prometheus-fastapi-instrumentator

Add to your FastAPI app:

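from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
# Record default HTTP metrics and serve them at GET /metrics
Instrumentator().instrument(app).expose(app)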

Configure Prometheus (prometheus.yml) to scrape the app's /metrics endpoint:

scrape_configs:
  - job_name: 'chatbot-backend'
    static_configs:
      - targets: ['localhost:8000']

Add Prometheus as a data source in Grafana and build custom dashboards on top of these metrics.

Key Backend Metrics to Track

Metric	Description	Source
`http_request_duration_seconds`	API response time per handler	Built into the instrumentator
`inference_latency_seconds`	Model generation time (especially for large prompts)	Custom
`embedding_lookup_latency_seconds`	Time spent in vector DB retrieval	Custom
`token_usage_total`	Per-user and per-model token consumption	Custom
`queue_length_celery_tasks`	Pending tasks, if you use an async task queue such as Celery	Custom
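The instrumentator records only HTTP-level metrics, so the chatbot-specific ones have to be registered yourself. Here is a minimal sketch using the prometheus_client library, which the instrumentator builds on: metrics created on its default registry appear on the same /metrics endpoint. The names follow Prometheus' convention of putting the unit in the metric name, and fake_vector_search, fake_llm_call, and generate_reply are hypothetical stand-ins for your own retrieval and model calls.

```python
from prometheus_client import Counter, Histogram, generate_latest

# Custom metrics on the default registry, so the instrumentator's /metrics
# endpoint exposes them alongside the built-in HTTP metrics.
INFERENCE_LATENCY = Histogram(
    "inference_latency_seconds",
    "Time spent waiting for the LLM to generate a reply",
    ["model"],
)
EMBEDDING_LOOKUP_LATENCY = Histogram(
    "embedding_lookup_latency_seconds",
    "Time spent retrieving chunks from the vector DB",
)
TOKEN_USAGE = Counter(
    "token_usage_total",  # prometheus_client exposes this as token_usage_total
    "Tokens consumed, by tenant and model",
    ["tenant", "model"],
)


def fake_vector_search(query: str) -> list[str]:
    """Hypothetical stand-in for a vector DB query."""
    return ["chunk-1", "chunk-2"]


def fake_llm_call(model: str, prompt: str) -> dict:
    """Hypothetical stand-in for an LLM call that reports token usage."""
    return {"text": "Hi there!", "usage": {"total_tokens": len(prompt.split()) + 3}}


def generate_reply(tenant: str, model: str, prompt: str) -> str:
    # Histogram.time() is a context manager that observes the elapsed seconds.
    with EMBEDDING_LOOKUP_LATENCY.time():
        chunks = fake_vector_search(prompt)

    with INFERENCE_LATENCY.labels(model=model).time():
        response = fake_llm_call(model, prompt + " " + " ".join(chunks))

    TOKEN_USAGE.labels(tenant=tenant, model=model).inc(response["usage"]["total_tokens"])
    return response["text"]


if __name__ == "__main__":
    print(generate_reply("acme", "mistral-7b", "What is our refund policy?"))
    # The same exposition text Prometheus scrapes from /metrics
    print(generate_latest().decode())
```

Token usage is labeled by tenant and model rather than by user: every distinct label value creates a new time series, so a per-user label can overwhelm Prometheus once you have many users. Exact per-user accounting is better kept in a database, as in the token tracker later in this chapter.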

2. PostHog or Mixpanel (User Analytics)

Product analytics tools like PostHog or Mixpanel track:

User flows and friction points
Retention and active users
Message frequency and drop-offs
Feature usage (e.g., summarize, upload, dark mode)

Example: PostHog for React Chat UI

npm install posthog-js

import posthog from 'posthog-js';
posthog.init('phc_xxx', { api_host: 'https://app.posthog.com' });

posthog.capture('chat_started', { model: 'Mistral' });
posthog.capture('doc_uploaded', { fileSize: 13200 });

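Identify signed-in users with posthog.identify() and attach properties such as tenant or model to your events; you can then segment and filter every report by auth ID, tenant, or model.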

3. Error Logging with Sentry

Instantly catch exceptions and API failures
View stack traces, environment data, and user metadata
Track frequency of specific error types

Setup in FastAPI

pip install sentry-sdk

import sentry_sdk
# Initialize before creating the FastAPI app; the FastAPI integration is enabled automatically
sentry_sdk.init(dsn="your_sentry_dsn")

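Sentry will now report:

Unhandled Python exceptions
HTTP errors with 5xx status codes (4xx errors are ignored by default; see the failed_request_status_codes option)
Custom messages sent with sentry_sdk.capture_message()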

4. Custom Token Usage & Cost Tracker

Monitoring tokens is critical for:

Cost estimation (OpenAI, Anthropic, Mistral-hosted)
Rate limiting and billing users fairly

Track token usage per request:

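from datetime import datetime, timezone
# `response` is the parsed JSON body of an OpenAI-compatible completion
usage = response['usage']
tokens = usage['total_tokens']
# store_token_log is your own helper (e.g., an INSERT into PostgreSQL)
store_token_log(user_id, tokens, endpoint, datetime.now(timezone.utc))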

Save logs in a PostgreSQL table or use Supabase functions to aggregate usage.
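Here is one way to fill in store_token_log and the aggregation step, sketched with Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere. It assumes a few things the text leaves open: this version of store_token_log also takes the model name (because cost depends on it), the model names and prices in PRICE_PER_1K_TOKENS are made up, and within_daily_budget with its 50,000-token default is a hypothetical rate-limiting policy.

```python
import sqlite3
from datetime import datetime

# Hypothetical USD prices per 1,000 tokens; use your provider's current rates.
PRICE_PER_1K_TOKENS = {"model-small": 0.0005, "model-large": 0.003}

# In-memory SQLite stands in for a PostgreSQL table here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE token_log (
           user_id  TEXT    NOT NULL,
           model    TEXT    NOT NULL,
           endpoint TEXT    NOT NULL,
           tokens   INTEGER NOT NULL,
           ts       TEXT    NOT NULL  -- ISO-8601, always written in UTC
       )"""
)


def store_token_log(user_id: str, tokens: int, endpoint: str,
                    timestamp: datetime, model: str) -> None:
    """Append one request's token usage to the log table."""
    conn.execute(
        "INSERT INTO token_log VALUES (?, ?, ?, ?, ?)",
        (user_id, model, endpoint, tokens, timestamp.isoformat()),
    )
    conn.commit()


def estimated_cost_usd(user_id: str) -> float:
    """Sum a user's tokens per model and price them (KeyError for unpriced models)."""
    rows = conn.execute(
        "SELECT model, SUM(tokens) FROM token_log WHERE user_id = ? GROUP BY model",
        (user_id,),
    ).fetchall()
    return sum(total / 1000 * PRICE_PER_1K_TOKENS[model] for model, total in rows)


def tokens_used_today(user_id: str, now: datetime) -> int:
    """Tokens a user has consumed since midnight of `now`'s (UTC) day."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(tokens), 0) FROM token_log WHERE user_id = ? AND ts >= ?",
        (user_id, midnight.isoformat()),
    ).fetchone()
    return total


def within_daily_budget(user_id: str, requested: int, now: datetime,
                        daily_limit: int = 50_000) -> bool:
    """Rate-limit check: would this request push the user past their daily budget?"""
    return tokens_used_today(user_id, now) + requested <= daily_limit
```

In PostgreSQL you would store ts as TIMESTAMPTZ and filter with date_trunc('day', now()) instead of comparing ISO-8601 strings, which only sort chronologically here because every timestamp is written in UTC.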

5. Log Everything That Matters

Log Type	Use Case
Input prompts	Debug unexpected behavior
Response time + source	Distinguish LLM vs. retrieval delays
Retrieved docs	Debug RAG mismatches or hallucinations
User metadata	Multi-tenant auditing

Use Python's built-in logging module, and ship the logs to a central store such as:

Logstash + Kibana (ELK stack)
Google Cloud Logging
AWS CloudWatch
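A minimal sketch of logging the items in the table above with nothing but the standard library: each chat turn emits one JSON line carrying the prompt, separate retrieval and LLM timings, the retrieved document IDs and scores, and tenant/user metadata, a format Logstash, Cloud Logging, and CloudWatch can all ingest. The JsonFormatter class, the field names, and the retrieve/generate stubs are illustrative assumptions, not part of any library.

```python
import json
import logging
import time
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line, which log shippers parse easily."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


logger = logging.getLogger("chatbot")
_handler = logging.StreamHandler()  # stderr; swap for a file or shipper handler
_handler.setFormatter(JsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)


@contextmanager
def timed(timings: dict, stage: str):
    """Store the wall-clock duration of the enclosed block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = round((time.perf_counter() - start) * 1000, 1)


def retrieve(query: str) -> list[dict]:
    """Hypothetical vector-DB stub."""
    return [{"id": "doc-17", "score": 0.82}, {"id": "doc-03", "score": 0.77}]


def generate(prompt: str, docs: list[dict]) -> str:
    """Hypothetical LLM stub."""
    return "Refunds are processed within 5 business days."


def handle_chat(tenant: str, user_id: str, prompt: str) -> str:
    timings: dict = {}
    with timed(timings, "retrieval_ms"):
        docs = retrieve(prompt)
    with timed(timings, "llm_ms"):
        answer = generate(prompt, docs)

    logger.info(
        "chat_turn",
        extra={"fields": {
            "tenant": tenant,
            "user_id": user_id,
            "prompt": prompt[:500],  # truncated to keep log lines bounded
            "retrieved_docs": [{"id": d["id"], "score": d["score"]} for d in docs],
            **timings,
        }},
    )
    return answer
```

Keeping retrieval_ms and llm_ms as separate fields is what lets you answer the "is it the LLM or the vector DB?" question from the logs alone. Prompts can contain personal data, so consider redacting them before the logs leave your servers.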

Real-World Dashboard Example

A production chatbot dashboard might include:

Avg response time (by endpoint, model)
Live users online
Token burn rate (hourly/daily)
Top queries / intents
Error rate trend (last 24h)
Model response latency histogram
Most active tenants/users

These dashboards are not just for engineers—they’re valuable for product teams, business leaders, and support agents.
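Most of these panels boil down to simple aggregations over per-request records. As a rough illustration of the math behind four of them (average latency per endpoint, a nearest-rank p95 latency, the 24-hour error rate, and the last hour's token burn), here is a sketch over a hypothetical RequestRecord type. In practice Grafana or your log store computes these for you, for example with PromQL's histogram_quantile() over a latency histogram.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class RequestRecord:
    ts: datetime
    endpoint: str
    latency_ms: float
    status: int
    tokens: int


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (NaN for an empty list)."""
    if not values:
        return float("nan")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def dashboard_summary(records: list[RequestRecord], now: datetime) -> dict:
    latencies_by_endpoint: dict[str, list[float]] = defaultdict(list)
    for r in records:
        latencies_by_endpoint[r.endpoint].append(r.latency_ms)

    last_24h = [r for r in records if now - r.ts <= timedelta(hours=24)]
    last_hour = [r for r in records if now - r.ts <= timedelta(hours=1)]
    # Counts only 5xx as errors; decide whether 4xx belong in your definition.
    errors_24h = sum(1 for r in last_24h if r.status >= 500)

    return {
        "avg_latency_ms": {ep: mean(v) for ep, v in latencies_by_endpoint.items()},
        "p95_latency_ms": p95([r.latency_ms for r in records]),
        "error_rate_24h": errors_24h / len(last_24h) if last_24h else 0.0,
        "tokens_last_hour": sum(r.tokens for r in last_hour),
    }
```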

Summary

Tool	Focus	Ideal Use Case
Prometheus	Infra metrics, response latency	Self-hosted backend observability
Grafana	Dashboards for metrics	Visual monitoring for DevOps teams
PostHog	User behavior analytics	Understanding feature adoption & UX
Sentry	Error and exception tracking	Debugging crashes and failed requests
Custom logs	Token usage, prompt traceability	Cost optimization and audit trails

You can’t improve what you don’t monitor. And you can’t scale what you don’t understand.