
Part 5: Case Studies and Templates

20. Case Study: Document Intelligence Chatbot

“You don't just answer questions. You reason across pages, paragraphs, and payloads.”


Overview

The Document Intelligence Chatbot is a full-stack RAG-powered AI assistant that allows users to:

  • Upload documents (PDFs, images, DOCX)
  • Run OCR (if needed) on non-text files
  • Split and embed text into a vector database
  • Query via a chatbot interface using GPT
  • Retrieve, synthesize, and stream document-aware answers

This project sits at the intersection of:

  • NLP, embeddings, and chunking
  • File handling and background processing
  • Realtime chat UI and long-context memory
  • Vector DBs and API orchestration

Initial Scalable Structure (from day one)

Unlike the Invoice Analyzer, this project started with scale in mind.

Backend (FastAPI)

backend/
└── app/
    ├── api/
    │   ├── chat.py
    │   └── upload.py
    ├── services/
    │   ├── ocr_engine.py
    │   ├── gpt_client.py
    │   ├── chunker.py
    │   ├── embedding.py
    │   └── retriever.py
    ├── schemas/
    │   ├── chat.py
    │   └── document.py
    ├── vectorstore/
    │   └── supabase.py
    └── core/
        ├── config.py
        └── logging.py
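
Each service module stays small and single-purpose. As a flavor of what lives in one of them, here is a minimal sketch of chunker.py; the window and overlap sizes are illustrative assumptions, not the project's actual values.

```python
# app/services/chunker.py (sketch; sizes are illustrative assumptions)
from typing import Iterator

def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> Iterator[str]:
    """Yield overlapping character windows so sentences aren't lost at boundaries."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    step = max_chars - overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + max_chars]
        if chunk.strip():
            yield chunk
```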

Frontend (React + Vite)

frontend/
└── src/
    ├── features/
    │   ├── chat/
    │   └── upload/
    ├── components/
    │   ├── ChatBox.tsx
    │   ├── FileDropZone.tsx
    │   └── StreamingBubble.tsx
    ├── services/
    │   └── apiClient.ts
    ├── hooks/
    │   └── useStreamingChat.ts
    └── shared/
        └── utils/
            └── chunkPreview.ts

Core AI Flow

[Upload Document]
    ↓
OCR (if needed)
    ↓
Text Chunking
    ↓
Embeddings → Supabase (pgvector)
    ↓
[Ask Question]
    ↓
Query Vector DB
    ↓
Pass relevant chunks to GPT
    ↓
Stream answer token-by-token
    ↓
[Render Chat UI]
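
The query half of this flow reduces to three calls: embed the question, fetch the nearest chunks, and ground the GPT call in them. A minimal sketch, assuming a retriever interface like the vectorstore wrapper shown later in this chapter; model names match the deployment stack below.

```python
# Sketch of the query path. `retriever.search` is an assumed interface
# over the pgvector store; see the vectorstore wrapper later in this chapter.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, retriever) -> str:
    # 1. Embed the question with the same model used at ingest time.
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Pull the top-k most similar chunks from the vector store.
    chunks = retriever.search(emb, k=5)

    # 3. Hand the retrieved context to GPT, grounded by a strict system prompt.
    context = "\n\n".join(chunks)
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```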

Testing Breakdown

Unit Tests

tests/
└── services/
    ├── test_ocr_engine.py
    ├── test_chunker.py
    └── test_embedding.py
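
A representative unit test for the chunker, assuming the chunk_text(text, max_chars, overlap) signature sketched earlier; any implementation honoring the same contract passes it.

```python
# tests/services/test_chunker.py (sketch; assumes the chunk_text signature above)
from app.services.chunker import chunk_text

def test_chunks_overlap_and_cover_input():
    text = "abcdefghij" * 100  # 1,000 characters
    chunks = list(chunk_text(text, max_chars=300, overlap=50))
    # Consecutive chunks share exactly the configured overlap.
    for a, b in zip(chunks, chunks[1:]):
        assert a[-50:] == b[:50]
    # Stitching the chunks back together reproduces the input.
    assert "".join(c[: 300 - 50] for c in chunks[:-1]) + chunks[-1] == text
```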

Integration Tests

tests/
└── api/
    ├── test_chat.py
    └── test_upload.py

Mocks

shared/test_helpers/
├── mock_pdf.pdf
├── fake_embeddings.json
└── mock_gpt_response.py
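
mock_gpt_response.py keeps the suite fast and deterministic by never touching the network. A plausible shape, using pytest's monkeypatch; gpt_client.complete is an assumed interface, not a confirmed one.

```python
# shared/test_helpers/mock_gpt_response.py (sketch; gpt_client.complete is assumed)
import pytest

CANNED_ANSWER = "The contract terminates on 31 December 2025."

@pytest.fixture
def mock_gpt(monkeypatch):
    def fake_complete(prompt: str, **kwargs) -> str:
        # Deterministic stand-in for the real GPT call.
        return CANNED_ANSWER

    monkeypatch.setattr("app.services.gpt_client.complete", fake_complete)
    return CANNED_ANSWER
```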

Architectural Wins

| Problem | Solution |
| --- | --- |
| Mixing OCR and GPT in one pipeline | Split into dedicated ocr_engine.py and gpt_client.py services |
| Need for streaming chat responses | FastAPI’s StreamingResponse with an async token generator (sketched below) |
| Scattered vector DB logic | Consolidated into vectorstore/supabase.py as a wrapper |
| Complex GPT prompts per feature | Template-driven prompts in prompts/ or inline Jinja |
| Upload race conditions | Background task queue (e.g., Celery or FastAPI background tasks) |
| Slow tests caused by real GPT calls | Replaced with mock_gpt_response.py + snapshot tests |
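
The streaming solution above, made concrete: an async generator yields tokens as the OpenAI stream produces them, and FastAPI forwards each one to the client immediately. The route shape and request payload here are assumptions.

```python
# app/api/chat.py (streaming sketch; route shape and payload are assumptions)
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def token_stream(prompt: str):
    stream = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk carries no content
            yield delta

@app.post("/chat")
async def chat(prompt: str):
    return StreamingResponse(token_stream(prompt), media_type="text/plain")
```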

Deployment Stack

| Layer | Tool |
| --- | --- |
| Backend | FastAPI (hosted on Render) |
| Frontend | React + Vite (deployed to Netlify) |
| Chat LLM | OpenAI GPT-4 via streaming |
| Vector DB | Supabase with pgvector |
| Storage | Supabase Buckets |
| OCR | PaddleOCR (fallback to Tesseract) |
| Embedding model | text-embedding-3-small |
| CI/CD | GitHub Actions + PR checks |
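
The vector DB row deserves one concrete detail: the wrapper in vectorstore/supabase.py queries pgvector through a Postgres function exposed over RPC. The match_documents function and its parameters below follow the common Supabase pgvector recipe and are assumptions about this project's schema.

```python
# app/vectorstore/supabase.py (sketch; the match_documents RPC and its
# parameters are assumptions modeled on the standard Supabase pgvector recipe)
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def search(query_embedding: list[float], k: int = 5) -> list[str]:
    # Calls a Postgres function that orders rows by embedding distance.
    rows = supabase.rpc(
        "match_documents",
        {"query_embedding": query_embedding, "match_count": k},
    ).execute()
    return [row["content"] for row in rows.data]
```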

Dev Experience Enhancements

  • Local .env loaded via dotenv with fallback to Render secrets
  • Tailwind UI for fast design of chat & file upload zones
  • Monorepo setup using pnpm workspaces:
    apps/
    ├── frontend/
    └── backend/
    packages/
    └── shared-schemas/

Edge Case Handling

| Edge Case | Resolution |
| --- | --- |
| PDF upload with no text layer | OCR fallback with PaddleOCR (see sketch below) |
| Image uploads with handwriting | Post-OCR cleanup with regex correction |
| GPT context overflow | Auto-summarization of long chunks |
| Broken token stream | Frontend AbortController with retry |
| Multi-document queries | Multi-file embedding tracking with document_id tagging |
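
The first row's fallback chain, sketched: try PaddleOCR, and if it yields nothing usable, retry with Tesseract. The empty-result check is an assumption; real code might also compare confidence scores.

```python
# app/services/ocr_engine.py (fallback sketch; the empty-text check is an assumption)
import pytesseract
from paddleocr import PaddleOCR
from PIL import Image

paddle = PaddleOCR(lang="en")  # loaded once, reused across requests

def extract_text(image_path: str) -> str:
    # PaddleOCR returns one list per page, each holding [bbox, (text, confidence)].
    result = paddle.ocr(image_path)
    lines = [line[1][0] for page in (result or []) for line in (page or [])]
    text = "\n".join(lines)
    if text.strip():
        return text
    # PaddleOCR found no usable text: fall back to Tesseract.
    return pytesseract.image_to_string(Image.open(image_path))
```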

Final Structure Snapshot

Backend:

app/
├── api/
│   └── chat.py
├── services/
│   ├── ocr_engine.py
│   ├── embedding.py
│   └── retriever.py
├── vectorstore/
│   └── supabase.py
├── schemas/
│   └── chat.py
└── prompts/
    └── rag_template.txt
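
A plausible shape for rag_template.txt, with Jinja-style placeholders; the exact wording is illustrative, not the project's actual template.

```
You are a document assistant. Answer the question using ONLY the context below.
If the answer is not in the context, say you don't know.

Context:
{{ context }}

Question: {{ question }}

Answer:
```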

Frontend:

src/
├── features/
│   ├── upload/
│   └── chat/
├── components/
│   ├── ChatBox.tsx
│   └── StreamingBubble.tsx
└── hooks/
    └── useStreamingChat.ts

Outcomes

  • Real-time chat performance with ~2s latency
  • GPT answers grounded in actual user documents
  • 92%+ retrieval accuracy via text-embedding-3-small
  • Horizontal scalability with Render’s autoscaling + pgvector indexing
  • Successfully demoed for 3 different use cases (contracts, academic PDFs, invoices)

Key Takeaways

✅ Build for scale from the start when the product involves multiple AI pipelines
✅ Use dedicated service layers to isolate OCR, GPT, chunking, and vector logic
✅ Design frontend UX-first to support streaming and async behaviors
✅ Leverage Supabase + FastAPI for full open-source vertical integration
✅ Embrace modular tests, mocks, and monitoring as part of the structure