Technical Appendices
“The last 10% is where the polish lives. And sometimes, it's where the magic happens.”
This section provides practical, implementation-ready resources to support every stage of chatbot development: from setup to deployment, from prompt design to cloud hosting. These appendices are designed for quick reference and rapid execution.
Appendix A: Setting Up Local and Cloud-Hosted Environments for Inference
Local Inference Setup (for Open-Source LLMs)
Recommended hardware:
- GPU with ≥ 6 GB VRAM (e.g., NVIDIA RTX 3060 or better)
- 16 GB RAM and SSD storage
Software Stack:
conda create -n chatbot python=3.10
conda activate chatbot
pip install torch transformers accelerate
Run an open-source LLM (e.g., Mistral-7B) with Hugging Face Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize the prompt and move it to the device the model was placed on
inputs = tokenizer("Explain vector embeddings.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
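Note that a 7B model at 16-bit precision needs roughly 14 GB of VRAM, so on a 6 GB card you will want quantization. A minimal sketch, assuming the bitsandbytes package is installed (pip install bitsandbytes):
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the same model in 4-bit precision so it fits on consumer GPUs
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)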
Cloud Deployment Setup (OpenAI or HF-based APIs)
Recommended Platforms:
- Backend: Render, Railway, GCP Cloud Run
- Frontend: Vercel, Netlify
- Storage: Supabase, Firebase, S3
Environment Variables to Configure:
OPENAI_API_KEY
SUPABASE_URL
SUPABASE_SERVICE_ROLE_KEY
ALLOWED_ORIGINS
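For example, here is a minimal sketch of reading these variables at startup; the python-dotenv package and the "*" fallback for ALLOWED_ORIGINS are assumptions, not requirements:
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads a local .env file in development; harmless on hosted platforms
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # a KeyError at startup beats a failure mid-request
SUPABASE_URL = os.environ["SUPABASE_URL"]
SUPABASE_SERVICE_ROLE_KEY = os.environ["SUPABASE_SERVICE_ROLE_KEY"]
# Comma-separated, e.g. "https://app.example.com,http://localhost:3000"
ALLOWED_ORIGINS = os.getenv("ALLOWED_ORIGINS", "*").split(",")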
Deployment Flow:
- Push to GitHub
- Connect repo to platform (Render/Vercel)
- Set env variables
- Auto-deploy on main branch push
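Whatever the platform, it needs a web entry point to run. The sketch below shows one way to wire ALLOWED_ORIGINS into a FastAPI app; the app.main:app layout mirrors the Dockerfile in Appendix B, and the /health route is an assumed convention, not a platform requirement:
import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Restrict browser access to the origins configured on the platform
app.add_middleware(
    CORSMiddleware,
    allow_origins=os.getenv("ALLOWED_ORIGINS", "*").split(","),
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/health")
def health():
    # Useful for the platform's post-deploy checks
    return {"status": "ok"}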
Appendix B: Comprehensive Docker & Kubernetes Setup
Dockerfile (FastAPI-based Chatbot Backend)
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY ./app /app/app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
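To verify the image locally before pushing, run docker build -t chatbot . followed by docker run -p 8000:8000 --env-file .env chatbot (the --env-file flag assumes you keep the Appendix A variables in a local .env file).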
Kubernetes Manifest (Basic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: chatbot
  template:
    metadata:
      labels:
        app: chatbot
    spec:
      containers:
        - name: chatbot
          image: your-dockerhub/chatbot:latest
          ports:
            - containerPort: 8000
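Apply it with kubectl apply -f deployment.yaml (the filename is an assumption), then expose the pods behind a stable address, e.g. kubectl expose deployment chatbot-backend --port=80 --target-port=8000; a dedicated Service manifest is the more maintainable option for real clusters.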
Appendix C: Cloud Service Comparison (LLM Hosting & Vector DBs)
LLM APIs
| Provider | Best For | Notes |
| --- | --- | --- |
| OpenAI | Text generation + tool calling | GPT-3.5 / GPT-4; stable and scalable |
| Anthropic | Safer outputs, long context | Claude 3 models |
| Mistral (via Hugging Face) | Open-source alternatives | Fast and license-flexible |
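As a concrete example of the hosted route, a minimal call with the official openai Python client; the model name is illustrative, and the client reads OPENAI_API_KEY from the environment:
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative; swap in any chat-capable model
    messages=[{"role": "user", "content": "Explain vector embeddings."}],
)
print(response.choices[0].message.content)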
Vector Databases
| Service | Features | Hosted? |
| --- | --- | --- |
| Supabase | pgvector, open source, Postgres-based | Yes (free tier) |
| Pinecone | Fully managed, enterprise-ready | Yes |
| Weaviate | Schema-based search, hybrid support | Yes |
| Qdrant | High performance, local + hosted | Both |
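To make the comparison concrete, here is a minimal sketch using the qdrant-client package; the collection name, 384-dimension vectors, and dummy data are purely illustrative:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-process for experiments; pass a URL for hosted

# Create a collection sized to your embedding model's output dimension
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.1] * 384, payload={"text": "hello"})],
)
hits = client.search(collection_name="docs", query_vector=[0.1] * 384, limit=3)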
Appendix D: Prompt Engineering Cookbook
1. System Role Prompts
You are a helpful assistant that explains programming concepts to junior developers using simple analogies and code examples.
2. Few-shot Prompt (Support Bot)
User: I want a refund.
Bot: Sure, I can help you with that. Can you provide your order number?
User: My tracking ID isn't updating.
Bot: No problem. Can you share the tracking ID so I can check its status?
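In chat-style APIs these examples become alternating messages placed ahead of the live turn. A sketch, where the system text and user_message are illustrative:
user_message = "Where is my package?"
messages = [
    {"role": "system", "content": "You are a concise support bot."},
    # Few-shot pairs teach the tone and the clarifying-question pattern
    {"role": "user", "content": "I want a refund."},
    {"role": "assistant", "content": "Sure, I can help you with that. Can you provide your order number?"},
    {"role": "user", "content": "My tracking ID isn't updating."},
    {"role": "assistant", "content": "No problem. Can you share the tracking ID so I can check its status?"},
    {"role": "user", "content": user_message},  # the live input goes last
]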
3. Function Call Format (OpenAI)
{
  "name": "get_invoice_status",
  "description": "Look up the status of an invoice by its ID",
  "parameters": {
    "type": "object",
    "properties": {
      "invoice_id": { "type": "string" }
    },
    "required": ["invoice_id"]
  }
}
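To register the function with the chat completions API, wrap the schema above in the tools parameter; the example question is illustrative:
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What's the status of invoice INV-42?"}],
    tools=[{"type": "function", "function": {
        "name": "get_invoice_status",
        "description": "Look up the status of an invoice by its ID",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    }}],
)
# If the model decided to call the function, its arguments arrive as a JSON string
print(response.choices[0].message.tool_calls)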
4. RAG Prompt (Document Q&A)
You are a helpful assistant. Use only the provided context to answer.
If the answer is not in the context, say "I don't know."
Context:
{{retrieved_chunks}}
Question: {{user_question}}
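At runtime the placeholders are filled from the retriever's output. One simple sketch, where retrieve() is a hypothetical stand-in for your vector search:
RAG_TEMPLATE = (
    "You are a helpful assistant. Use only the provided context to answer.\n"
    'If the answer is not in the context, say "I don\'t know."\n\n'
    "Context:\n{retrieved_chunks}\n\n"
    "Question: {user_question}"
)

def build_rag_prompt(question: str) -> str:
    chunks = retrieve(question, top_k=4)  # hypothetical: returns the top-k text chunks
    return RAG_TEMPLATE.format(
        retrieved_chunks="\n---\n".join(chunks),
        user_question=question,
    )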