# Chapter 10: How to Stay Within Free Tiers
With great deployment comes great responsibility. Chapter 10 is all about smart cost control and usage monitoring: the key to sustainably running your AI apps, especially when you rely on paid APIs (OpenAI, Replicate) or free-tier compute (Railway, Hugging Face, Vercel). Let's protect your wallet while scaling your impact.
## 10.1 Why Cost Optimization Matters
Even small AI/ML apps can rack up unexpected costs due to:
- Token overuse (OpenAI, Claude)
- Repeated image inference (Replicate, Stability)
- Exceeding platform quotas (Railway, Vercel)
Goal: Ensure your app remains free or low-cost until you're ready to scale.
## 10.2 Free Tier Comparison Recap (Limits)
| Platform | Monthly Free Tier | Key Limitations |
|---|---|---|
| OpenAI | Free trial ($5–$18, one-time) | After trial, pay-per-token |
| Replicate | $10 in credits (one-time) | Pay-per-inference after |
| Hugging Face | Free Spaces + CPU only | No GPU, low RAM unless PRO |
| Railway | 500 hrs/month, 1GB deploy | Cold starts, no GPU |
| Vercel | 100GB bandwidth, unlimited deploys | Bandwidth limit for image-heavy apps |
You can deploy multiple apps under one free plan — just rotate projects if needed!
## 10.3 Control API Costs with Smart Code
- **Set Prompt Length Limits (for OpenAI)**

```python
# Reject oversized prompts before they reach the paid API
if len(prompt) > 250:
    return {"error": "Prompt too long"}
```

- **Add a Global Cooldown (e.g. 10 seconds)**
```python
import time

last_used = 0  # timestamp of the most recent API call

def safe_generate(prompt):
    global last_used
    now = time.time()
    if now - last_used < 10:  # enforce a 10-second gap between calls
        return {"error": "Please wait before generating again."}
    last_used = now
    # ... call your OpenAI API here ...
```
- **Use Small Models First**
| Model | Est. Cost per 1K tokens |
|---|---|
| gpt-3.5-turbo | ~$0.0015 |
| gpt-4 | ~$0.03–$0.06 |
| text-davinci-003 (legacy) | ~$0.02 |
Use gpt-3.5-turbo by default for text.
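For example, here's a minimal sketch of defaulting to the cheaper model, assuming the official `openai` Python package (v1+ client) with `OPENAI_API_KEY` set in your environment; the `use_large_model` flag is just illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate(prompt: str, use_large_model: bool = False) -> str:
    # gpt-3.5-turbo is roughly 20-40x cheaper per token than gpt-4,
    # so reserve the larger model for requests that truly need it
    model = "gpt-4" if use_large_model else "gpt-3.5-turbo"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,  # cap output tokens to bound per-request cost
    )
    return response.choices[0].message.content
```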
## 10.4 Optimize Image Inference (Replicate)
| Strategy | Result |
|---|---|
| Cache output URLs | Save storage and bandwidth |
| Avoid large image inputs | Resize before sending |
| Use “Preview” mode in demos | Lower-res output = cheaper |
| Bundle image post-processing | Avoid second inference step |
You can also pre-generate results for demos to avoid live inference costs.
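To apply the "resize before sending" strategy, here's a minimal sketch using Pillow (the library choice, `MAX_SIDE`, and the file paths are assumptions, not part of any platform's API):

```python
from PIL import Image

MAX_SIDE = 768  # smaller inputs mean cheaper, faster inference

def shrink_for_inference(path: str, out_path: str = "resized.png") -> str:
    """Downscale an image before uploading it for inference."""
    img = Image.open(path)
    img.thumbnail((MAX_SIDE, MAX_SIDE))  # keeps aspect ratio, never upscales
    img.save(out_path)
    return out_path
```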
## 10.5 Monitoring Usage & Logs
| Platform | Tool / Page | What to Check |
|---|---|---|
| Hugging Face | Spaces → Logs | Inference errors, load time |
| Railway | Project Logs | Backend errors, API calls, cold starts |
| OpenAI | Usage Dashboard | Token usage and cost breakdown |
| Replicate | Billing Page + History | # of model runs, average cost |
| Vercel | Analytics (Pro only) | Bandwidth, requests |
Check logs after every major feature update.
## 10.6 Add Logging to Your Backend
`utils/logger.py`:
```python
import datetime

def log_usage(endpoint: str, input_data: dict):
    # Append a timestamped record of each request to a local log file
    with open("logs.txt", "a") as f:
        f.write(f"{datetime.datetime.now()} | {endpoint} | {input_data}\n")
```
In your API route:
```python
log_usage("/generate", {"prompt": request.prompt})
```
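For context, here's a minimal sketch of where that call might sit, assuming a FastAPI backend with a pydantic request model (both are assumptions; adapt to whatever framework your app uses):

```python
from fastapi import FastAPI
from pydantic import BaseModel

from utils.logger import log_usage

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(request: GenerateRequest):
    log_usage("/generate", {"prompt": request.prompt})  # record every call
    # ... run your model or call the external API here ...
    return {"status": "ok"}
```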
## 10.7 Automation Tools (Advanced)
| Tool | Use Case |
|---|---|
| cron + curl | Auto-ping API to avoid cold starts (Railway) |
| PostHog | Track frontend behavior/events |
| Sentry | Monitor frontend/backend errors |
| Google Analytics | Track public usage |
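As an example of the first row, here's a tiny keep-alive pinger you could schedule with cron (the URL is a placeholder for your own Railway app):

```python
# keep_alive.py - a minimal sketch; schedule it with cron, e.g.:
#   */10 * * * * python3 /path/to/keep_alive.py
import urllib.request

URL = "https://your-app.up.railway.app/"  # placeholder URL

try:
    urllib.request.urlopen(URL, timeout=10)  # any response keeps the app warm
except Exception as exc:
    print(f"Keep-alive ping failed: {exc}")
```

Keep in mind that keeping an app warm consumes execution hours, so weigh this against Railway's 500 hrs/month cap.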
## 10.8 Final Cost-Safety Checklist
| Task | Done? |
|---|---|
| Prompt/input length limited | ✅ |
| Cooldown between API calls enforced | ✅ |
| .env keys secured | ✅ |
| Fallback error handling added | ✅ |
| Logs monitored and reviewed weekly | ✅ |
| Platform usage dashboards checked | ✅ |
**Bonus: Add a Usage Warning in the UI**

```javascript
// Warn users as they approach the free generation limit
if (usageCount >= 3) {
  alert("You’ve used 3/5 free generations. Upgrade for more!");
}
```

Great for apps you plan to monetize later!
## Chapter Summary
- You now know how to control AI API costs using simple tricks
- You can monitor app health and usage through logs and dashboards
- Your app is now safe for public sharing or demoing