Chapter 4: Building the ML Logic
4.1 Choose Your Model Strategy
There are two major approaches you can take depending on your goal and available compute:
| Strategy | Description |
|---|---|
| A. Local Pretrained Model | Run a pretrained model yourself, such as BERT (via Hugging Face transformers) or CartoonGAN / Fast Style Transfer (via PyTorch) |
| B. API-Driven Inference | Use external services (OpenAI, Replicate) to run inference and return results |
Let’s cover both methods, and you can pick which one fits each project.
4.2 Method A: Local Inference with a Pretrained Model
Perfect for small NLP tasks or lightweight image models.
Example: Local Sentiment Classifier with BERT
backend/app/main.py
```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the tokenizer and model from the same checkpoint; mixing
# "bert-base-uncased" token IDs with this fine-tuned model's vocabulary
# would mis-align the inputs
MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()  # inference mode

app = FastAPI()

class InputText(BaseModel):
    text: str

@app.post("/predict")
def predict_sentiment(input: InputText):
    inputs = tokenizer(input.text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # no gradients needed at inference time
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
    label = torch.argmax(probs, dim=1).item()  # index 0-4 maps to a 1-5 star rating
    return {"label": label, "confidence": round(probs[0][label].item(), 4)}
```
Good for NLP-based tools like Sentiment Analyzer, News Classifier, etc.
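To sanity-check the endpoint once the server is running (see §4.5 for starting it), you can send a quick request; the sample text here is arbitrary:

```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "I absolutely loved this movie!"}'
```

The response contains the predicted class index and its softmax confidence, along the lines of `{"label": 4, "confidence": 0.93}` (exact numbers will vary).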
4.3 Method B: API-Based Inference (e.g., OpenAI, Replicate)
This is best for:

- Projects with limited local compute.
- Tasks like GPT chat, DALL·E image generation, or image-to-image style transfer.
Example: GPT-based Caption Generator (OpenAI)
backend/app/main.py
```python
import os

from dotenv import load_dotenv
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

load_dotenv()

# Requires openai>=1.0; loading the key from .env keeps it out of source code
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

app = FastAPI()

class PromptInput(BaseModel):
    prompt: str

@app.post("/generate")
def generate_caption(input: PromptInput):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a witty meme caption generator."},
            {"role": "user", "content": input.prompt},
        ],
    )
    caption = response.choices[0].message.content
    return {"caption": caption}
```
Example: CartoonGAN via Replicate API
backend/app/model/cartoonize.py
```python
import os

import replicate

# Keep a reference to the client; calling replicate.Client(...) without
# assigning it configures nothing
client = replicate.Client(api_token=os.getenv("REPLICATE_API_TOKEN"))

def cartoonize_image(image_path: str):
    with open(image_path, "rb") as image_file:
        output = client.run(
            "tstramer/cartoonify:latest",  # pin a version hash in production
            input={"image": image_file},
        )
    return output  # typically a URL to the generated image
```
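To expose this helper over HTTP, an upload endpoint along these lines could work. This is a sketch: the /cartoonize route, the temp-file handling, and the import path are illustrative assumptions, not code from earlier in the chapter:

```python
# backend/app/main.py (illustrative sketch)
import shutil
import tempfile

from fastapi import FastAPI, UploadFile

from app.model.cartoonize import cartoonize_image  # assumes the layout above

app = FastAPI()

@app.post("/cartoonize")
async def cartoonize(file: UploadFile):
    # Write the upload to a temp file so the helper can open it by path
    with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
        shutil.copyfileobj(file.file, tmp)
        tmp_path = tmp.name
    output = cartoonize_image(tmp_path)
    return {"result": output}  # typically a URL from Replicate
```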
4.4 Handling Inference Responsibly
| Concern | Solution |
|---|---|
| Timeouts | Add try/except and request timeouts (especially for external APIs) |
| Reproducibility | Set random seeds; record the versions of the models you deploy |
| API Key Safety | Use .env; never hardcode keys |
| Cost Management | Throttle usage (e.g., max 3 calls/min/user) |
| Bad Inputs | Sanitize user input to avoid prompt injection |
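Putting the Timeouts and Cost Management rows into practice, here is a minimal sketch that hardens the /generate endpoint from §4.3 (it assumes the `app`, `client`, and `PromptInput` names defined there; the in-memory throttle and the 429/502 status choices are illustrative, and a real deployment would use a rate-limiting library or an API gateway instead):

```python
import time
from collections import defaultdict, deque

from fastapi import HTTPException

# Naive per-user throttle: at most 3 calls per rolling 60-second window.
# In-memory only — it resets on restart and is not shared across workers.
_call_log: dict[str, deque] = defaultdict(deque)

def check_rate_limit(user_id: str, max_calls: int = 3, window_s: float = 60.0) -> None:
    now = time.time()
    window = _call_log[user_id]
    while window and now - window[0] > window_s:
        window.popleft()  # drop calls that fell out of the window
    if len(window) >= max_calls:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    window.append(now)

@app.post("/generate")
def generate_caption(input: PromptInput, user_id: str = "anonymous"):
    check_rate_limit(user_id)
    try:
        # Per-request timeout so a slow upstream call fails fast
        # instead of hanging the worker
        response = client.with_options(timeout=30.0).chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": input.prompt}],
        )
    except Exception as exc:
        raise HTTPException(status_code=502, detail="Model call failed") from exc
    return {"caption": response.choices[0].message.content}
```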
4.5 Local Testing
Before you deploy, run local tests using:

```bash
uvicorn app.main:app --reload
```

Then test with:

```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Make a funny caption for a dog eating pizza."}'
```
Or use Postman / Thunder Client (VSCode plugin) for easier testing.
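For scripted checks, FastAPI's TestClient is also handy. A minimal pytest sketch (the file path is illustrative, and this exercises the real OpenAI call unless you mock the client):

```python
# backend/tests/test_generate.py (illustrative sketch)
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_generate_returns_caption():
    response = client.post(
        "/generate",
        json={"prompt": "Make a funny caption for a dog eating pizza."},
    )
    assert response.status_code == 200
    assert "caption" in response.json()
```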
4.6 Recap: What You’ve Accomplished
- Built model inference logic (either locally or via API).
- Secured your keys and API calls.
- Tested that it responds to user input.
- Ready to connect to your frontend.
Bonus (Optional): Add CORS if the Frontend Can’t Reach the Backend

```python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # in production, restrict this to your frontend's URL
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```