Chapter 5: Preprocessing for Pretrained Models

“You can’t transfer knowledge if the inputs don’t speak the same language. Preprocessing isn’t optional—it’s the protocol.”

Why This Chapter Matters

Pretrained CNNs—like those from torchvision.models or keras.applications—have been trained on massive datasets like ImageNet. But they were trained with specific input formats in mind:

A fixed input shape (often 224×224)
Normalized using specific mean and std
RGB ordering (not grayscale or BGR)
Scaled values in a specific range (e.g., [0, 1], [-1, 1])

If you deviate from this, even slightly, your fine-tuned model might:

Output garbage predictions
Fail to converge during training
Seem to overfit instantly

This chapter teaches you how to match preprocessing exactly to each pretrained model so you can extract their full power—safely.

Conceptual Breakdown

🔹 Pretrained Model Expectations

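Model	Expected Input Shape	Pixel Range You Supply	Normalization
ResNet, VGG (torchvision)	224×224×3	[0.0, 1.0]	Mean: `[0.485, 0.456, 0.406]` Std: `[0.229, 0.224, 0.225]`
ResNet50, VGG16 (Keras)	224×224×3	[0, 255]	`preprocess_input()` converts RGB to BGR and subtracts the ImageNet channel means
MobileNetV2 (Keras)	224×224×3	[0, 255]	`preprocess_input()` scales it to [-1.0, 1.0]
EfficientNet (Keras)	224×224×3 (B0) or 240×240×3 (B1)	[0, 255]	Rescaling is built into the model; `preprocess_input()` is a pass-through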

Rule of thumb: Always check the docs or source code of the model you’re using.

🔹 Why You Can’t Just Use .ToTensor() or /255.0

Because pretrained models were trained on data that was already:

Normalized using dataset-wide statistics
Possibly scaled to [-1, 1] or whitened
Fed in a specific channel order

If you skip or mismatch the normalization, you're effectively corrupting the input distribution—and the model’s learned filters won’t match.
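To make the mismatch concrete, here is a minimal NumPy sketch that compares a correctly normalized input with two common mistakes: stopping at /255 and feeding BGR data as if it were RGB. The random image and the helper name `normalize_imagenet` are assumptions for illustration only; they are not part of any library.

```python
import numpy as np

# ImageNet per-channel statistics in RGB order, on the [0, 1] scale.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_imagenet(img_uint8):
    """Scale an HxWx3 uint8 image to [0, 1], then apply mean/std per channel."""
    x = img_uint8.astype(np.float32) / 255.0
    return (x - MEAN) / STD

# Synthetic stand-in for a real photo (assumption for this demo).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

correct = normalize_imagenet(img)
no_norm = img.astype(np.float32) / 255.0       # mistake 1: /255 only
bgr_swap = normalize_imagenet(img[..., ::-1])  # mistake 2: BGR treated as RGB

print("correct mean per channel:", correct.mean(axis=(0, 1)))
print("no_norm mean per channel:", no_norm.mean(axis=(0, 1)))
print("max |correct - no_norm| :", np.abs(correct - no_norm).max())
print("max |correct - bgr_swap|:", np.abs(correct - bgr_swap).max())
```

The un-normalized input never goes negative, while the correctly normalized one is spread on both sides of zero. That gap is the distribution shift the first convolution layer ends up seeing.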

🔹 Matching PyTorch Pretrained Models

Most PyTorch models in torchvision.models use:

[0, 1] range from ToTensor()
Mean-std normalization with ImageNet stats

transforms.Normalize(mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225])

📌 Always use this transform after ToTensor().
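The ordering matters because the ImageNet mean and std are expressed on the [0, 1] scale. The NumPy sketch below mimics what the two transforms compute; `to_tensor_like` and `normalize_like` are hypothetical stand-ins written for this illustration, not torchvision functions.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)

def to_tensor_like(img_uint8):
    """Mimic ToTensor(): HxWxC uint8 -> CxHxW float32 in [0, 1]."""
    return np.transpose(img_uint8, (2, 0, 1)).astype(np.float32) / 255.0

def normalize_like(chw):
    """Mimic Normalize(): per-channel (x - mean) / std on a CxHxW array."""
    return (chw - MEAN) / STD

white = np.full((2, 2, 3), 255, dtype=np.uint8)  # tiny all-white RGB image

right = normalize_like(to_tensor_like(white))
# Wrong order: normalizing raw 0-255 values that were never scaled to [0, 1].
wrong = normalize_like(np.transpose(white, (2, 0, 1)).astype(np.float32))

print("right order, red channel:", right[0, 0, 0])
print("wrong order, red channel:", wrong[0, 0, 0])
```

With the correct order a white pixel lands near 2.25 in the red channel. With the scaling step missing, the same pixel comes out above 1000, far outside anything the network saw during training.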

🔹 Matching TensorFlow/Keras Pretrained Models

Each model has a helper function:

Model Family	Preprocessing Function
ResNet50	`keras.applications.resnet50.preprocess_input()`
VGG16	`keras.applications.vgg16.preprocess_input()`
MobileNetV2	`keras.applications.mobilenet_v2.preprocess_input()`
EfficientNetB0	`keras.applications.efficientnet.preprocess_input()`

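Depending on the model, these handle:

Subtracting the ImageNet channel means without dividing by std (ResNet50, VGG16)
Scaling to [-1, 1] (MobileNetV2)
Converting RGB input to the channel order the weights expect (BGR for ResNet50 and VGG16)
Nothing at all for EfficientNet, which rescales inside the model

📌 These functions expect raw RGB pixel values in [0, 255] as floats. Do not divide by 255 first.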
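Keras groups its ImageNet helpers into three conventions, exposed as the "tf", "caffe" and "torch" modes of `keras.applications.imagenet_utils.preprocess_input`. The following is a rough NumPy approximation of those modes, not the library code; the function name `preprocess_sketch` is hypothetical.

```python
import numpy as np

def preprocess_sketch(x, mode):
    """Approximate the three Keras conventions on NHWC RGB input in [0, 255]."""
    x = np.asarray(x, dtype=np.float32).copy()
    if mode == "tf":     # e.g. MobileNetV2: scale to [-1, 1]
        return x / 127.5 - 1.0
    if mode == "torch":  # e.g. DenseNet: scale to [0, 1], then ImageNet mean/std
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        return (x / 255.0 - mean) / std
    if mode == "caffe":  # e.g. ResNet50, VGG16: RGB -> BGR, subtract mean, no scaling
        bgr = x[..., ::-1]
        return bgr - np.array([103.939, 116.779, 123.68], dtype=np.float32)
    raise ValueError(f"unknown mode: {mode!r}")

red = np.array([[[[255.0, 0.0, 0.0]]]])  # one pure-red pixel, shape (1, 1, 1, 3)
for mode in ("tf", "torch", "caffe"):
    print(mode, preprocess_sketch(red, mode).ravel())
```

The same red pixel becomes roughly (1, -1, -1) in "tf" mode and roughly (-103.9, -116.8, 131.3) in "caffe" mode, where the red value has moved to the last position. This is why a helper from one model family cannot be reused for another.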

PyTorch Implementation

from torchvision import models, transforms
from PIL import Image

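# Load pretrained ResNet-50 (torchvision >= 0.13; older versions use pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()  # inference mode: fixes batch-norm statistics, disables dropout

# Preprocessing pipeline
preprocess = transforms.Compose([
    transforms.Resize(256),      # shorter side to 256, aspect ratio preserved
    transforms.CenterCrop(224),  # central 224x224 crop
    transforms.ToTensor(),       # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])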

# Load image and preprocess
img = Image.open("dog.jpg").convert("RGB")
tensor = preprocess(img).unsqueeze(0)  # shape: [1, 3, 224, 224]
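Before running inference, a quick check of the batch can catch the most common preprocessing slips. The helper below is a hypothetical sketch written for this chapter (the name `check_imagenet_batch` and its thresholds are assumptions). It works on a NumPy array, so a PyTorch tensor would be passed as `tensor.numpy()`.

```python
import numpy as np

def check_imagenet_batch(batch):
    """Return a list of warnings for an NCHW float batch meant for a torchvision model."""
    problems = []
    arr = np.asarray(batch)
    if arr.ndim != 4 or arr.shape[1] != 3:
        problems.append(f"expected shape [N, 3, H, W], got {list(arr.shape)}")
        return problems
    if not np.issubdtype(arr.dtype, np.floating):
        problems.append(f"expected a float dtype, got {arr.dtype}")
    lo, hi = float(arr.min()), float(arr.max())
    # After ImageNet mean/std normalization, values fall roughly within [-2.2, 2.7].
    if lo >= 0.0:
        problems.append("no negative values: Normalize() may have been skipped")
    if hi > 10.0:
        problems.append("very large values: input may not have been scaled to [0, 1]")
    return problems

# Two synthetic batches (assumptions for the demo): one plausible, one raw 0-255.
good = np.random.default_rng(0).normal(size=(1, 3, 224, 224)).astype(np.float32)
raw = np.full((1, 3, 224, 224), 128.0, dtype=np.float32)
print(check_imagenet_batch(good))
print(check_imagenet_batch(raw))
```

The thresholds are deliberately loose. They flag obvious mistakes rather than prove the preprocessing is exactly right.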

TensorFlow Implementation

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing import image
import numpy as np

# Load pretrained MobileNetV2
model = MobileNetV2(weights="imagenet")

# Load and preprocess
img = image.load_img("dog.jpg", target_size=(224, 224))
img_array = image.img_to_array(img)  # shape: [224, 224, 3]
img_array = np.expand_dims(img_array, axis=0)  # [1, 224, 224, 3]
img_array = preprocess_input(img_array)  # scales to [-1, 1]
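One way to verify the result is to invert the scaling and confirm the original pixels come back. The sketch below assumes the [-1, 1] convention used by the MobileNetV2 helper, which computes x / 127.5 - 1. The function name `undo_tf_scaling` is hypothetical.

```python
import numpy as np

def undo_tf_scaling(x):
    """Map values from [-1, 1] back to uint8 [0, 255], e.g. for viewing."""
    restored = (np.asarray(x, dtype=np.float32) + 1.0) * 127.5
    return np.clip(np.rint(restored), 0, 255).astype(np.uint8)

original = np.array([[[0, 128, 255]]], dtype=np.uint8)  # one RGB pixel
scaled = original.astype(np.float32) / 127.5 - 1.0      # the [-1, 1] scaling

print("scaled range:", scaled.min(), scaled.max())
print("round trip ok:", np.array_equal(undo_tf_scaling(scaled), original))
```

If the restored image looks washed out or has swapped colors, the array was probably scaled twice or passed through the wrong model's helper.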

Framework Comparison Table

Aspect	Detail	PyTorch	TensorFlow/Keras
Preprocessing	Model-specific normalization	transforms.Normalize()	keras.applications.*.preprocess_input()
Tensor Layout	Axis order per image	[C, H, W]	[H, W, C]
Expected Range	Pixel values you supply	[0, 1], then mean/std	[0, 255], scaled by preprocess_input()
Input Shape	Default for pretrained	[1, 3, 224, 224]	[1, 224, 224, 3]
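Moving a batch between the two layouts in the table is a pure axis permutation. A minimal NumPy sketch follows, with hypothetical helper names. On a PyTorch tensor the first conversion is `tensor.permute(0, 2, 3, 1)`.

```python
import numpy as np

def nchw_to_nhwc(batch):
    """PyTorch-style [N, C, H, W] -> Keras-style [N, H, W, C]."""
    return np.transpose(batch, (0, 2, 3, 1))

def nhwc_to_nchw(batch):
    """Keras-style [N, H, W, C] -> PyTorch-style [N, C, H, W]."""
    return np.transpose(batch, (0, 3, 1, 2))

torch_style = np.zeros((1, 3, 224, 224), dtype=np.float32)
keras_style = nchw_to_nhwc(torch_style)
print(keras_style.shape)                # (1, 224, 224, 3)
print(nhwc_to_nchw(keras_style).shape)  # (1, 3, 224, 224)
```

This only reorders axes. It does not change pixel values or swap RGB and BGR, so the value-range and channel-order conventions still have to be matched separately.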

Mini-Exercise

Try loading the same image in both PyTorch and TensorFlow, using resnet50 in PyTorch and ResNet50 in Keras. Then:

Apply the correct preprocessing for each framework.
Feed the tensor into the model and extract the top-1 class prediction.
Confirm both models give similar results.

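Bonus: Print the difference between preprocessed tensors in PyTorch vs TensorFlow (after matching shape order). Expect them to differ substantially: the Keras ResNet50 helper outputs BGR values with only the channel means subtracted, while the PyTorch pipeline outputs RGB values scaled to [0, 1] and divided by std.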