
Chapter 5: Preprocessing for Pretrained Models

You can’t transfer knowledge if the inputs don’t speak the same language. Preprocessing isn’t optional—it’s the protocol.


Why This Chapter Matters

Pretrained CNNs—like those from torchvision.models or keras.applications—have been trained on massive datasets like ImageNet. But they were trained with specific input formats in mind:

  • A fixed input shape (often 224×224)

  • Normalized using specific mean and std

  • RGB ordering (not grayscale or BGR)

  • Scaled values in a specific range (e.g., [0, 1], [-1, 1])

If you deviate from this, even slightly, your fine-tuned model might:

  • Output garbage predictions

  • Fail to converge during training

  • Seem to overfit instantly

This chapter teaches you how to match preprocessing exactly to each pretrained model so you can extract their full power—safely.


Conceptual Breakdown

🔹 Pretrained Model Expectations

Model         Expected Input Shape          Pixel Range    Normalization
ResNet, VGG   224×224×3                     [0.0, 1.0]     mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]
MobileNetV2   224×224×3                     [-1.0, 1.0]    preprocess_input() scales it
EfficientNet  224×224×3 (240×240 for B1)    [0.0, 255.0]   preprocess_input() handles it

Rule of thumb: Always check the docs or source code of the model you’re using.


🔹 Why You Can’t Just Use ToTensor() or /255.0

Because pretrained models were trained on data that was already:

  • Normalized using dataset-wide statistics

  • Possibly scaled to [-1, 1] or whitened

  • Fed in a specific channel order

If you skip or mismatch the normalization, you're effectively corrupting the input distribution—and the model’s learned filters won’t match.
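The shift is easy to see numerically. Here is a NumPy sketch (illustrative random data standing in for a real image) of what ImageNet normalization does to a [0, 1] input; skipping it means the model's filters receive a distribution they never saw during training:

```python
import numpy as np

# Stand-in for a ToTensor() output: values in [0, 1)
rng = np.random.default_rng(0)
img = rng.random((3, 224, 224)).astype(np.float32)

# ImageNet channel statistics used by most torchvision models
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)

normalized = (img - mean) / std

# The raw tensor lives in [0, 1]; the normalized one is roughly
# zero-centered with a much wider spread. Filters trained on the
# latter see a badly shifted distribution if you skip this step.
print("raw range:       ", img.min(), img.max())
print("normalized range:", normalized.min(), normalized.max())
```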


🔹 Matching PyTorch Pretrained Models

Most PyTorch models in torchvision.models use:

  • [0, 1] range from ToTensor()

  • Mean-std normalization with ImageNet stats

  transforms.Normalize(mean=[0.485, 0.456, 0.406],
                       std=[0.229, 0.224, 0.225])

📌 Always use this transform after ToTensor().


🔹 Matching TensorFlow/Keras Pretrained Models

Each model has a helper function:

Model Family    Preprocessing Function
ResNet50        keras.applications.resnet50.preprocess_input()
VGG16           keras.applications.vgg16.preprocess_input()
MobileNetV2     keras.applications.mobilenet_v2.preprocess_input()
EfficientNetB0  keras.applications.efficientnet.preprocess_input()

These handle:

  • Mean/std normalization

  • Scaling to [-1, 1] if needed

  • Channel-order conversion (ResNet/VGG preprocessing converts RGB to BGR)

📌 These functions expect raw pixel values (0–255 float), not normalized.
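For MobileNetV2 specifically, preprocess_input is equivalent to x / 127.5 - 1. A NumPy sketch of that mapping, hand-rolled here so it runs without TensorFlow:

```python
import numpy as np

def mobilenet_v2_scale(x):
    """Replicates mobilenet_v2.preprocess_input: [0, 255] -> [-1, 1]."""
    return x / 127.5 - 1.0

raw = np.array([0.0, 127.5, 255.0])  # raw pixel values, as the helper expects
print(mobilenet_v2_scale(raw))       # 0 -> -1, 127.5 -> 0, 255 -> 1
```

Note what happens if you pre-divide by 255 first: the values land in roughly [-1, -0.99], collapsing the input range. This is exactly the double-normalization mistake the 📌 note warns about.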


PyTorch Implementation

from torchvision import models, transforms
from PIL import Image

# Load pretrained ResNet (the weights argument replaces the deprecated pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# Preprocessing pipeline
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # scales to [0,1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load image and preprocess
img = Image.open("dog.jpg").convert("RGB")
tensor = preprocess(img).unsqueeze(0)  # shape: [1, 3, 224, 224]

TensorFlow Implementation

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing import image
import numpy as np

# Load pretrained MobileNetV2
model = MobileNetV2(weights="imagenet")

# Load and preprocess
img = image.load_img("dog.jpg", target_size=(224, 224))
img_array = image.img_to_array(img)  # shape: [224, 224, 3]
img_array = np.expand_dims(img_array, axis=0)  # [1, 224, 224, 3]
img_array = preprocess_input(img_array)  # scales to [-1, 1]

Framework Comparison Table

Step             PyTorch                   TensorFlow
Normalization    transforms.Normalize()    keras.applications.*.preprocess_input()
Channel order    [C, H, W]                 [H, W, C]
Pixel values     [0, 1], then mean/std     [0, 255], scaled internally
Input shape      [1, 3, 224, 224]          [1, 224, 224, 3]
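Moving a batch between the two layouts is a single transpose; a minimal NumPy sketch:

```python
import numpy as np

# TensorFlow-style batch: [N, H, W, C]
nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)

# PyTorch-style batch: [N, C, H, W]
nchw = np.transpose(nhwc, (0, 3, 1, 2))
print(nchw.shape)  # (1, 3, 224, 224)

# ...and back
restored = np.transpose(nchw, (0, 2, 3, 1))
print(restored.shape)  # (1, 224, 224, 3)
```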

Mini-Exercise

Try loading the same image in both PyTorch and TensorFlow:

  • Use resnet50 in PyTorch and ResNet50 in Keras.

  • Apply the correct preprocessing for each framework.

  • Feed the tensor into each model and extract the top-1 class prediction.

  • Confirm both models give similar results.

Bonus: Print the difference between preprocessed tensors in PyTorch vs TensorFlow (after matching shape order).
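For the bonus, the shape matching is the transpose from the comparison table. A sketch with placeholder arrays (substitute your real preprocessed tensors, e.g. tensor.numpy() from PyTorch and img_array from Keras):

```python
import numpy as np

# Placeholders for the two preprocessed batches; here the "TensorFlow"
# batch is fabricated from the "PyTorch" one so the script is runnable.
rng = np.random.default_rng(42)
pt_batch = rng.random((1, 3, 224, 224)).astype(np.float32)  # [N, C, H, W]
tf_batch = np.transpose(pt_batch, (0, 2, 3, 1))             # [N, H, W, C]

# Bring the TensorFlow batch into PyTorch's channel order, then compare
tf_as_nchw = np.transpose(tf_batch, (0, 3, 1, 2))
print("max abs difference:", np.abs(pt_batch - tf_as_nchw).max())
```

With real tensors from the two pipelines, expect a large difference even though both models predict correctly: the two ResNet50 recipes normalize differently (mean/std in [0, 1] space vs Caffe-style BGR mean subtraction).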