Chapter 4: Standard Image Preprocessing¶
“A well-preprocessed image is half the training. The other half is just optimization.”
Why This Chapter Matters¶
Imagine handing a blurry, off-centered photo to a human and asking, “What’s in this?” You’d expect a confused answer—and that’s exactly how neural networks feel when they receive poorly scaled, misaligned, or inconsistent input images.
Preprocessing is the discipline of preparing your data in a way the model can truly understand. It’s not just resizing or flipping—it’s about standardizing, cleaning, and augmenting images so your CNN can extract meaningful patterns.
Done right, preprocessing:
- Boosts model accuracy
- Reduces overfitting
- Makes inference stable
- Speeds up convergence
Done wrong? You’ll spend weeks tuning your architecture for nothing.
Conceptual Breakdown¶
🔹 What Is Image Preprocessing?
Image preprocessing refers to the transformations applied to raw image data before it’s fed into the model. It’s the very first thing your pipeline does, and its job is to:
- Resize or crop the image to fit the model’s input size
- Convert the pixel values to float and normalize them
- Augment images with randomness during training to improve generalization
🔹 Common Preprocessing Operations
Step | Purpose |
---|---|
Resize | Match input shape expected by CNN (e.g., 224×224) |
Crop | Focus on central content or apply randomness |
Normalize | Scale values for model stability and consistency |
Augment | Random changes (flip, rotate, jitter) to generalize better |
🔹 Mean-Std Normalization vs 0–1 Scaling
You’ll often see two types of normalization:
- 0–1 scaling: divide pixel values by 255
  - Simpler; common for custom models trained from scratch
- Mean-std normalization: subtract the dataset mean, divide by the standard deviation
  - Used with pretrained models (e.g., ResNet, MobileNet)
Format | Example |
---|---|
0–1 Scaling | img = img / 255.0 |
Mean-std Norm | img = (img - mean) / std |
ImageNet Mean | [0.485, 0.456, 0.406] |
ImageNet Std | [0.229, 0.224, 0.225] |
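The two schemes in the table differ only by a shift and a scale. A minimal NumPy sketch, using a tiny made-up image, shows both in sequence (mean-std normalization is applied on top of the 0–1 scaled values, which is how the ImageNet statistics above are defined):

```python
import numpy as np

# A fake 2x2 RGB image with uint8 pixel values (example data only)
img = np.array([[[0, 128, 255], [64, 64, 64]],
                [[255, 0, 0], [10, 200, 30]]], dtype=np.uint8)

# 0-1 scaling: divide by 255
scaled = img.astype(np.float32) / 255.0

# Mean-std normalization with the ImageNet statistics
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
normalized = (scaled - mean) / std

print(scaled.min(), scaled.max())    # stays within [0, 1]
print(normalized.min(), normalized.max())  # no longer bounded by [0, 1]
```

Note that after mean-std normalization the values are centered roughly around zero and can be negative, which is exactly what pretrained models expect.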
🔹 Effects of Preprocessing on Training
Preprocessing Problem | Symptoms |
---|---|
No normalization | Model fails to converge |
Wrong mean/std | Bad predictions, poor transfer |
Mismatched resize | Shape errors at input layers |
Augmenting test data | Erratic evaluation accuracy |
Note: Always apply augmentation to training data only, never to validation/test.
🔹 PIL vs OpenCV vs tf.image
Library | Style | Format Used |
---|---|---|
PIL | Pythonic | RGB (default) |
OpenCV | Fast, C++ | BGR (must convert!) |
tf.image | Tensor-native | TensorFlow Tensors |
If you’re mixing OpenCV with TensorFlow or PyTorch, be careful—color channels will be flipped unless you convert BGR → RGB.
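As a quick illustration, the BGR-to-RGB conversion is just a reversal of the channel axis. The sketch below uses a fabricated one-pixel image and a NumPy slice; with OpenCV you would normally call `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` instead:

```python
import numpy as np

# OpenCV loads images as BGR; fabricate one pixel: blue=10, green=20, red=30
img_bgr = np.array([[[10, 20, 30]]], dtype=np.uint8)

# Reverse the last (channel) axis: BGR -> RGB.
# Equivalent to cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB).
img_rgb = img_bgr[..., ::-1]

print(img_rgb[0, 0])  # [30 20 10] -> red, green, blue
```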
🔹 Preprocessing Matching: Train vs Inference
Your model learns with a certain set of expectations (shape, scale, mean/std). If your inference pipeline doesn’t match your training pipeline, your model will:
- Output low-confidence predictions
- Misclassify even familiar data
📌 Golden Rule: Always reuse your training preprocessing (minus augmentation) for inference.
PyTorch Implementation¶
🔸 Training Preprocessing
```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),  # scales to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
```
🔸 Validation / Inference Preprocessing
```python
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
```
TensorFlow Implementation¶
🔸 Training Preprocessing
```python
import tensorflow as tf

def preprocess_train(img):
    img = tf.image.resize(img, [256, 256])
    img = tf.image.random_crop(img, [224, 224, 3])
    img = tf.image.random_flip_left_right(img)
    img = tf.cast(img, tf.float32) / 255.0
    mean = tf.constant([0.485, 0.456, 0.406])
    std = tf.constant([0.229, 0.224, 0.225])
    img = (img - mean) / std
    return img
```
🔸 Inference Preprocessing
```python
def preprocess_eval(img):
    img = tf.image.resize(img, [224, 224])
    img = tf.cast(img, tf.float32) / 255.0
    mean = tf.constant([0.485, 0.456, 0.406])
    std = tf.constant([0.229, 0.224, 0.225])
    img = (img - mean) / std
    return img
```
Framework Comparison Table¶
Task | PyTorch | TensorFlow |
---|---|---|
Resize | transforms.Resize() | tf.image.resize() |
Crop | RandomCrop, CenterCrop | tf.image.random_crop() |
Flip | RandomHorizontalFlip() | tf.image.random_flip_left_right() |
Normalize | transforms.Normalize(mean, std) | Manual: (img - mean) / std |
Convert to tensor | transforms.ToTensor() | tf.cast(img, tf.float32) / 255.0 |
Augment only in training | Separate train/val Compose pipelines | Apply only in the training dataset’s map() |
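Framework aside, both pipelines reduce to a handful of array operations gated by a training flag. Below is a framework-agnostic NumPy sketch, not either library’s API. It assumes the input has already been resized to 256×256, and it substitutes a deterministic center crop at eval time (an alternative to resizing directly to 224×224 as the code above does):

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img, training, rng=np.random.default_rng(0)):
    """img: HxWx3 uint8 array, assumed already resized to 256x256."""
    if training:
        # Random 224x224 crop
        y = rng.integers(0, img.shape[0] - 224 + 1)
        x = rng.integers(0, img.shape[1] - 224 + 1)
        img = img[y:y + 224, x:x + 224]
        # Random horizontal flip with probability 0.5
        if rng.random() < 0.5:
            img = img[:, ::-1]
    else:
        # Deterministic center crop: no randomness at eval time
        y = (img.shape[0] - 224) // 2
        x = (img.shape[1] - 224) // 2
        img = img[y:y + 224, x:x + 224]
    # 0-1 scaling followed by ImageNet mean-std normalization
    img = img.astype(np.float32) / 255.0
    return (img - IMAGENET_MEAN) / IMAGENET_STD

img = np.zeros((256, 256, 3), dtype=np.uint8)
out_train = preprocess(img, training=True)
out_eval = preprocess(img, training=False)
print(out_train.shape, out_eval.shape)  # (224, 224, 3) (224, 224, 3)
```

The single `training` flag is the whole point: both branches share the same output shape and normalization, and only the randomness differs, which is exactly the train/inference matching rule stated earlier.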
Mini-Exercise¶
Goal: Create a preprocessing function for training and one for inference.
1. Pick an image of any size (e.g., 500×300).
2. Apply:
   - Resize to 256×256
   - RandomCrop to 224×224 (training only)
   - Horizontal flip (training only)
   - Normalize using ImageNet mean/std
3. Compare preprocessed training and inference outputs.
4. Visualize the results using matplotlib.
Bonus: Try using OpenCV to load the image and manually convert from BGR to RGB.