# Chapter 7: Data Augmentation Techniques (Expanded)
> “If your model memorizes your dataset, you’ve failed. Augmentation teaches it to imagine.”
## Why This Chapter Matters
Most real-world datasets are:

- Small
- Biased
- Repetitive

Without augmentation, your CNN learns to memorize patterns instead of generalizing. That’s why data augmentation is not just a “nice to have”: it’s a core strategy for helping models perform better on unseen data.
In this chapter, we go beyond the basics:

- You’ll learn classic augmentations like random crop, flip, and color jitter
- Then expand into modern techniques like Cutout, Mixup, and CutMix
- And you’ll implement these in both PyTorch and TensorFlow, with visualization
## Conceptual Breakdown
🔹 What is Data Augmentation?
Augmentation is the process of applying random transformations to training images on the fly—so the model sees a new version of each image every epoch.
It’s only used during training, never during validation or inference.
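For example, a minimal sketch of that split in torchvision (the specific transforms here are illustrative, not a recommendation):

```python
from torchvision import transforms

# Training: random transforms, re-sampled every time an image is fetched.
train_pipeline = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Validation/inference: deterministic, so evaluation is reproducible.
eval_pipeline = transforms.Compose([
    transforms.ToTensor(),
])
```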
🔹 Classic Augmentations
| Augmentation | Effect |
|---|---|
| RandomCrop | Focus on subregions, simulate framing variation |
| HorizontalFlip | Simulate left-right symmetry |
| ColorJitter | Adjust brightness, contrast, saturation, hue |
| Rotation | Handle orientation bias |
| Gaussian Blur | Simulate camera focus variation |
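Rotation and blur don’t appear in the pipeline below, but both exist in torchvision; a quick sketch (the degree and kernel values are arbitrary examples):

```python
from torchvision import transforms

extra_augs = transforms.Compose([
    transforms.RandomRotation(degrees=15),                     # handle orientation bias
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),  # simulate focus variation
])
```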
🔹 Advanced Augmentations
| Technique | Description |
|---|---|
| Cutout | Randomly removes a square region, forcing the model to rely on other cues |
| Mixup | Blends two images and their labels linearly |
| CutMix | Pastes a patch from one image onto another and mixes labels in proportion to patch area |
🔹 Why They Work
- Cutout teaches robustness to occlusion
- Mixup teaches smooth interpolation between classes; with λ = 0.7, a 70% dog / 30% cat blend gets the label 0.7·dog + 0.3·cat
- CutMix teaches spatial composition and acts as a form of label smoothing
📌 These augmentations improve generalization, reduce overfitting, and even improve model calibration.
## PyTorch Implementation
🔸 Classic Augmentations
```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet channel means
                         [0.229, 0.224, 0.225]),  # ImageNet channel stds
])
```
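To feed this into training, one plausible wiring (the directory layout, batch size, and worker count are assumptions, not requirements):

```python
from torch.utils.data import DataLoader
from torchvision import datasets

# Hypothetical layout: data/train/<class_name>/*.jpg
train_ds = datasets.ImageFolder("data/train", transform=train_transform)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)
```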
🔸 Cutout (Custom)
```python
import numpy as np

class Cutout(object):
    """Zero out a random square patch of a C x H x W tensor image."""

    def __init__(self, size=50):
        self.size = size

    def __call__(self, img):
        c, h, w = img.shape
        # Pick a random center, then clip the square to the image bounds.
        y = np.random.randint(h)
        x = np.random.randint(w)
        y1 = np.clip(y - self.size // 2, 0, h)
        y2 = np.clip(y + self.size // 2, 0, h)
        x1 = np.clip(x - self.size // 2, 0, w)
        x2 = np.clip(x + self.size // 2, 0, w)
        img[:, y1:y2, x1:x2] = 0.0
        return img
```
Add it to your transform pipeline. Because this Cutout operates on tensors, it must come after `ToTensor()`:

```python
train_transform.transforms.append(Cutout(size=32))
```
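🔸 Mixup (Custom)

Mixup operates on whole batches rather than single images, so it belongs in the training loop rather than the transform pipeline. A minimal sketch, assuming one-hot labels (the `alpha` default is a common choice, not a fixed rule):

```python
import numpy as np
import torch

def mixup(images, labels, alpha=0.2):
    """Blend each image (and its one-hot label) with a shuffled partner."""
    lam = np.random.beta(alpha, alpha)        # mixing ratio from Beta(alpha, alpha)
    indices = torch.randperm(images.size(0))  # random partner for each sample
    mixed_images = lam * images + (1 - lam) * images[indices]
    mixed_labels = lam * labels + (1 - lam) * labels[indices]
    return mixed_images, mixed_labels
```

Call it on each batch just before the forward pass, e.g. `images, labels = mixup(images, labels)`.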
## TensorFlow Implementation
🔸 Classic Augmentations
```python
import tensorflow as tf
from tensorflow.keras import layers

data_augment = tf.keras.Sequential([
    layers.Resizing(256, 256),
    layers.RandomCrop(224, 224),
    layers.RandomFlip("horizontal"),
    layers.RandomBrightness(factor=0.2),
    layers.RandomContrast(factor=0.2),
])
```
```python
# training=True keeps the random layers active when called inside tf.data.
train_ds = train_ds.map(lambda x, y: (data_augment(x, training=True), y))
```
🔸 CutMix
```python
import tensorflow as tf

def cutmix(images, labels, alpha=1.0):
    """CutMix a batch of float images; `labels` must be one-hot."""
    batch_size = tf.shape(images)[0]
    h = tf.shape(images)[1]
    w = tf.shape(images)[2]

    # Pair every image with a random partner from the same batch.
    indices = tf.random.shuffle(tf.range(batch_size))
    shuffled_images = tf.gather(images, indices)
    shuffled_labels = tf.gather(labels, indices)

    # lam ~ Beta(alpha, alpha) via two Gamma draws (uniform when alpha=1).
    g1 = tf.random.gamma([], alpha)
    g2 = tf.random.gamma([], alpha)
    lam = g1 / (g1 + g2)

    # Patch whose area fraction is roughly (1 - lam), at a random center.
    cut_h = tf.cast(tf.cast(h, tf.float32) * tf.sqrt(1.0 - lam), tf.int32)
    cut_w = tf.cast(tf.cast(w, tf.float32) * tf.sqrt(1.0 - lam), tf.int32)
    cy = tf.random.uniform([], 0, h, dtype=tf.int32)
    cx = tf.random.uniform([], 0, w, dtype=tf.int32)
    y1 = tf.clip_by_value(cy - cut_h // 2, 0, h)
    y2 = tf.clip_by_value(cy + cut_h // 2, 0, h)
    x1 = tf.clip_by_value(cx - cut_w // 2, 0, w)
    x2 = tf.clip_by_value(cx + cut_w // 2, 0, w)

    # Binary mask: 1 inside the patch, 0 elsewhere; broadcasts over batch/channels.
    patch = tf.ones([y2 - y1, x2 - x1], dtype=images.dtype)
    mask = tf.pad(patch, [[y1, h - y2], [x1, w - x2]])[tf.newaxis, :, :, tf.newaxis]

    # Paste the partner patch; mix labels by the patch's actual area.
    new_images = images * (1 - mask) + shuffled_images * mask
    lam_area = 1.0 - tf.cast((y2 - y1) * (x2 - x1), tf.float32) / tf.cast(h * w, tf.float32)
    new_labels = lam_area * labels + (1.0 - lam_area) * shuffled_labels
    return new_images, new_labels
```
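Because CutMix mixes labels, they must be one-hot before the map. A plausible pipeline (`NUM_CLASSES` and the batch size are placeholders):

```python
NUM_CLASSES = 2  # hypothetical: e.g., dog vs. cat

train_ds = (
    train_ds
    .map(lambda x, y: (x, tf.one_hot(y, NUM_CLASSES)))
    .batch(32)
    .map(cutmix)  # CutMix needs whole batches, so it comes after .batch()
)
```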
📌 You can also use TensorFlow Addons or Albumentations for more advanced pipelines.
## Framework Comparison Table
| Augmentation | PyTorch (torchvision) | TensorFlow (Keras or tf.data) |
|---|---|---|
| Resize/Crop/Flip | `transforms.*` | `layers.Resizing()`, `layers.Random*()` |
| Color Jitter | `transforms.ColorJitter()` | `layers.RandomBrightness()`, etc. |
| Cutout | Custom class | Custom or `tfa.image.random_cutout()` |
| Mixup | Custom function | Custom function or `tf.image` logic |
| CutMix | Custom function | TensorFlow Addons or custom logic |
| Batch-safe usage | `transforms.Compose()` + `DataLoader` | `.map(lambda x, y: ...)` in `tf.data` |
## Mini-Exercise
- Pick a sample dataset (e.g., 100 dog and cat images)
- Apply:
  - 🔸 RandomCrop + Flip + ColorJitter (PyTorch)
  - 🔸 Resize + RandomBrightness (TensorFlow)
- Implement:
  - 🔸 Cutout in PyTorch
  - 🔸 Mixup or CutMix in TensorFlow
- Visualize 5 examples before and after augmentation (one possible sketch follows below)
- Train a simple CNN with and without augmentation, and compare accuracy
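For the visualization step, one rough sketch (it assumes the `train_transform` defined earlier and a hypothetical `data/train` folder; the rescale at the end is only to make normalized tensors displayable):

```python
import matplotlib.pyplot as plt
from torchvision import datasets

raw_ds = datasets.ImageFolder("data/train")  # no transform, so items are PIL images

fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i in range(5):
    img, _ = raw_ds[i]
    axes[0, i].imshow(img)
    axes[0, i].set_title("original")
    aug = train_transform(img)          # C x H x W tensor
    aug = aug.permute(1, 2, 0).numpy()  # H x W x C for matplotlib
    aug = (aug - aug.min()) / (aug.max() - aug.min() + 1e-8)  # rescale to [0, 1]
    axes[1, i].imshow(aug)
    axes[1, i].set_title("augmented")
    axes[0, i].axis("off")
    axes[1, i].axis("off")
plt.show()
```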