Chapter 27: Data Augmentation¶
“The best way to generalize is to never see the same image twice.”
Why Augment Data?¶
When training a computer vision model, overfitting is a common enemy—especially when you have limited data. That's where data augmentation steps in. It’s a strategy to artificially expand your dataset by modifying existing images in subtle ways.
Think of it like teaching a child to recognize a cat, not just in one pose or lighting, but from every angle, distance, and background.
Types of Augmentation¶
Here are common transformations applied in real-time:
Augmentation Type | Description |
---|---|
Flip | Horizontal or vertical reflection |
Rotation | Rotating image by a few degrees |
Zoom | Zooming in/out randomly |
Shift | Moving the image along height/width |
Shear | Slanting the image slightly |
Brightness | Randomly adjusting the image lighting |
Noise | Adding small pixel-level disturbances |
TensorFlow Implementation – tf.keras.preprocessing
vs tf.image
¶
Option 1: With ImageDataGenerator (older API)¶
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.1,
zoom_range=0.1,
horizontal_flip=True,
fill_mode='nearest'
)
datagen.fit(train_images)
model.fit(datagen.flow(train_images, train_labels, batch_size=32), epochs=10)
Option 2: With tf.keras.layers
(recommended)¶
from tensorflow.keras import layers, models
data_augmentation = tf.keras.Sequential([
layers.RandomFlip("horizontal"),
layers.RandomRotation(0.1),
layers.RandomZoom(0.1),
layers.RandomContrast(0.1)
])
model = tf.keras.Sequential([
data_augmentation,
layers.Rescaling(1./255),
layers.Conv2D(32, 3, activation='relu'),
layers.MaxPooling2D(),
...
])
Benefits of Data Augmentation¶
- Improves model generalization
- Reduces overfitting
- Makes training robust to real-world distortions
- Requires no extra labeled data
Best Practices¶
- Don’t overdo it—too much distortion can confuse the model
- Use augmentation only on the training set, never on validation/test
- Combine augmentation with regularization (like dropout) for even better results
Try It Live: Google Colab¶
▶️ Colab: Visualize and Apply Data Augmentation (Coming soon to our project repo!)
Summary¶
In this chapter, you learned how data augmentation simulates a richer dataset from limited examples. By randomly flipping, rotating, zooming, and more, you can help your CNN see images as if in the wild—not just from a studio.