# Chapter 31: Image Segmentation
> “If object detection draws boxes, image segmentation draws boundaries.”
## What Is Image Segmentation?
Image segmentation is the process of classifying each pixel in an image into a category. Instead of merely identifying or locating objects, segmentation captures the exact shape and boundary of each object.
There are two main types:
| Type | Description |
|---|---|
| Semantic Segmentation | Classifies every pixel into a category (e.g., "this pixel is part of a cat") |
| Instance Segmentation | Additionally distinguishes between multiple objects of the same class (e.g., "cat #1" vs "cat #2") |
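To make the distinction concrete, here is a toy sketch (plain NumPy, independent of any dataset) of how the two kinds of output are typically represented:

```python
import numpy as np

# Semantic segmentation: one label map; every pixel gets a class id,
# and both cats share the same id
semantic_mask = np.zeros((4, 4), dtype=np.uint8)
semantic_mask[1:3, 1:3] = 1  # class 1 = "cat"

# Instance segmentation: one boolean mask per object instance
cat_1 = np.zeros((4, 4), dtype=bool)
cat_1[1:3, 1:2] = True
cat_2 = np.zeros((4, 4), dtype=bool)
cat_2[1:3, 2:3] = True
instance_masks = np.stack([cat_1, cat_2])  # shape (num_instances, H, W)
```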
## Applications

- Autonomous Vehicles: Detect roads, pedestrians, signs
- Medical Imaging: Tumor segmentation in MRIs/CTs
- Agriculture: Identify plant species in satellite images
- Robotics: Scene understanding for manipulation
## Dataset Example: Oxford Pets
The Oxford-IIIT Pet Dataset includes:
- Images of pets (cats/dogs)
- Segmentation masks where each pixel is labeled as pet, background, or border
TensorFlow Datasets provides a wrapper:
```python
import tensorflow_datasets as tfds

dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
```
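Loading without a split argument returns a dictionary of tf.data.Dataset objects (the dataset ships with train and test splits). A quick way to inspect one example, using the feature names from the TFDS catalog:

```python
for example in dataset['train'].take(1):
    print(example['image'].shape)              # variable-size RGB image, e.g. (358, 500, 3)
    print(example['segmentation_mask'].shape)  # matching per-pixel label map, e.g. (358, 500, 1)
```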
## Building a Segmentation Model (U-Net Style)

### Step 1: Preprocess Input & Labels
```python
import tensorflow as tf

def normalize(input_image, input_mask):
    input_image = tf.cast(input_image, tf.float32) / 255.0
    # Pet masks use labels {1: pet, 2: background, 3: border}; shift to
    # {0, 1, 2} so they match what sparse_categorical_crossentropy expects
    input_mask = tf.cast(input_mask, tf.uint8) - 1
    return input_image, input_mask
```
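The model in Step 2 expects fixed 128×128 inputs, so each example must be resized before `normalize` is applied. Below is a minimal sketch of such an input pipeline; `load_example`, `IMG_SIZE`, and `BATCH_SIZE` are illustrative names, and the batch size and shuffle buffer are arbitrary choices. It also defines the `train_batches` and `val_batches` used in Step 3:

```python
IMG_SIZE = 128
BATCH_SIZE = 64

def load_example(example):
    # Resize image and mask to the model's fixed input size;
    # nearest-neighbor keeps the mask labels discrete
    image = tf.image.resize(example['image'], (IMG_SIZE, IMG_SIZE))
    mask = tf.image.resize(example['segmentation_mask'], (IMG_SIZE, IMG_SIZE),
                           method='nearest')
    return normalize(image, mask)

train_batches = (dataset['train']
                 .map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
                 .shuffle(1000)
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.AUTOTUNE))

val_batches = dataset['test'].map(load_example).batch(BATCH_SIZE)
```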
### Step 2: U-Net Architecture (Encoder–Decoder)
```python
from tensorflow.keras import layers, models

def unet_model(output_channels):
    inputs = layers.Input(shape=[128, 128, 3])

    # Encoder: keep the pre-pooling activations for the skip connections
    conv1 = layers.Conv2D(64, 3, activation='relu', padding='same')(inputs)   # 128x128x64
    pool1 = layers.MaxPooling2D()(conv1)                                      # 64x64x64
    conv2 = layers.Conv2D(128, 3, activation='relu', padding='same')(pool1)   # 64x64x128
    pool2 = layers.MaxPooling2D()(conv2)                                      # 32x32x128

    # Bottleneck
    bottleneck = layers.Conv2D(256, 3, activation='relu', padding='same')(pool2)

    # Decoder: upsample and concatenate with the encoder features
    # of the same spatial resolution
    up1 = layers.UpSampling2D()(bottleneck)  # back to 64x64
    up1 = layers.Concatenate()([up1, conv2])
    up1 = layers.Conv2D(128, 3, activation='relu', padding='same')(up1)
    up2 = layers.UpSampling2D()(up1)         # back to 128x128
    up2 = layers.Concatenate()([up2, conv1])
    up2 = layers.Conv2D(64, 3, activation='relu', padding='same')(up2)

    # 1x1 convolution produces per-pixel class probabilities
    outputs = layers.Conv2D(output_channels, 1, activation='softmax')(up2)
    return models.Model(inputs=inputs, outputs=outputs)
```
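Note the design choice in the decoder: each upsampled tensor is concatenated with the encoder activation at the same resolution (`conv2` at 64×64, `conv1` at 128×128), which is what lets a U-Net recover sharp object boundaries. A `summary()` call is a quick sanity check that the output returns to the input resolution:

```python
# The final layer should report shape (None, 128, 128, output_channels)
unet_model(output_channels=3).summary()
```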
### Step 3: Train the Model
```python
model = unet_model(output_channels=3)  # pet, background, border

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train_batches and val_batches come from the input pipeline in Step 1
model.fit(train_batches, epochs=20, validation_data=val_batches)
```
## Visualize the Segmentation Output
```python
import matplotlib.pyplot as plt

def display(display_list):
    plt.figure(figsize=(15, 5))
    title = ['Input Image', 'True Mask', 'Predicted Mask']
    for i in range(len(display_list)):
        plt.subplot(1, len(display_list), i + 1)
        plt.title(title[i])
        plt.imshow(tf.keras.utils.array_to_img(display_list[i]))
        plt.axis('off')
    plt.show()
```
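`display` expects a label map, but the model emits per-pixel class probabilities. A small helper can bridge the two by taking the argmax over the class axis; `show_predictions` is an illustrative name, and it assumes the `model` and `val_batches` defined in the earlier steps:

```python
def show_predictions(batches, num=1):
    for images, masks in batches.take(num):
        pred = model.predict(images)
        # Collapse per-class probabilities into one label per pixel
        pred_mask = tf.argmax(pred, axis=-1)[..., tf.newaxis]
        display([images[0], masks[0], pred_mask[0]])

show_predictions(val_batches)
```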
## Evaluation Tips
- Use IoU (Intersection over Union) to measure pixel-level overlap between predicted and true masks (see the sketch after this list)
- Use the Dice coefficient for imbalanced segmentation tasks
- Visually inspect masks to verify semantic alignment
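A sketch of both metrics: `MeanIoU` is the built-in Keras metric (computed from a confusion matrix over label maps), while the `dice` helper below is an illustrative implementation for a single binary foreground mask:

```python
import tensorflow as tf

def mean_iou(y_true, y_pred_labels, num_classes=3):
    # Per-class IoU from a confusion matrix, averaged over classes
    m = tf.keras.metrics.MeanIoU(num_classes=num_classes)
    m.update_state(y_true, y_pred_labels)
    return m.result().numpy()

def dice(y_true, y_pred, smooth=1e-6):
    # Dice = 2|A ∩ B| / (|A| + |B|); less sensitive to class
    # imbalance than plain pixel accuracy
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred = tf.cast(tf.reshape(y_pred, [-1]), tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
```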
## Try It Live
🔗 Colab: U-Net Semantic Segmentation
(Coming soon to the repo: semantic segmentation playground with custom uploads!)
## Summary
In this chapter, you:
- Understood the difference between classification, detection, and segmentation
- Built a U-Net-based semantic segmentation model in TensorFlow
- Visualized how the model learns to identify object boundaries, pixel by pixel