Chapter 8: Understanding CNN Layers

“Every filter is a lens. Every layer is a language. A CNN doesn’t just see—it interprets.”


Why This Chapter Matters

A Convolutional Neural Network is more than a stack of layers—it’s a hierarchy of abstractions. With each convolution, pooling, and activation, your model goes from low-level pixels to high-level semantics:

  • Edge → Texture → Shape → Object

But to design effective CNNs (and debug them), you need to understand how each layer transforms the input.

This chapter walks you through:

  • What each major CNN layer does
  • How it changes shape, depth, and meaning
  • How to implement and visualize these layers in PyTorch and TensorFlow

You’ll finally understand why a 224×224×3 image turns into a 7×7×512 feature map.


Conceptual Breakdown

🔹 The Core CNN Layer Types

  • Conv2D: applies a filter/kernel over spatial regions
  • Activation (ReLU): adds non-linearity so the network can learn complex patterns
  • BatchNorm: normalizes activations to stabilize training
  • Pooling: reduces spatial size while keeping key features
  • Dropout: prevents overfitting by randomly dropping activations
  • Fully Connected: maps final features to output classes

🔹 Convolution Layer: Conv2D

  • Uses a kernel (e.g., 3×3) that slides across the image
  • Performs element-wise multiplications and adds up the result
  • Outputs a feature map

📌 A convolution layer doesn’t see the entire image—it sees a window. As we stack layers, the receptive field grows.

Key parameters:

  • in_channels: number of input feature channels
  • out_channels: number of filters (i.e., output channels)
  • kernel_size: size of each filter (e.g., 3×3)
  • stride: how much the filter moves per step
  • padding: how borders are handled ('valid' = no padding; 'same' = pad so the output keeps the input's spatial size)
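These parameters determine the output size: out = floor((in + 2 × padding − kernel_size) / stride) + 1. As a minimal sanity check (an illustrative sketch, not part of the chapter's models), a 3×3 kernel with stride 1 and padding 1 preserves the spatial size:

import torch
import torch.nn as nn

# output size = floor((input + 2*padding - kernel_size) / stride) + 1
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 224, 224)   # one dummy RGB image
print(conv(x).shape)              # torch.Size([1, 16, 224, 224]); (224 + 2 - 3) // 1 + 1 = 224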

🔹 Pooling Layer: MaxPool2D, AvgPool2D

  • Downsamples feature maps (e.g., from 32×32 → 16×16)
  • Keeps strongest signals (MaxPooling) or averages regions (AvgPooling)
  • Reduces computation and helps detect patterns invariant to position
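A quick illustration with made-up numbers (a sketch, not from the chapter's code): max pooling keeps only the largest value in each window.

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.tensor([[[[1., 3.],
                    [2., 4.]]]])   # shape [1, 1, 2, 2]: one channel, one 2x2 window
print(pool(x))                     # tensor([[[[4.]]]]) -- only the strongest signal survives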

🔹 Batch Normalization

  • Normalizes each channel's activations (across the batch) to zero mean and unit variance, then applies a learnable scale and shift
  • Stabilizes training and allows higher learning rates
  • Typically applied after convolution and before activation
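A minimal sketch (not part of the chapter's models) showing the normalization in action during training:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16)
x = torch.randn(8, 16, 32, 32) * 5 + 3    # activations with mean around 3 and std around 5
y = bn(x)                                 # training mode: normalizes per channel over the batch
print(x.mean().item(), x.std().item())    # roughly 3 and 5
print(y.mean().item(), y.std().item())    # roughly 0 and 1 after normalization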

🔹 Activation Functions: ReLU and Beyond

  • ReLU: max(0, x). Introduces non-linearity.
  • Leaky ReLU: max(αx, x). Keeps a small slope for negative inputs instead of zeroing them.
  • Sigmoid: 1 / (1 + e^(−x)). Squeezes values into (0, 1).

📌 Most modern CNNs use ReLU for its simplicity and efficiency.
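To see the difference in practice, here is a small comparison on illustrative values (a sketch, not from the chapter's code):

import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(nn.ReLU()(x))                          # tensor([0.0000, 0.0000, 0.0000, 1.5000])
print(nn.LeakyReLU(negative_slope=0.1)(x))   # tensor([-0.2000, -0.0500, 0.0000, 1.5000])
print(torch.sigmoid(x))                      # every value squeezed into (0, 1)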


🔹 Fully Connected (Dense) Layers

After several convolution + pooling blocks, the feature map is flattened into a vector and passed through one or more Linear (PyTorch) or Dense (TF) layers.

  • Used to classify based on the features extracted earlier
  • Last layer’s size = number of classes
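As a minimal PyTorch sketch (assuming the 7×7×512 feature map mentioned at the start of the chapter), the classifier head could look like this:

import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Flatten(),                # [B, 512, 7, 7] → [B, 512*7*7]
    nn.Linear(512 * 7 * 7, 10)   # 10 output classes
)
print(head(torch.randn(1, 512, 7, 7)).shape)   # torch.Size([1, 10])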

PyTorch Implementation

Let’s build a simple Conv → ReLU → Pool block:

import torch.nn as nn

cnn_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),  # [B, 3, 224, 224] → [B, 16, 224, 224]
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2)  # [B, 16, 224, 224] → [B, 16, 112, 112]
)
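You can sanity-check the annotated shapes by pushing a dummy batch through the block (an illustrative check, continuing from the code above):

import torch

x = torch.randn(4, 3, 224, 224)   # a batch of 4 dummy RGB images
print(cnn_block(x).shape)         # torch.Size([4, 16, 112, 112])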

A full model:

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 10)  # assuming input was 224x224
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)
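The hard-coded 32 * 56 * 56 comes from the two pooling layers halving 224 twice (224 → 112 → 56), leaving a [B, 32, 56, 56] feature map. A quick check with a dummy input (illustrative, continuing from the class above):

import torch

model = SimpleCNN()
x = torch.randn(1, 3, 224, 224)
print(model.features(x).shape)   # torch.Size([1, 32, 56, 56]) → flattened to 32*56*56 = 100352
print(model(x).shape)            # torch.Size([1, 10])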

TensorFlow Implementation

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), padding='same', input_shape=(224, 224, 3)),  # [B, 224, 224, 3] → [B, 224, 224, 16]
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.MaxPooling2D(pool_size=(2, 2)),  # → [B, 112, 112, 16]

    layers.Conv2D(32, (3, 3), padding='same'),  # → [B, 112, 112, 32]
    layers.ReLU(),
    layers.MaxPooling2D(pool_size=(2, 2)),  # → [B, 56, 56, 32]

    layers.Flatten(),  # → [B, 56*56*32]
    layers.Dense(10)   # 10 class logits
])
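To inspect the resulting shapes (a quick check, not part of the original listing):

import numpy as np

model.summary()   # prints every layer's output shape and parameter count
dummy = np.random.rand(1, 224, 224, 3).astype("float32")
print(model(dummy).shape)   # (1, 10)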

How Shapes Change

  • Conv2D (stride 1, 'same' padding): PyTorch [B, C_in, H, W] → [B, C_out, H, W]; TensorFlow [B, H, W, C_in] → [B, H, W, C_out]
  • MaxPool2D (2×2): PyTorch [B, C, H, W] → [B, C, H/2, W/2]; TensorFlow [B, H, W, C] → [B, H/2, W/2, C]
  • Flatten: PyTorch [B, C, H, W] → [B, C×H×W]; TensorFlow [B, H, W, C] → [B, H×W×C]
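This is also where the 7×7×512 feature map from the start of the chapter comes from: in a VGG-style network (an illustrative example, not built in this chapter), five 2×2 pooling stages halve 224 five times (224 → 112 → 56 → 28 → 14 → 7) while the convolution stacks grow the channel count to 512. A two-line check:

size = 224
for _ in range(5):   # five 2×2 pooling stages
    size //= 2
print(size)          # 7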

Framework Comparison Table

  • Convolution: nn.Conv2d(in_channels, out_channels, kernel_size) vs layers.Conv2D(filters, kernel_size)
  • Pooling: nn.MaxPool2d(kernel_size) vs layers.MaxPooling2D(pool_size)
  • BatchNorm: nn.BatchNorm2d(num_features) vs layers.BatchNormalization()
  • Activation (ReLU): nn.ReLU() or F.relu() vs layers.ReLU() or the activation='relu' argument
  • Fully Connected: nn.Linear(in_features, out_features) vs layers.Dense(units)
  • Flatten: nn.Flatten() vs layers.Flatten()

Mini-Exercise

Build a mini CNN with:

  • 2 Conv2D layers
  • ReLU and MaxPooling after each
  • Flatten + Dense to output 10 classes
  • Feed a dummy input of shape [1, 3, 224, 224] (PyTorch) or [1, 224, 224, 3] (TF)
  • Print the shape after each layer (see the sketch below)
  • Try replacing ReLU with LeakyReLU and observe the differences

Bonus: Visualize the first convolutional layer filters (we’ll expand this in Chapter 17!)
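If you get stuck on printing the shape after each layer, one possible starting point (a sketch that assumes your layers live in an nn.Sequential called model) is:

import torch

x = torch.randn(1, 3, 224, 224)
for layer in model:   # nn.Sequential is iterable, layer by layer
    x = layer(x)
    print(layer.__class__.__name__, tuple(x.shape))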