Vision in Code: Mastering Convolutional Neural Networks for Real-World Image Modeling
A Practical Guide to CNN Implementation with PyTorch and TensorFlow
Contents
Preface
Part I – Foundations of Image Tensors and Preprocessing
Chapter 1: How a Neural Network Sees an Image
1.1 What is an image (JPEG, PNG, etc.) in memory?
1.2 From pixel data → NumPy array → Tensor
1.3 RGB channels, 8-bit scale, float conversion
1.4 [H, W, C] vs [C, H, W] – framework differences explained
1.5 Why model input shape matters
1.6 Visual walkthrough of image-to-input pipeline
Chapter 2: What is a Tensor (in Code and in Mind)?
2.1 Tensor shapes and memory layout
2.2 Dimensionality intuition
2.3 PyTorch: torch.tensor, .permute(), .view(), .reshape()
2.4 TensorFlow: tf.Tensor, .reshape(), .transpose(), broadcasting
2.5 Visual walkthroughs of shape manipulations
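As a preview of the shape manipulations listed in 2.3 and 2.4, here is a minimal sketch of the same reordering and flattening in both frameworks; the 4×4×3 random array is a stand-in for a real image.

```python
import numpy as np
import torch
import tensorflow as tf

# An HWC image-like array (height=4, width=4, channels=3)
img = np.random.rand(4, 4, 3).astype("float32")

# PyTorch: permute reorders dimensions to CHW, reshape flattens
t = torch.from_numpy(img)             # shape: (4, 4, 3)
chw = t.permute(2, 0, 1)              # shape: (3, 4, 4)
flat = chw.reshape(-1)                # shape: (48,)

# TensorFlow: transpose reorders dimensions, reshape flattens
x = tf.convert_to_tensor(img)         # shape: (4, 4, 3)
chw_tf = tf.transpose(x, [2, 0, 1])   # shape: (3, 4, 4)
flat_tf = tf.reshape(chw_tf, [-1])    # shape: (48,)
```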
Chapter 3: From Pixels to Model Input
3.1 Full image input pipeline:
3.2 RGB loading → float32 conversion → normalization
3.3 Resizing and reshaping to expected input size
3.4 Batch dimension handling: unsqueeze() vs expand_dims()
3.5 Feeding tensors into Dense or Conv2D layers
3.6 Debugging mismatched shapes
3.7 Framework comparison of the entire image → tensor → model flow
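A minimal sketch of the Chapter 3 pipeline (load → float32 → batch dimension); the file name and the 224×224 target size are illustrative assumptions, not fixed requirements.

```python
from PIL import Image
import numpy as np
import torch
import tensorflow as tf

# Load an RGB image and scale 8-bit values to [0, 1] (path is illustrative)
img = Image.open("cat.jpg").convert("RGB").resize((224, 224))
arr = np.asarray(img, dtype="float32") / 255.0              # shape: (224, 224, 3)

# PyTorch expects NCHW: reorder channels, then add a batch dimension
pt = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)    # (1, 3, 224, 224)

# TensorFlow/Keras expects NHWC: just add a batch dimension
tfx = tf.expand_dims(tf.convert_to_tensor(arr), axis=0)     # (1, 224, 224, 3)
```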
Part II – Preprocessing and Input Pipelines
Chapter 4: Standard Image Preprocessing
4.1 Resize, Normalize, Augment
4.2 Mean-std normalization vs 0–1 scaling
4.3 Format mismatches and their impact on accuracy
4.4 PIL vs OpenCV vs tf.image
4.5 Visualizing preprocessing effects
4.6 Matching preprocessing between training and inference
Chapter 5: Preprocessing for Pretrained Models
5.1 Matching pretrained model expectations: MobileNetV2, EfficientNet, ResNet, etc.
5.2 transforms.Normalize vs tf.keras.applications.*.preprocess_input()
5.3 PyTorch: torchvision.models
5.4 TensorFlow: keras.applications
5.5 Inference vs training preprocessing pitfalls
5.6 Side-by-side code snippets for each model
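As a taste of the side-by-side snippets promised in 5.6, here is a hedged sketch of the two idioms from 5.2: the torchvision transform uses the standard ImageNet mean/std, while Keras defers to each model family's own preprocess_input. The random array stands in for a real image batch.

```python
import numpy as np
from torchvision import transforms
import tensorflow as tf

# PyTorch / torchvision: ImageNet normalization used by most torchvision models
preprocess_pt = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),                      # uint8 [0, 255] -> float [0, 1], HWC -> CHW
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# TensorFlow / Keras: each application ships its own preprocess_input
raw = np.random.randint(0, 256, size=(1, 224, 224, 3)).astype("float32")
prepped = tf.keras.applications.mobilenet_v2.preprocess_input(raw)  # scales to [-1, 1]
```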
Chapter 6: Image Datasets: Getting Data Into the Network
6.1 Folder structure conventions
6.2 PyTorch: Dataset, DataLoader, ImageFolder, transforms
6.3 TensorFlow: tf.data.Dataset, image_dataset_from_directory()
6.4 Label mapping, batching, and shuffling
6.5 Visualizing batches from both frameworks
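A minimal sketch of the two loading paths in 6.2 and 6.3, assuming a class-per-subfolder layout under an illustrative data/train directory.

```python
import torch
from torchvision import datasets, transforms
import tensorflow as tf

# PyTorch: a class-per-subfolder dataset wrapped in a DataLoader
# (the "data/train" path is illustrative)
train_ds = datasets.ImageFolder("data/train", transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

# TensorFlow: the equivalent one-liner over the same folder layout
train_tf = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32, shuffle=True)
```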
Chapter 7: Data Augmentation Techniques (Expanded)
7.1 Common augmentations: RandomCrop, ColorJitter, Cutout
7.2 Advanced augmentations: Mixup, CutMix (optional)
7.3 PyTorch: torchvision.transforms
7.4 TensorFlow: tf.image, Keras preprocessing layers
7.5 Before/after visualization of augmentation effects
Part III – CNN Architectures and Concepts
Chapter 8: Understanding CNN Layers
8.1 Kernels, filters, channels, strides, padding
8.2 Pooling (Max, Average), ReLU, BatchNorm
8.3 PyTorch: nn.Conv2d, nn.MaxPool2d, nn.BatchNorm2d, etc.
8.4 TensorFlow: Conv2D, MaxPooling2D, BatchNormalization, etc.
8.5 Conceptual breakdown + syntax comparison
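A small sketch of the syntax comparison in 8.5: one conv → batchnorm → ReLU → pool block written in both frameworks. The 16 filters, 3×3 kernel, and 224×224 input are arbitrary choices for illustration.

```python
import torch.nn as nn
import tensorflow as tf

# PyTorch: conv -> batchnorm -> ReLU -> pool (channels-first)
block_pt = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

# TensorFlow/Keras: the same block (channels-last)
block_tf = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, kernel_size=3, strides=1, padding="same",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.MaxPooling2D(pool_size=2),
])
```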
Chapter 9: The CNN Vocabulary (Terms Demystified)
9.1 Key terms: kernel, convolution, stride, padding
9.2 Input/output channels, feature maps
9.3 Convolutional layer vs residual block
9.4 Layer variants: ReflectionPad2d, InstanceNorm2d, AdaptiveAvgPool2d
9.5 Visual and code-based examples
Chapter 10: Writing the forward() / call() Function
10.1 PyTorch: forward(), self.features, self.classifier
10.2 TensorFlow: call(), subclassing Model
10.3 Layer-by-layer flow visualized
10.4 Common mistakes in model building
Chapter 11: Model Summary and Parameter Inspection
11.1 PyTorch: model.parameters(), summary(), state_dict()
11.2 TensorFlow: .summary(), get_weights(), trainable_variables
11.3 How to freeze/unfreeze layers for fine-tuning
Chapter 12: Building Your First CNN: Patterns and Pitfalls
12.1 Simple architectures: LeNet-style, Mini-VGG
12.2 Choosing filter sizes, kernel shapes, stride
12.3 Stacking layers: when and why
12.4 Common design mistakes (too few filters, wrong input shape, etc.)
Part IV – Training and Fine-Tuning
Chapter 13: Loss Functions and Optimizers
13.1 PyTorch: loss_fn(), .backward(), optimizer.step()
13.2 TensorFlow: GradientTape, optimizer.apply_gradients()
13.3 Common losses: CrossEntropy
13.4 Optimizers: SGD, Adam
13.5 Visualizing gradient flow
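A compact sketch of the step mechanics in 13.1 and 13.2, using a throwaway linear layer and random data so the snippet runs on its own; a real CNN and data loader slot into the same pattern.

```python
import torch
import torch.nn as nn
import tensorflow as tf

# --- PyTorch: one manual optimization step ---
model_pt = nn.Linear(8, 3)                        # stand-in for a CNN classifier head
x_pt = torch.randn(4, 8)                          # fake batch of 4 samples
y_pt = torch.randint(0, 3, (4,))                  # fake integer class labels
loss_fn = nn.CrossEntropyLoss()
opt_pt = torch.optim.Adam(model_pt.parameters(), lr=1e-3)

loss = loss_fn(model_pt(x_pt), y_pt)
opt_pt.zero_grad()
loss.backward()
opt_pt.step()

# --- TensorFlow: the same step written with GradientTape ---
model_tf = tf.keras.layers.Dense(3)
x_tf = tf.random.normal((4, 8))
y_tf = tf.random.uniform((4,), maxval=3, dtype=tf.int32)
loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt_tf = tf.keras.optimizers.Adam(learning_rate=1e-3)

with tf.GradientTape() as tape:
    logits = model_tf(x_tf)
    tf_loss = loss_obj(y_tf, logits)
grads = tape.gradient(tf_loss, model_tf.trainable_variables)
opt_tf.apply_gradients(zip(grads, model_tf.trainable_variables))
```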
Chapter 14: Training Loop Mechanics
14.1 PyTorch: full training loop with train_loader
14.2 TensorFlow: model.fit() vs custom training loop
14.3 Logging loss and metrics
14.4 Checkpoint saving, early stopping
14.5 Adding visuals for debugging and learning
Chapter 15: Training Strategies and Fine-Tuning Pretrained CNNs
15.1 When to Fine-Tune vs Freeze
15.2 Adapting Pretrained Models
15.3 Regularization Techniques
15.4 Training Strategies for Generalization
15.5 Recognizing Overfitting and Underfitting
Part V – Inference, Evaluation, and Visual Debugging
Chapter 16: Train vs Eval Mode
16.1 PyTorch: model.train(), model.eval(), no_grad()
16.2 TensorFlow: training=True/False
16.3 Dropout and BatchNorm behavior
16.4 Impact of mode on inference
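A quick sketch of the mode switch this chapter covers, using Dropout because its behavior change is easy to see; BatchNorm follows the same train/eval rules.

```python
import torch
import torch.nn as nn
import tensorflow as tf

# PyTorch: Dropout behaves differently in train vs eval mode
drop_pt = nn.Dropout(p=0.5)
x = torch.ones(1, 10)
drop_pt.train()
print(drop_pt(x))                   # roughly half the values zeroed, the rest scaled up
drop_pt.eval()
with torch.no_grad():
    print(drop_pt(x))               # identity: dropout is disabled at inference

# TensorFlow: the same switch is the `training` argument at call time
drop_tf = tf.keras.layers.Dropout(0.5)
y = tf.ones((1, 10))
print(drop_tf(y, training=True))    # values dropped
print(drop_tf(y, training=False))   # identity
```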
Chapter 17: Visualizing Feature Maps and Filters
17.1 Getting intermediate layer outputs
17.2 PyTorch: forward hooks, manual slicing
17.3 TensorFlow: defining sub-models
17.4 Visualizing what the model is focusing on
Part VI – Deployment-Ready Insights
Chapter 18: Inference Pipeline Design
18.1 Keeping preprocessing consistent (train vs inference)
18.2 Reusable preprocess functions
18.3 Input validation, test-time augmentation
Chapter 19: Common Errors and How to Debug Them
19.1 Model always predicts one class? Check normalization
19.2 Input shape mismatch? Check dataloader
19.3 Nothing's working? Try a single-image pipeline
19.4 Debugging checklist for CNN-based models
Appendices
A. PyTorch vs TensorFlow Cheatsheet
B. Troubleshooting Image Model Failures
C. Glossary of Key Terms
D. Pretrained Model Reference Table (with links)
E. Sample Projects and Mini-Exercises per Chapter
Chapter Format
Each chapter ends with:
- Conceptual Breakdown
- PyTorch Implementation
- TensorFlow Implementation
- Framework Comparison Table
- Use Case or Mini-Exercise