Chapter 17: Backpropagation & Gradient Descent¶
“A neural network learns by looking backward—adjusting the past to improve the future.”
This chapter reveals the mechanism that powers all deep learning: backpropagation.
We will:
- Understand how gradients are computed using the chain rule
- Visualize how errors flow backward in a network
- See how gradient descent uses those gradients to update weights
- Use tf.GradientTape to track gradients
- Step through a simplified manual backpropagation demo
The Core Idea: Chain Rule¶
Backpropagation uses the chain rule from calculus to compute the gradient of the loss function with respect to each parameter in the network.
In a simple neural net:
Input x → [W1, b1] → hidden → [W2, b2] → output → loss
We want to know:
∂Loss/∂W1, ∂Loss/∂b1, ∂Loss/∂W2, ∂Loss/∂b2
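Applying the chain rule to the diagram above, each of these gradients factors into local derivatives that can be computed layer by layer, for example:

∂Loss/∂W2 = ∂Loss/∂output · ∂output/∂W2
∂Loss/∂W1 = ∂Loss/∂output · ∂output/∂hidden · ∂hidden/∂W1

Notice that ∂Loss/∂output appears in both: gradients computed at the output are reused as the error flows backward through the network, which is what makes the backward pass efficient.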
tf.GradientTape: TensorFlow’s Automatic Differentiation Engine¶
```python
with tf.GradientTape() as tape:
    logits = model(x_batch)          # forward pass: operations are recorded
    loss = loss_fn(y_batch, logits)

# Backward pass: one gradient per trainable variable
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```
Inside the with block, TensorFlow records every operation performed on watched tensors and variables. When tape.gradient() is called, it traces those recorded operations backward, applying the chain rule at each step to produce one gradient per trainable variable.
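To make the pattern concrete, here is a minimal, self-contained version of that training step. The tiny two-layer model, the random batch, and the SGD settings are illustrative assumptions for this sketch, not part of the snippet above:

```python
import tensorflow as tf

# A small illustrative model (assumed for this example)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# A random batch of 8 samples with 3 features each
x_batch = tf.random.normal([8, 3])
y_batch = tf.random.normal([8, 1])

with tf.GradientTape() as tape:
    logits = model(x_batch)          # forward pass is recorded on the tape
    loss = loss_fn(y_batch, logits)

gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```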
Manual Backpropagation: A Tiny Example¶
Let’s build a single-layer model manually, use the tape to compute its gradients, and then verify those gradients against a hand-derived backward pass (see the check after step 4).
1. Define Inputs and Parameters

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])    # one sample with two features
y_true = tf.constant([[1.0]])    # target value
W = tf.Variable([[0.1], [0.2]])  # weights, shape (2, 1)
b = tf.Variable([0.3])           # bias
```
2. Forward Pass and Loss

```python
def forward(x):
    return tf.matmul(x, W) + b

def mse(y_pred, y_true):
    return tf.reduce_mean(tf.square(y_pred - y_true))
```
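It helps to run the forward pass by hand once. With x = [1.0, 2.0], W = [0.1, 0.2]ᵀ, and b = 0.3:

y_pred = 1.0·0.1 + 2.0·0.2 + 0.3 = 0.8
loss = (0.8 − 1.0)² = 0.04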
3. Compute Gradients

```python
with tf.GradientTape() as tape:
    y_pred = forward(x)
    loss = mse(y_pred, y_true)

grads = tape.gradient(loss, [W, b])
```
4. Apply Gradients

```python
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
optimizer.apply_gradients(zip(grads, [W, b]))
```
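Since the model is just y_pred = xW + b with an MSE loss, we can also derive the backward pass by hand and check it against what the tape returned. For this single sample, ∂Loss/∂y_pred = 2(y_pred − y_true), and the chain rule gives ∂Loss/∂W = xᵀ · ∂Loss/∂y_pred and ∂Loss/∂b = ∂Loss/∂y_pred. A sketch of that check, using fresh copies of the original values since apply_gradients has already modified W and b:

```python
x0 = tf.constant([[1.0, 2.0]])
y0 = tf.constant([[1.0]])
W0 = tf.constant([[0.1], [0.2]])
b0 = tf.constant([0.3])

y_pred = tf.matmul(x0, W0) + b0                 # [[0.8]]
dL_dy = 2.0 * (y_pred - y0)                     # [[-0.4]]
dL_dW = tf.matmul(x0, dL_dy, transpose_a=True)  # [[-0.4], [-0.8]]
dL_db = tf.reduce_sum(dL_dy, axis=0)            # [-0.4]
# These match grads from step 3 exactly.
```

With learning rate 0.01, the SGD step therefore moves W from [[0.1], [0.2]] to [[0.104], [0.208]] and b from [0.3] to [0.304], nudging y_pred toward the target.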
Gradient Descent: The Learning Step¶
At every training step, gradient descent does this:
```python
new_weight = old_weight - learning_rate * gradient
```
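TensorFlow’s plain SGD optimizer (without momentum) performs exactly this update. As a minimal sketch, reusing grads, W, and b from the example above, the apply_gradients call could be replaced with:

```python
learning_rate = 0.01

for var, grad in zip([W, b], grads):
    var.assign_sub(learning_rate * grad)  # var <- var - lr * grad
```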
Intuition: Why Does This Work?¶
Imagine trying to descend a mountain blindfolded, feeling the slope with your feet. Backpropagation is the feeling of the slope: it tells you how a small change in each parameter would change the loss. Gradient descent is the step: it moves every parameter a little in the direction of steepest descent, with the learning rate as the step size.
Together, they let the network learn even in high-dimensional, abstract spaces.
Summary¶
In this chapter, we:
- Demystified backpropagation using the chain rule
- Used tf.GradientTape to compute gradients automatically
- Performed a step-by-step manual backpropagation
- Understood how gradient descent updates weights toward lower loss
Backpropagation isn’t just a technique; it’s the soul of deep learning. Mastering it gives you the power to customize and debug any neural architecture.