Chapter 14: Building a Neural Network from Scratch¶
“Before you rely on magic, understand the machinery beneath it.”
In this chapter, we'll strip away the abstraction of high-level APIs and build a neural network step by step using only low-level TensorFlow operations (tf.Variable, tf.matmul, tf.nn, and so on). This exercise gives you a deeper appreciation of what libraries like tf.keras automate for you, and of how neural networks actually operate under the hood; a short warm-up sketch of these primitives follows the list of objectives below.
By the end of this chapter, you’ll be able to:
- Initialize weights and biases manually
- Write your own forward pass function
- Calculate loss and accuracy
- Implement backpropagation using tf.GradientTape
- Train a minimal network on a real dataset (e.g., MNIST)
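If these low-level primitives are new to you, here is a minimal warm-up sketch (separate from the model we build below) showing tf.Variable and tf.GradientTape working together on a toy problem:
import tensorflow as tf
# A trainable scalar and a toy loss: L(w) = (3w - 6)^2, minimized at w = 2
w = tf.Variable(1.0)
with tf.GradientTape() as tape:
    loss = (3.0 * w - 6.0) ** 2
# dL/dw = 6 * (3w - 6) = -18 at w = 1
print(tape.gradient(loss, w).numpy())  # -18.0
The same pattern, on a much larger scale, is all the training loop at the end of this chapter does: compute a loss inside the tape, ask the tape for gradients, and nudge the variables.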
Step 1: Dataset Preparation¶
We’ll use the MNIST dataset (handwritten digits) for simplicity. It ships with TensorFlow via tf.keras.datasets:
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize and flatten
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)
# Convert to tf.Tensor
x_train = tf.convert_to_tensor(x_train, dtype=tf.float32)
y_train = tf.convert_to_tensor(y_train, dtype=tf.int64)
x_test = tf.convert_to_tensor(x_test, dtype=tf.float32)
y_test = tf.convert_to_tensor(y_test, dtype=tf.int64)
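Before moving on, it is worth verifying the shapes. MNIST has 60,000 training and 10,000 test images, each flattened to 784 values; a quick check you might run:
print(x_train.shape, y_train.shape)  # (60000, 784) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 784) (10000,)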
Step 2: Model Initialization¶
We'll define a simple feedforward neural network with:
- Input layer: 784 units (28x28 pixels)
- Hidden layer: 128 units + ReLU
- Output layer: 10 units (one per digit)
# Parameters
input_size = 784
hidden_size = 128
output_size = 10
# Weights and biases
W1 = tf.Variable(tf.random.normal([input_size, hidden_size], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden_size]))
W2 = tf.Variable(tf.random.normal([hidden_size, output_size], stddev=0.1))
b2 = tf.Variable(tf.zeros([output_size]))
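As an optional sanity check, we can count the trainable parameters. For this architecture that is 784·128 + 128 + 128·10 + 10 = 101,770:
params = [W1, b1, W2, b2]
total = sum(tf.size(p).numpy() for p in params)
print(total)  # 101770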
Step 3: Forward Pass Function¶
def forward_pass(x):
    # Hidden layer: linear transform followed by ReLU activation
    hidden = tf.nn.relu(tf.matmul(x, W1) + b1)
    # Output layer: raw logits (softmax is applied inside the loss)
    logits = tf.matmul(hidden, W2) + b2
    return logits
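To confirm the shapes line up, you can run the forward pass on a few training examples; each row of the output holds one logit per class:
sample_logits = forward_pass(x_train[:5])
print(sample_logits.shape)  # (5, 10): 5 examples, 10 class logits each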
Step 4: Loss & Accuracy¶
Use sparse categorical cross-entropy since labels are integer-encoded:
def compute_loss(logits, labels):
    # Average cross-entropy over the batch; labels are integer class indices
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels))

def compute_accuracy(logits, labels):
    # Fraction of examples where the highest logit matches the true label
    preds = tf.argmax(logits, axis=1, output_type=tf.int64)
    return tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))
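If you want to see what tf.nn.sparse_softmax_cross_entropy_with_logits is doing, this small check on a single made-up example compares it against the negative log of the softmax probability assigned to the true class:
logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([0], dtype=tf.int64)
# Library result
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
# Manual result: -log(softmax(logits)[true_class])
probs = tf.nn.softmax(logits)
manual = -tf.math.log(probs[0, 0])
print(loss.numpy(), manual.numpy())  # both roughly 0.417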
Step 5: Training Loop¶
Now we implement the training loop manually using tf.GradientTape.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
epochs = 5
batch_size = 64
for epoch in range(epochs):
    # Iterate over the training set in mini-batches
    for i in range(0, len(x_train), batch_size):
        x_batch = x_train[i:i + batch_size]
        y_batch = y_train[i:i + batch_size]

        # Record operations for automatic differentiation
        with tf.GradientTape() as tape:
            logits = forward_pass(x_batch)
            loss = compute_loss(logits, y_batch)

        # Compute gradients and update the parameters
        gradients = tape.gradient(loss, [W1, b1, W2, b2])
        optimizer.apply_gradients(zip(gradients, [W1, b1, W2, b2]))

    # Epoch-end evaluation on the test set
    test_logits = forward_pass(x_test)
    test_acc = compute_accuracy(test_logits, y_test)
    print(f"Epoch {epoch+1}, Test Accuracy: {float(test_acc):.4f}")
Summary¶
In this chapter, we:
- Built a fully functioning neural network without tf.keras
- Initialized all parameters manually
- Defined forward propagation, loss, and backpropagation
- Trained it on MNIST with gradient-based optimization (Adam)
Understanding how to manually construct and train a neural network builds foundational intuition that will help you:
- Debug custom layers and losses
- Understand performance bottlenecks
- Transition into low-level model tweaking when needed