
Chapter 12: torch.nn.functional and Activation Math

“When you don’t need a layer, you just need a function.”


12.1 What is torch.nn.functional?

While torch.nn contains layer classes such as nn.ReLU and nn.Linear, the torch.nn.functional module (conventionally imported as F) provides stateless, function-style versions of the same operations.

| Type                 | Examples from torch.nn.functional |
| -------------------- | --------------------------------- |
| Activation functions | F.relu, F.sigmoid, F.softmax      |
| Loss functions       | F.cross_entropy, F.mse_loss       |
| Convolutional ops    | F.conv2d, F.max_pool2d            |
| Normalization        | F.batch_norm, F.layer_norm        |
| Utility transforms   | F.pad, F.interpolate, F.one_hot   |

Stateless means the function holds no parameters of its own: every input, weight, and option must be passed explicitly. There are no hidden weights or buffers.
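For example, nn.Linear stores its weight and bias as module state, while F.linear expects them as arguments. A minimal sketch of the contrast:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 5)

layer = nn.Linear(5, 10)                     # stateful: the module owns weight and bias
y1 = layer(x)
y2 = F.linear(x, layer.weight, layer.bias)   # stateless: parameters passed explicitly

print(torch.allclose(y1, y2))  # True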


12.2 Common Activation Functions

import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 1.0])

➤ ReLU

F.relu(x)  # tensor([0., 0., 1.])

➤ Sigmoid

torch.sigmoid(x)  # tensor([0.2689, 0.5000, 0.7311])

➤ Tanh

torch.tanh(x)  # tensor([-0.7616,  0.0000,  0.7616])

Note: F.sigmoid and F.tanh still work but are deprecated aliases; prefer torch.sigmoid and torch.tanh.

➤ Softmax

logits = torch.tensor([2.0, 1.0, 0.1])
F.softmax(logits, dim=0)  # Probabilities that sum to 1

✅ Always specify dim with softmax.
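The dim argument decides which axis gets normalized. For the common (batch, classes) layout, dim=1 produces one probability distribution per sample:

import torch
import torch.nn.functional as F

batch_logits = torch.tensor([[2.0, 1.0, 0.1],
                             [0.5, 0.5, 0.5]])

probs = F.softmax(batch_logits, dim=1)  # normalize across classes, per row
print(probs.sum(dim=1))                 # tensor([1., 1.])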


12.3 Loss Functions in Functional

Loss functions in F behave like their nn counterparts, but they are stateless: options such as reduction are passed per call rather than stored on a module at construction time.

➤ Cross-Entropy Loss

logits = torch.tensor([[2.0, 1.0, 0.1]])
targets = torch.tensor([0])
loss = F.cross_entropy(logits, targets)

Internally, F.cross_entropy() applies log_softmax followed by nll_loss, so pass raw logits, not softmaxed outputs.
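A quick check of that decomposition:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])
targets = torch.tensor([0])

fused = F.cross_entropy(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(fused, manual))  # True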

➤ MSE Loss

pred = torch.tensor([0.5, 0.7])
target = torch.tensor([1.0, 0.0])
F.mse_loss(pred, target)

➤ Binary Cross-Entropy

F.binary_cross_entropy(torch.sigmoid(pred), target)
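F.binary_cross_entropy expects probabilities, which is why the sigmoid above is required. F.binary_cross_entropy_with_logits fuses the sigmoid into the loss and is more numerically stable; a small check that the two agree:

import torch
import torch.nn.functional as F

pred = torch.tensor([0.5, 0.7])    # raw scores (logits)
target = torch.tensor([1.0, 0.0])

a = F.binary_cross_entropy(torch.sigmoid(pred), target)
b = F.binary_cross_entropy_with_logits(pred, target)  # preferred: numerically stable
print(torch.allclose(a, b))  # True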

12.4 torch.nn.functional vs nn.Module

| Situation                      | Use                                 |
| ------------------------------ | ----------------------------------- |
| You need modularity            | nn.ReLU(), nn.Linear()              |
| You want fine-grained control  | F.relu(), F.linear()                |
| Inside forward()               | Prefer F.* for stateless functions  |
| Initializing outside the model | Use the nn.Module versions          |

Example:

# With Module
self.relu = nn.ReLU()
x = self.relu(x)

# With Functional
import torch.nn.functional as F
x = F.relu(x)

Inside forward(), many developers prefer F.* to keep the model class minimal and explicit.
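Putting both styles together, a minimal sketch (the class name TinyMLP is illustrative) that keeps parameters in modules and uses F.* for the parameter-free activation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMLP(nn.Module):  # hypothetical example model
    def __init__(self, in_dim=5, hidden=16, out_dim=3):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)   # modules own the parameters
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        x = F.relu(self.fc1(x))  # stateless activation
        return self.fc2(x)       # raw logits

model = TinyMLP()
print(model(torch.randn(4, 5)).shape)  # torch.Size([4, 3])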

12.5 Functional Layers (Linear, Conv, etc.)

Functional Linear Layer:

weight = torch.randn(10, 5)  # (out_features, in_features)
bias = torch.randn(10)
x = torch.randn(1, 5)
F.linear(x, weight, bias)  # computes x @ weight.T + bias, shape (1, 10)

You are responsible for managing parameters manually, as the sketch below shows. This is useful for:

  • Writing custom layers
  • Meta-learning
  • Implementing custom architectures
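A minimal sketch of a custom layer (the name MyLinear is illustrative) that owns its parameters and delegates the math to F.linear:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyLinear(nn.Module):  # hypothetical custom layer
    def __init__(self, in_features, out_features):
        super().__init__()
        # You manage the parameters; F.linear does the math.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)

layer = MyLinear(5, 10)
print(layer(torch.randn(1, 5)).shape)  # torch.Size([1, 10])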


12.6 One-Hot Encoding

labels = torch.tensor([0, 2])
F.one_hot(labels, num_classes=3).float()

Perfect for manual cross-entropy implementations or label smoothing.
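For instance, a manual cross-entropy built from F.one_hot and F.log_softmax matches F.cross_entropy:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.2, 3.0, 0.5]])
labels = torch.tensor([0, 1])

# -sum(one_hot * log_softmax) per sample, averaged over the batch
one_hot = F.one_hot(labels, num_classes=3).float()
manual = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(torch.allclose(manual, F.cross_entropy(logits, labels)))  # True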

12.7 Other Useful Functions

➤ Padding:

F.pad(x, pad=(1, 1), mode='constant', value=0)  # one zero on each side of the last dimension

➤ Upsampling / Interpolation:

F.interpolate(image, scale_factor=2, mode='bilinear', align_corners=False)  # image: a 4-D (N, C, H, W) tensor
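A quick shape check with a small random image tensor:

import torch
import torch.nn.functional as F

image = torch.randn(1, 3, 8, 8)  # (N, C, H, W)

# pad=(left, right, top, bottom) pads the last two dimensions
padded = F.pad(image, pad=(1, 1, 1, 1), mode='constant', value=0)
print(padded.shape)  # torch.Size([1, 3, 10, 10])

upsampled = F.interpolate(image, scale_factor=2, mode='bilinear',
                          align_corners=False)
print(upsampled.shape)  # torch.Size([1, 3, 16, 16])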

12.8 Caution: Don’t Mix Modules with Functionals Blindly

Mixing nn.CrossEntropyLoss() with F.softmax()? ❌ Bad idea.

nn.CrossEntropyLoss() expects raw logits and applies softmax internally. Feeding it already-softmaxed values applies softmax twice, which flattens the output distribution and destabilizes training.
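A short illustration of the mistake:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])
targets = torch.tensor([0])
criterion = nn.CrossEntropyLoss()

right = criterion(logits, targets)                    # raw logits, as expected
wrong = criterion(F.softmax(logits, dim=1), targets)  # softmax applied twice
print(right.item(), wrong.item())                     # the two losses differ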


✅ 12.9 Summary

| Category       | Functional API Examples     |
| -------------- | --------------------------- |
| Activations    | F.relu, F.softmax, F.tanh   |
| Loss Functions | F.cross_entropy, F.mse_loss |
| Layer Ops      | F.linear, F.conv2d, F.pad   |

  • torch.nn.functional is stateless and explicit.
  • Great for flexibility, custom layers, and experimental architectures.
  • Use with care: you must manage shapes, devices, and parameters manually.