Chapter 12: torch.nn.functional and Activation Math¶
“When you don’t need a layer, you just need a function.”
12.1 What is torch.nn.functional?¶
While torch.nn contains layer classes such as nn.ReLU, nn.Linear, and so on, the torch.nn.functional module (conventionally imported as F) provides stateless, functional versions of the same operations.
Type | Examples from torch.nn.functional |
---|---|
Activation functions | F.relu, F.sigmoid, F.softmax |
Loss functions | F.cross_entropy, F.mse_loss |
Convolutional ops | F.conv2d, F.max_pool2d |
Normalization | F.batch_norm, F.layer_norm |
Utility transforms | F.pad, F.interpolate, F.one_hot |
Stateless means you must pass all arguments explicitly — no hidden weights or buffers.
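To make the contrast concrete, here is a minimal sketch (shapes chosen only for illustration): nn.Conv2d owns its weight and bias, while F.conv2d expects you to hand them in.
import torch
import torch.nn as nn
import torch.nn.functional as F
x = torch.randn(1, 3, 8, 8)              # (N, C, H, W)
conv = nn.Conv2d(3, 16, kernel_size=3)   # module: weight and bias live inside it
y_module = conv(x)
weight = torch.randn(16, 3, 3, 3)        # functional: you hold the parameters yourself
bias = torch.randn(16)
y_func = F.conv2d(x, weight, bias)       # same operation, state passed explicitly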
12.2 Common Activation Functions¶
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 1.0])
➤ ReLU¶
F.relu(x) # tensor([0., 0., 1.])
➤ Sigmoid¶
F.sigmoid(x) # tensor([0.2689, 0.5000, 0.7311]); equivalent to torch.sigmoid(x)
➤ Tanh¶
F.tanh(x) # tensor([-0.7616, 0.0000, 0.7616]); equivalent to torch.tanh(x)
➤ Softmax¶
logits = torch.tensor([2.0, 1.0, 0.1])
F.softmax(logits, dim=0) # Probabilities that sum to 1
✅ Always specify dim with softmax; PyTorch warns when the reduction dimension is left implicit.
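For batched logits the class dimension is usually dim=1 (or dim=-1); a quick sketch with a made-up batch:
batch_logits = torch.randn(4, 3)         # 4 samples, 3 classes
probs = F.softmax(batch_logits, dim=1)   # softmax over the class dimension
probs.sum(dim=1)                         # each row sums to 1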
12.3 Loss Functions in Functional¶
Loss functions in F behave like their nn counterparts — but are stateless.
➤ Cross-Entropy Loss¶
logits = torch.tensor([[2.0, 1.0, 0.1]])
targets = torch.tensor([0])
loss = F.cross_entropy(logits, targets)
F.cross_entropy() applies log_softmax followed by nll_loss internally, so pass raw logits, not softmaxed outputs.
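You can check that equivalence directly; reusing the logits and targets above, both computations give the same loss value:
loss_fused = F.cross_entropy(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
torch.allclose(loss_fused, loss_manual)   # True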
➤ MSE Loss¶
pred = torch.tensor([0.5, 0.7])
target = torch.tensor([1.0, 0.0])
F.mse_loss(pred, target)
➤ Binary Cross-Entropy¶
F.binary_cross_entropy(torch.sigmoid(pred), target)
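If you have raw scores rather than probabilities, F.binary_cross_entropy_with_logits folds the sigmoid into the loss and is more numerically stable, so the explicit torch.sigmoid call can be dropped:
F.binary_cross_entropy_with_logits(pred, target)   # sigmoid + BCE in one fused, stabler op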
12.4 functional vs nn.Module¶
Situation | Use |
---|---|
You need modularity | nn.ReLU(), nn.Linear() |
You want fine-grained control | F.relu(), F.linear() |
Inside the forward() method | Prefer F.* for functions |
Initializing layers outside the model | Use the nn.Module versions |
Example:¶
# With Module
self.relu = nn.ReLU()
x = self.relu(x)
# With Functional
import torch.nn.functional as F
x = F.relu(x)
Inside forward(), many developers prefer F.* to keep the model class minimal and explicit.
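Putting the two styles together, here is a minimal sketch of a model (layer sizes are arbitrary) that keeps its parameters in nn.Module layers but uses functional activations inside forward():
import torch
import torch.nn as nn
import torch.nn.functional as F
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 16)   # parameterized layers stay as modules
        self.fc2 = nn.Linear(16, 3)
    def forward(self, x):
        x = F.relu(self.fc1(x))       # stateless activation as a plain function
        return self.fc2(x)
model = TinyNet()
out = model(torch.randn(2, 5))        # output shape: (2, 3)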
12.5 Functional Layers (Linear, Conv, etc.)¶
Functional Linear Layer:¶
weight = torch.randn(10, 5)   # (out_features, in_features)
bias = torch.randn(10)
x = torch.randn(1, 5)
F.linear(x, weight, bias)     # x @ weight.T + bias -> shape (1, 10)
You're responsible for managing parameters manually. This is useful for:
- Writing custom layers
- Doing meta-learning
- Implementing custom architectures
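For example, a hand-rolled linear layer (a minimal sketch with a simplified init scheme) registers its own parameters and calls F.linear in forward():
import torch
import torch.nn as nn
import torch.nn.functional as F
class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
    def forward(self, x):
        return F.linear(x, self.weight, self.bias)
layer = MyLinear(5, 10)
layer(torch.randn(1, 5)).shape        # torch.Size([1, 10])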
12.6 One-Hot Encoding¶
labels = torch.tensor([0, 2])
F.one_hot(labels, num_classes=3).float()
Perfect for manual cross-entropy implementations or label smoothing.
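As a sketch of the first use case, cross-entropy for hard labels can be written by hand with one_hot and log_softmax, and it matches F.cross_entropy:
logits = torch.randn(2, 3)
one_hot = F.one_hot(labels, num_classes=3).float()
manual_ce = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
torch.allclose(manual_ce, F.cross_entropy(logits, labels))   # True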
12.7 Other Useful Functions¶
➤ Padding:¶
F.pad(x, pad=(1, 1), mode='constant', value=0)   # one value of padding on each side of the last dim
➤ Upsampling / Interpolation:¶
image = torch.randn(1, 3, 8, 8)   # bilinear mode expects a 4-D (N, C, H, W) tensor
F.interpolate(image, scale_factor=2, mode='bilinear', align_corners=False)
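Note that F.pad reads the pad tuple from the last dimension backwards: two values pad the last dim, four values pad the last two dims, and so on. For the image tensor above:
F.pad(image, pad=(1, 1, 2, 2))   # 1 px left/right on width, 2 px top/bottom on height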
12.8 Caution: Don’t Mix Modules with Functionals Blindly¶
Mixing nn.CrossEntropyLoss() with F.softmax()? ❌ Bad idea.
nn.CrossEntropyLoss() expects raw logits. Passing softmaxed values means softmax is effectively applied twice, which flattens the output distribution and leads to training instability.
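A quick sketch of the difference (reusing the torch, nn, and F imports from earlier, with arbitrary shapes):
criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)
targets = torch.randint(0, 3, (4,))
loss_right = criterion(logits, targets)                      # raw logits: correct
loss_wrong = criterion(F.softmax(logits, dim=1), targets)    # softmax applied twice: don't do this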
✅ 12.9 Summary¶
Category | Functional API Examples |
---|---|
Activations | F.relu, F.softmax, F.tanh |
Loss Functions | F.cross_entropy, F.mse_loss |
Layer Ops | F.linear, F.conv2d, F.pad |
- torch.nn.functional is stateless and explicit.
- Great for flexibility, custom layers, or experimental architectures.
- Use with care — you must manage shapes, devices, and parameters manually.