Appendix B: PyTorch Idioms and Gotchas

“Read this before your model does something stupid.”

Idioms

| Goal | Idiomatic PyTorch Code |
|---|---|
| Move everything to GPU | `x = x.to(device); model = model.to(device)` |
| Get model predictions | `with torch.no_grad(): output = model(x)` |
| Detach and convert to NumPy | `x.detach().cpu().numpy()` |
| One-hot encode | `F.one_hot(t.long(), num_classes).float()` |
| Check for NaNs | `torch.isnan(x).any()` |
| Log training stats | `writer.add_scalar('loss', val, step)` |
| Save model | `torch.save(model.state_dict(), 'model.pt')` |
| Load model | `model.load_state_dict(torch.load(...))` |
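
Most of these idioms show up together in a single inference pass. Here is a minimal sketch that strings them together; the two-layer model, tensor shapes, and variable names are illustrative placeholders, not from any particular codebase:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model and input; substitute your own.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)).to(device)
x = torch.randn(32, 8).to(device)   # Tensor.to() returns a new tensor

model.eval()                        # inference behavior for Dropout/BatchNorm
with torch.no_grad():               # skip autograd bookkeeping
    output = model(x)

assert not torch.isnan(output).any(), "model produced NaNs"

preds = output.detach().cpu().numpy()  # detach -> CPU -> NumPy, in that order
```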

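Saving and loading deserve their own sketch, since the idiomatic pattern is to persist the `state_dict` rather than the whole module object. The file name and architecture below are placeholders; `map_location` is worth knowing when a GPU-trained checkpoint is loaded on a CPU-only machine:

```python
import torch
import torch.nn as nn

# Placeholder architecture and path.
model = nn.Linear(8, 4)
torch.save(model.state_dict(), "model.pt")   # save weights only, not the class

restored = nn.Linear(8, 4)                   # re-create the same architecture
state = torch.load("model.pt", map_location="cpu")  # remap storage to CPU
restored.load_state_dict(state)
restored.eval()                              # switch modes before inference
```
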
⚠️ Gotchas

| Gotcha | What Goes Wrong | Fix |
|---|---|---|
| Using `.data` to detach | Bypasses autograd silently, hiding gradient errors | Use `.detach()` |
| Calling `.numpy()` on a CUDA tensor | Immediate crash (`TypeError`) | Move to CPU first: `x.cpu().numpy()` |
| In-place ops like `+=`, `add_()` | May break gradients when autograd needs the overwritten value | Use out-of-place ops (`x = x + y`) |
| Forgetting `.train()` / `.eval()` | BatchNorm/Dropout misbehave | Switch modes explicitly |
| Wrong dtypes passed to the loss | e.g. `CrossEntropyLoss` expects float logits and long class indices | Match dtypes (`.float()`, `.long()`) |
| Not zeroing `.grad` before `.backward()` | Gradients accumulate across steps | Call `optimizer.zero_grad()` each iteration |
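
Putting the fixes together, a minimal training step that sidesteps every gotcha in the table looks like the sketch below; the model, data, and hyperparameters are illustrative placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model, loss, and optimizer.
model = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()            # float logits, long targets
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(32, 8)                  # float inputs
targets = torch.randint(0, 3, (32,))         # long class indices, not one-hot

model.train()                                # training behavior for Dropout/BatchNorm
for step in range(10):
    optimizer.zero_grad()                    # clear accumulated gradients first
    logits = model(inputs)
    loss = criterion(logits, targets)        # dtypes already match the loss
    loss.backward()                          # out-of-place ops keep autograd intact
    optimizer.step()
```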