Appendix B: PyTorch Idioms and Gotchas
“Read this before your model does something stupid.”
Idioms

| Goal | Idiomatic PyTorch Code |
| --- | --- |
| Move everything to GPU | `x = x.to(device); model = model.to(device)` |
| Get model predictions | `with torch.no_grad(): output = model(x)` |
| Detach and convert to NumPy | `x.detach().cpu().numpy()` |
| One-hot encode | `F.one_hot(t.long(), num_classes).float()` |
| Check for NaNs | `torch.isnan(x).any()` |
| Log training stats | `writer.add_scalar('loss', val, step)` |
| Save model | `torch.save(model.state_dict(), 'model.pt')` |
| Load model | `model.load_state_dict(torch.load(...))` |
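
To see how these idioms compose, here is a minimal end-to-end sketch. The model architecture, batch shapes, and file name are hypothetical, chosen only for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)            # move parameters to the device
x = torch.randn(4, 10).to(device)   # move the batch to the same device

model.eval()                        # inference mode for BatchNorm/Dropout
with torch.no_grad():               # no autograd bookkeeping for predictions
    output = model(x)

assert not torch.isnan(output).any(), "NaNs in model output"

preds = output.argmax(dim=1)
# detach -> cpu -> numpy, in that order:
probs_np = output.softmax(dim=1).detach().cpu().numpy()

# One-hot encode integer class labels.
one_hot = F.one_hot(preds.long(), num_classes=3).float()

# Save weights only (the state dict), then load them back.
torch.save(model.state_dict(), 'model.pt')
model.load_state_dict(torch.load('model.pt', map_location=device))
```

Saving the `state_dict` rather than the whole model object keeps checkpoints portable across code refactors; `map_location` lets a GPU-trained checkpoint load on a CPU-only machine.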
⚠️ Gotchas

| Gotcha | What Goes Wrong | Fix |
| --- | --- | --- |
| Using `.data` to detach | Silently breaks autograd | Use `.detach()` |
| Calling `.numpy()` on a CUDA tensor | Immediate crash | Move to CPU first (`.cpu()`) |
| In-place ops like `+=`, `add_()` | May break gradients | Use out-of-place ops (`x = x + y`) |
| Forgetting `.train()` / `.eval()` | BatchNorm/Dropout misbehave | Switch mode explicitly |
| Wrong loss input types | Float tensor vs. int labels | Match dtypes (`.float()` / `.long()`) |
| Not zeroing `.grad` before `.backward()` | Gradients accumulate across steps | Call `optimizer.zero_grad()` |
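
A single training step that avoids each gotcha above might look like the sketch below. The model, optimizer, and data are placeholder stand-ins, not a prescribed setup:

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

# Hypothetical setup; names are illustrative only.
model = nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
writer = SummaryWriter()

x = torch.randn(8, 10)         # float inputs for the model
y = torch.randint(0, 3, (8,))  # CrossEntropyLoss expects int64 class labels

model.train()                  # enable training behavior of Dropout/BatchNorm
optimizer.zero_grad()          # otherwise .grad accumulates across steps
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

# .detach(), not .data: .data bypasses autograd's version tracking.
writer.add_scalar('loss', loss.detach().item(), 0)

# In-place ops (x += 1, x.add_(1)) on tensors that require grad can raise
# or corrupt gradients; prefer out-of-place: x = x + 1.

model.eval()                   # switch back before validation/inference
```

If a loss complains about dtypes, check both directions: regression losses like `MSELoss` want float targets (`.float()`), while classification losses like `CrossEntropyLoss` want integer labels (`.long()`).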