Appendix B: PyTorch Idioms and Gotchas
“Read this before your model does something stupid.”
Idioms

| Goal | Idiomatic PyTorch Code |
| --- | --- |
| Move everything to GPU | `x = x.to(device); model = model.to(device)` |
| Get model predictions | `with torch.no_grad(): output = model(x)` |
| Detach and convert to NumPy | `x.detach().cpu().numpy()` |
| One-hot encode | `F.one_hot(t.long(), num_classes).float()` |
| Check for NaNs | `torch.isnan(x).any()` |
| Log training stats | `writer.add_scalar('loss', val, step)` |
| Save model | `torch.save(model.state_dict(), 'model.pt')` |
| Load model | `model.load_state_dict(torch.load(...))` |
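
To see how these idioms compose, here is a minimal end-to-end sketch. The model architecture, batch shapes, and file name are hypothetical, chosen only for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)            # move parameters to the device
x = torch.randn(4, 10).to(device)   # move the batch to the same device

model.eval()                        # inference mode for BatchNorm/Dropout
with torch.no_grad():               # no autograd bookkeeping for predictions
    output = model(x)

assert not torch.isnan(output).any(), "NaNs in model output"

preds = output.argmax(dim=1)
# detach -> cpu -> numpy, in that order:
probs_np = output.softmax(dim=1).detach().cpu().numpy()

# One-hot encode integer class labels.
one_hot = F.one_hot(preds.long(), num_classes=3).float()

# Save weights only (the state dict), then load them back.
torch.save(model.state_dict(), 'model.pt')
model.load_state_dict(torch.load('model.pt', map_location=device))
```

Saving the `state_dict` rather than the whole model object keeps checkpoints portable across code refactors; `map_location` lets a GPU-trained checkpoint load on a CPU-only machine.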
⚠️ Gotchas

| Gotcha | What Goes Wrong | Fix |
| --- | --- | --- |
| Using `.data` to detach | Silently breaks autograd | Use `.detach()` |
| Calling `.numpy()` on a CUDA tensor | Immediate crash | Move to CPU first (`.cpu()`) |
| In-place ops like `+=`, `add_()` | May break gradients | Use out-of-place ops (`x = x + y`) |
| Forgetting `.train()` / `.eval()` | BatchNorm/Dropout misbehave | Switch mode explicitly |
| Wrong loss input types | Float tensor vs. int labels | Match dtypes (`.float()` / `.long()`) |
| Not zeroing `.grad` before `.backward()` | Gradients accumulate across steps | Call `optimizer.zero_grad()` |
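
A single training step that avoids each gotcha above might look like the sketch below. The model, optimizer, and data are placeholder stand-ins, not a prescribed setup:

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

# Hypothetical setup; names are illustrative only.
model = nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
writer = SummaryWriter()

x = torch.randn(8, 10)         # float inputs for the model
y = torch.randint(0, 3, (8,))  # CrossEntropyLoss expects int64 class labels

model.train()                  # enable training behavior of Dropout/BatchNorm
optimizer.zero_grad()          # otherwise .grad accumulates across steps
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

# .detach(), not .data: .data bypasses autograd's version tracking.
writer.add_scalar('loss', loss.detach().item(), 0)

# In-place ops (x += 1, x.add_(1)) on tensors that require grad can raise
# or corrupt gradients; prefer out-of-place: x = x + 1.

model.eval()                   # switch back before validation/inference
```

If a loss complains about dtypes, check both directions: regression losses like `MSELoss` want float targets (`.float()`), while classification losses like `CrossEntropyLoss` want integer labels (`.long()`).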