Most common neural network mistakes
- You didn't try to overfit a single batch first (see the sketch after the list)
- You forgot to toggle train/eval mode for the net
- You forgot to .zero_grad() (in PyTorch) before .backward() (see the training-step sketch after the list)
- You passed softmaxed outputs to a loss that expects raw logits
- You didn't use `bias=False` for your Linear/Conv2d layers when using BatchNorm, or conversely forgot to include it for the output layer. This one won't make you fail silently, but it adds spurious parameters (see the BatchNorm sketch after the list)
- Thinking view() and permute() are the same thing (& incorrectly using view)
- You forgot that PyTorch's .view() reads and fills the last dimension first (row-major order), so you end up sending scrambled input to the model without getting an error, since the shape is still right (see the view/permute sketch after the list)
- Not shuffling training data, or otherwise using batches that have too much correlation between the examples in each batch
- Thinking embeddings are only for NLP tasks and not using them in general for categorical input variables (see the embedding sketch after the list)
- You forgot to convert to float() after a comparison of tensors: summing the resulting ByteTensors overflows past 255 and wraps back to zero (should be fixed in newer PyTorch; see the comparison sketch after the list)
- Not double-checking the learning rate --> an initial learning rate that is (far) too high leads to "weird" results.
- Bad image augmentation --> I've accidentally augmented (with a minor zoom, in a loop) the data loaded in memory rather than a copy of it, leaving the data ~useless
- Softmax or other loss operation over the wrong dim (see the dim sketch after the list)
- Wrong sign for loss term
- Forgetting to pass hidden state from encoder to decoder
- Forgetting to clip gradients, especially for RNNs. All learned the hard way
Source: Twitter thread via @opendatascience
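A minimal sketch of the single-batch overfitting check from the first point, using a made-up toy MLP and synthetic data (all sizes are arbitrary). A model with enough capacity should drive the loss on one fixed batch close to zero; if it can't, debug before scaling up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical tiny setup: one fixed batch of 64 examples, 20 features, 5 classes.
xb = torch.randn(64, 20)
yb = torch.randint(0, 5, (64,))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

model.train()
for step in range(500):
    optimizer.zero_grad()                # clear gradients from the previous step
    logits = model(xb)                   # raw logits, no softmax
    loss = F.cross_entropy(logits, yb)
    loss.backward()
    optimizer.step()

print(loss.item())                       # should be near zero on this memorized batch
```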
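A sketch of a standard PyTorch train/eval loop that bundles several points above: toggling train/eval mode, calling .zero_grad() before .backward(), passing raw logits to cross_entropy, and clipping gradients. The function names and the max_norm value are illustrative choices, not from the thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, max_norm=1.0):
    model.train()                                   # enable dropout, update BatchNorm stats
    for xb, yb in loader:
        optimizer.zero_grad()                       # otherwise gradients accumulate across steps
        loss = F.cross_entropy(model(xb), yb)       # cross_entropy expects raw logits
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # especially important for RNNs
        optimizer.step()

@torch.no_grad()
def evaluate(model, loader):
    model.eval()                                    # disable dropout, use running BatchNorm stats
    correct = total = 0
    for xb, yb in loader:
        pred = model(xb).argmax(dim=1)
        correct += (pred == yb).sum().item()
        total += yb.numel()
    return correct / total
```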
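One way the bias/BatchNorm point typically looks in code (channel and class counts are arbitrary): the BatchNorm shift parameter makes the preceding layer's bias redundant, while the output layer, with no BatchNorm after it, keeps its bias.

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias would be absorbed by BatchNorm
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

head = nn.Linear(64, 10)  # no BatchNorm follows, so keep the default bias=True
```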
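A small sketch of the view()/permute() trap: both calls below return a (3, 2) tensor, so no downstream layer complains, but only permute() actually moves the axes.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2],
                                    #  [3, 4, 5]]

# view() just reinterprets the flat, row-major storage with a new shape:
print(x.view(3, 2))                 # [[0, 1], [2, 3], [4, 5]]

# permute() actually swaps the axes, which is usually what was intended:
print(x.permute(1, 0))              # [[0, 3], [1, 4], [2, 5]]
```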
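A sketch of using nn.Embedding for a categorical input outside NLP; the class name, category count, and dimensions are made up for illustration.

```python
import torch
import torch.nn as nn

class TabularNet(nn.Module):
    """Hypothetical model mixing one categorical column with numeric features."""
    def __init__(self, n_categories=50, emb_dim=8, n_numeric=10, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(n_categories, emb_dim)   # one learned vector per category
        self.head = nn.Linear(emb_dim + n_numeric, n_classes)

    def forward(self, cat_idx, numeric):
        x = torch.cat([self.emb(cat_idx), numeric], dim=1)
        return self.head(x)

model = TabularNet()
cat_idx = torch.randint(0, 50, (16,))    # integer category codes, not one-hot
numeric = torch.randn(16, 10)
print(model(cat_idx, numeric).shape)     # torch.Size([16, 2])
```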
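A sketch of the comparison/sum point: casting the mask before reducing is the safe pattern in any PyTorch version (historically the ByteTensor sum stayed uint8 and wrapped around at 255; recent versions return a BoolTensor and promote integer sums).

```python
import torch

x = torch.randn(1000)
y = torch.randn(1000)

mask = x > y                        # BoolTensor today, ByteTensor in old PyTorch
fraction = mask.float().mean()      # cast before reducing to avoid uint8 wrap-around
count = mask.long().sum()
print(fraction.item(), count.item())
```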
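A sketch of the wrong-dim softmax: both calls run without error, but for a (batch, classes) tensor only dim=1 normalizes over the classes.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32, 10)            # (batch, classes)

probs_right = F.softmax(logits, dim=1)  # normalizes over the class dimension
probs_wrong = F.softmax(logits, dim=0)  # silently normalizes over the batch instead

print(probs_right.sum(dim=1)[:3])       # each row sums to 1
print(probs_wrong.sum(dim=1)[:3])       # rows do not sum to 1 -- a red flag
```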