— This post will be updated when I come across a new trick.
- Data augmentation: add noise to the data. For images, simulate different viewpoints and camera models (e.g. affine transformations, lens distortion), and vary the color correction, contrast, gamma, saturation, etc.
- Data normalization: center the data (subtract the mean), scale it to [-1, 1], or apply PCA or PCA whitening. Represent angles by their cos and sin.
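- Initialization: use small random weights, scaled so that the variance of the neurons' outputs stays balanced from layer to layer: `w = np.random.randn(n) * np.sqrt(2.0 / n)`, where n is the number of inputs. This is He initialization, which was derived for ReLU units.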
- Regularization: L2, L1, L1+L2, max-norm constraints, dropout. Also adjust the batch size: if the loss curve is too noisy, increase it. If the training accuracy is good but the validation accuracy is terrible, the model is overfitting, so increase regularization. If the two are pretty close, the model may be able to afford more capacity (more layers, more neurons, etc.). For CNNs, visualizing the learned weights also tells you something: if they look noisy, with no clear patterns, increase regularization.
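- Batch normalization: normalize each layer's inputs using the statistics of the current mini-batch, then apply a learned scale and shift.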
- Gradient clipping: useful for RNNs, where gradients can explode.
- Learning rate: divide the summed gradients by the mini-batch size (i.e. use the average gradient), and start with a typical rate of 0.1. If the validation loss stops improving, divide the rate by 2.
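- Activation functions: use tanh, ReLU, leaky ReLU, or others. Sigmoid is rarely useful in hidden layers: it saturates (its gradient vanishes for large |x|) and its outputs are not zero-centered.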
- Ensembles: train multiple DNNs, e.g. from different random initializations. Use the validation data to evaluate them, pick the good ones, and combine their predictions. If training several networks is too slow, use snapshots of one network saved several epochs apart.
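For the data augmentation tip, here is a minimal NumPy sketch of a few of the listed image transforms. It assumes an H x W x C float image with values in [0, 1]; the function name `augment_image` and all parameter ranges are illustrative choices, not tuned values. A small translation stands in for a general affine warp, and lens distortion is left out; in practice both are usually done with an image-processing library.

```python
import numpy as np


def augment_image(img, rng):
    """Return a randomly augmented copy of an H x W x C float image in [0, 1]."""
    out = img.copy()
    # Horizontal flip: a cheap change of viewpoint.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Small random translation (the simplest affine transform); pixels wrap around.
    dy, dx = rng.integers(-2, 3, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    # Contrast: stretch or shrink deviations from the image mean.
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.8, 1.2) + mean
    # Saturation: blend each pixel with its gray value (mean over channels).
    gray = out.mean(axis=2, keepdims=True)
    out = gray + (out - gray) * rng.uniform(0.7, 1.3)
    # Gamma correction, applied to values clipped back into [0, 1].
    out = np.clip(out, 0.0, 1.0) ** rng.uniform(0.8, 1.25)
    # Additive Gaussian noise.
    out = out + rng.normal(0.0, 0.02, size=out.shape)
    return np.clip(out, 0.0, 1.0)
```

Calling it with a fresh draw each epoch, e.g. `augment_image(img, np.random.default_rng())`, means the network rarely sees exactly the same pixels twice.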
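The data normalization tip covers several distinct transforms. The sketch below shows one way to write them in NumPy for a data matrix `X` of shape (N, D). The function names and the `eps` value are assumptions for illustration, and `scale_to_unit_range` assumes no column is constant (otherwise it divides by zero).

```python
import numpy as np


def pca_whiten(X, eps=1e-5):
    """Zero-center X (N x D), rotate onto its principal axes, and scale to unit variance."""
    Xc = X - X.mean(axis=0)                 # centering
    cov = Xc.T @ Xc / Xc.shape[0]
    # eigh is for symmetric matrices; the columns of U are the principal directions.
    eigvals, U = np.linalg.eigh(cov)
    Xrot = Xc @ U                           # PCA: decorrelate the features
    return Xrot / np.sqrt(eigvals + eps)    # whitening: unit variance along every axis


def scale_to_unit_range(X):
    """Min-max scale each column of X to [-1, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0


def encode_angle(theta):
    """Represent angles (radians) as (cos, sin), so 0 and 2*pi map to the same point."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)
```

The angle encoding matters because a raw angle has a discontinuity: 359 degrees and 1 degree are numerically far apart but physically close, while their (cos, sin) pairs are close.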
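The initialization formula can be checked empirically. This sketch draws weights with variance 2 / n_in and pushes unit-variance inputs through a stack of ReLU layers. With this scaling the mean square of the activations stays on the order of 1 instead of shrinking or blowing up with depth. The helper names `he_init` and `relu_stack_mean_square`, the batch of 500 inputs, and the layer sizes are illustrative assumptions.

```python
import numpy as np


def he_init(n_in, n_out, rng):
    # Zero-mean Gaussian weights with variance 2 / n_in, where n_in is the fan-in.
    # Use float division: under Python 2, 2/n with an integer n evaluates to 0.
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)


def relu_stack_mean_square(n_layers, width, rng):
    # Feed unit-variance inputs through n_layers fully connected ReLU layers
    # and return the mean square of the final activations.
    x = rng.standard_normal((500, width))
    for _ in range(n_layers):
        x = np.maximum(0.0, x @ he_init(width, width, rng))
    return float(np.mean(x ** 2))
```

The factor 2 compensates for ReLU zeroing roughly half of its inputs. Replacing `np.sqrt(2.0 / n_in)` with a fixed small constant such as 0.01 makes the activations shrink toward zero within a few layers.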
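Three of the regularizers from the regularization tip fit in a few lines each. This is a sketch under the assumption that a layer's weight matrix `W` has shape (n_in, n_out), so each column holds one neuron's incoming weights. The function names and default constants are hypothetical. The dropout shown is the common "inverted" variant, which rescales during training so that nothing changes at test time.

```python
import numpy as np


def penalty_grad(W, l1=0.0, l2=0.0):
    """Gradient of l1 * sum|W| + 0.5 * l2 * sum(W**2); both > 0 gives L1+L2."""
    return l1 * np.sign(W) + l2 * W


def max_norm(W, c=3.0):
    """Max-norm constraint: rescale each column (one neuron) so its L2 norm is <= c."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))


def dropout(x, p_keep, rng, train=True):
    """Inverted dropout: zero each unit with prob 1 - p_keep and divide the
    survivors by p_keep during training; pass x through unchanged at test time."""
    if not train:
        return x
    mask = (rng.random(x.shape) < p_keep) / p_keep
    return x * mask
```

`penalty_grad` is added to the data-loss gradient before each update, while `max_norm` is applied to the weights right after each update.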
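For the batch normalization tip, a minimal forward pass looks like the sketch below. During training, each feature is normalized with the mini-batch mean and variance, and running averages are kept for use at test time. The `state` dictionary, the momentum of 0.9, and the `eps` value are assumptions for this illustration; the backward pass is omitted.

```python
import numpy as np


def batchnorm_forward(x, gamma, beta, state, train=True, momentum=0.9, eps=1e-5):
    """Batch normalization for x of shape (batch, features).

    gamma and beta are the learned scale and shift, each of shape (features,).
    state holds running "mean" and "var" arrays used in place of batch
    statistics at test time.
    """
    if train:
        mu, var = x.mean(axis=0), x.var(axis=0)
        # Exponential moving averages of the batch statistics.
        state["mean"] = momentum * state["mean"] + (1 - momentum) * mu
        state["var"] = momentum * state["var"] + (1 - momentum) * var
    else:
        mu, var = state["mean"], state["var"]
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

Because gamma and beta are learned, the network can undo the normalization if that helps. With `gamma = ones` and `beta = zeros`, each output feature has zero mean and unit variance over the batch.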
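Gradient clipping has several variants. A common one for RNNs is clipping by the global norm of all gradients together, sketched below. The helper name `clip_by_global_norm` and the small constant added to the denominator are illustrative assumptions. Clipping each element to a fixed range is another option, but that one can change the update direction.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm.

    Every array is scaled by the same factor, so the update direction is preserved.
    Returns the clipped gradients and the norm before clipping.
    """
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads], total
```

Logging the returned pre-clipping norm is a cheap way to see how often, and by how much, the gradients explode.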
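The learning rate tip combines two ideas: average the gradient over the mini-batch, and halve the rate once validation stops improving. Here is a sketch of both. The class name `HalveOnPlateau`, its `patience` parameter (how many non-improving checks to tolerate before halving), and `sgd_step` are hypothetical names for illustration.

```python
import numpy as np


class HalveOnPlateau:
    """Halve the learning rate when the validation loss has not improved for `patience` checks."""

    def __init__(self, lr=0.1, patience=2):
        self.lr = lr
        self.patience = patience
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
            if self.bad_checks >= self.patience:
                self.lr /= 2.0
                self.bad_checks = 0
        return self.lr


def sgd_step(w, per_example_grads, lr):
    """One SGD update with the gradient summed over the batch, then divided by the batch size."""
    g = per_example_grads.sum(axis=0) / per_example_grads.shape[0]
    return w - lr * g
```

Averaging rather than summing the gradients keeps the effective step size independent of the batch size, so a rate such as 0.1 remains a sensible starting point when the batch size changes.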
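The activation function tip says sigmoid saturates. Comparing derivatives makes this concrete: the sigmoid's gradient is at most 0.25 and is nearly zero for large |x|, while tanh has gradient 1 at the origin and ReLU has gradient 1 for every positive input. The function names below are illustrative, and the leaky ReLU slope of 0.01 is a common default rather than a recommendation from this post.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def leaky_relu(x, alpha=0.01):
    # A small negative slope keeps a nonzero gradient for x < 0.
    return np.where(x > 0, x, alpha * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25 (at x = 0) and close to 0 once |x| is large


def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # equals 1 at x = 0; tanh outputs are zero-centered
```

Stacking layers multiplies these derivatives together during backpropagation. Factors of at most 0.25 from sigmoid layers shrink the gradient quickly with depth.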
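The ensemble tip can be sketched as: score each model on the validation set, keep the top k, and average their predicted class probabilities on new data. This assumes every model's predictions are already collected into arrays of shape (n_models, n_examples, n_classes). The models could be separately initialized networks or snapshots of one network taken several epochs apart. `select_and_average` and the choice of plain probability averaging are assumptions; weighted averaging or majority voting are alternatives.

```python
import numpy as np


def select_and_average(val_probs, val_labels, test_probs, k):
    """Keep the k models with the best validation accuracy and average their test predictions.

    val_probs, test_probs: arrays of shape (n_models, n_examples, n_classes).
    val_labels: integer class labels of shape (n_val_examples,).
    Returns the averaged test probabilities and the indices of the chosen models.
    """
    val_acc = (val_probs.argmax(axis=2) == val_labels).mean(axis=1)
    best = np.argsort(val_acc)[::-1][:k]  # indices sorted by descending accuracy
    return test_probs[best].mean(axis=0), best
```

The final prediction is then `avg.argmax(axis=1)` on the averaged probabilities.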