is a wonderful venue (with good coffee!), and the city is a lot of fun. Everything went smoothly and the
organizers did a great job.
You can get a list of papers that I liked from my Twitter feed, so instead I’d like to discuss some broad themes
- Multitask regularization to mitigate sample complexity in RL. Both in video games and in dialog, it is useful to add extra (auxiliary) tasks in order to accelerate learning.
- Leveraging knowledge and memory. Our current models are powerful function approximators, but in NLP especially we need to go beyond “the current example” in order exhibit competence.
- Gradient descent as inference. Whether it’s inpainting with a GAN or BLUE score maximization with an RNN, gradient descent is an unreasonably good inference algorithm.
- Careful initialization is important. I suppose traditional optimization people would say “of course”, but we’re starting to appreciate the importance of good initialization for deep learning. In particular, start close to linear with eigenvalues close to 1. (Balduzzi et. al. , Poole et. al.)
- Convolutions are as good as, and faster than, recurrent models for NLP. Nice work out of Facebook on causal convolutions for seq2seq. This aligns with my personal experience: we use convolutional NLP models in production for computational performance reasons.
- Neural networks are overparameterized. They can be made much sparser without losing accuracy (Molchanov et. al., Lobacheva et. al.).
- maluuba had the best party. Woot!
Finally, I kept thinking the papers are all “old”. While there were lots of papers I was seeing for the first time, it nonetheless felt like the results were all dated because I’ve become addicted to “fresh results” on arxiv.