In my 2016 ISMIR paper (Paper is out: Automatic tagging using deep convolutional neural networks), I applied deep convolutional networks to music tagging. It’s been 10 months since I wrote the paper, and I’ve realised many mistakes and not-so-great design choices, which I felt like sharing. (Yes, I am writing a report, and it’s long and boring.)
- Number of feature maps
2048 for the first-to-last layers. It is absolutely too many, no doubt. It is redundant when the number of output nodes is only 50. (Even for ImageNet, which has 1000 output nodes, 2048 could be too many.) With 32 feature maps in all 5 layers, I got similar performance. To be safe, 64 would be fine. But not 2048.
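As a rough sanity check on why 2048 is overkill, here is some back-of-the-envelope parameter arithmetic (plain Python; the 3×3 kernel size and the mid-network layer shapes are hypothetical, just for illustration):

```python
# Approximate parameter count of one k x k conv layer: k*k*c_in*c_out weights + c_out biases.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out

big = conv_params(3, 2048, 2048)   # a hypothetical mid-network layer with 2048 maps
small = conv_params(3, 64, 64)     # the same layer with 64 maps
print(big, small)                  # the 2048-map layer has ~1000x more parameters
```

For a 50-tag output, almost all of that extra capacity is wasted.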
- Split setting
I released the split setting (release; Million Song Dataset split setting that I used) for reproducing the experiment, which I believe is good. What’s not cool about it is that it may not be the best split. Someone may come up with a better one; until then, let’s use the same setting. At least we get reproducibility 😉
- Dropout? Batch normalization? – too much noise will kill you.
Dropout still helps convnets, but it is not as critical as it used to be (read these), and I think
0.5 was too large. Near the end of training, it only makes it hard to decide when to stop. These days I rely on batch normalization + early stopping, which seems more stable.
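The early-stopping half of that recipe can be sketched without any framework. This is a minimal, hypothetical patience rule on validation loss (not the exact criterion I used):

```python
# Early stopping sketch: stop once validation loss hasn't improved
# for `patience` consecutive epochs.
def early_stop_index(val_losses, patience=3):
    best, best_i = float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_i = loss, i
        elif i - best_i >= patience:
            return i        # stop training at this epoch
    return len(val_losses) - 1

print(early_stop_index([1.0, 0.8, 0.7, 0.75, 0.76, 0.77]))
```

With batch normalization in place of heavy dropout, the validation curve tends to be smooth enough for a simple rule like this to work.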
- Could have used zero-padding
..so that none of the layers discards information. The pooling scheme I used sometimes discards a few frames at the edges. It probably isn’t that critical, though.
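The edge-discarding arithmetic is easy to see with the standard 'valid' vs 'same' conventions (1366 here is just a hypothetical input length, not necessarily what any given layer sees):

```python
# Pooling output length: 'valid' floors the division and drops leftover
# frames at the edge; 'same' zero-pads so nothing is discarded.
def pool_out(n, pool, pad="valid"):
    if pad == "same":
        return -(-n // pool)   # ceil division: edges are zero-padded
    return n // pool           # floor division: remainder frames discarded

print(pool_out(1366, 4))           # 341 -> 2 frames silently dropped
print(pool_out(1366, 4, "same"))   # 342 -> nothing dropped
```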
- It is not fully convolutional indeed
because, in essence, the output layer is fully connected to the last
Nx1x1 feature map. Using average pooling in a truly fully-convolutional setting should work; I haven’t tried it on MSD tagging though.
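A minimal numpy sketch of what that truly fully-convolutional head would look like (all shapes are hypothetical): a 1×1 conv maps the last feature map to 50 tag channels, then global average pooling collapses the spatial axes, so any input size yields a 50-dim prediction with no dense layer.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 256))     # hypothetical last feature map (H, W, C)
w = rng.standard_normal((256, 50)) * 0.01   # a 1x1 conv is a per-position matmul

tag_map = feat @ w                # (4, 4, 50): one score per tag per position
pred = tag_map.mean(axis=(0, 1))  # global average pooling -> (50,)
print(pred.shape)
```

Because the pooling averages over whatever spatial extent is left, the same weights handle inputs of any length.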
- A bug
I happened to use log(power-power-melspectrogram) with an 80-dB dynamic range limitation, which made it a 40-dB dynamic range input representation.
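In case the dB arithmetic isn’t obvious: taking dB of an already-squared power value doubles every dB number, so an 80-dB clip only spans 40 dB of the underlying power range. A tiny sketch (my reading of the bug, with made-up numbers):

```python
import math

def to_db(x, ref=1.0):
    # Standard power-to-dB conversion.
    return 10.0 * math.log10(x / ref)

p = 1e-4              # a power value roughly 40 dB below the reference
print(to_db(p))       # ~ -40 dB: the correct input value
print(to_db(p ** 2))  # ~ -80 dB: the buggy (re-squared) value, already at the 80-dB floor
```

So everything below 40 dB of real dynamic range got clipped away.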
- FCN-6 and FCN-7 are pointless
They are. I just wanted to add more experiments.
That’s it. Please keep these problems in mind if you’re doing more than just reading the paper!