This means you can easily build the model and load pre-trained weights as below.
Which is the better feature extractor?
With include_top=False, you can get a 256-dim (MusicTaggerCNN) or 32-dim (MusicTaggerCRNN) feature representation.
In general, I would recommend using MusicTaggerCRNN and its 32-dim features: for predicting 50 tags, 256 features sound a bit too large. I have only looked into the 32-dim features, not the 256-dim ones. I thought of using PCA to reduce the dimension further, but ended up not applying it because mean(abs(recovered - original) / original) is .05 (dim: 32->24), which doesn't seem good enough.
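For reference, here is a minimal sketch of how such a relative reconstruction error could be measured. It is not the code I actually ran: the feature matrix is a synthetic stand-in (random numbers, not real MusicTaggerCRNN outputs), the function name pca_relative_error is made up for illustration, and PCA is done with a plain NumPy SVD. The denominator uses abs(original), which equals the formula above when the features are positive.

```python
import numpy as np


def pca_relative_error(features, n_components):
    """Return mean(abs(recovered - original) / abs(original)) after a
    PCA round trip that keeps only `n_components` dimensions."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T          # shape: (n_samples, n_components)
    recovered = reduced @ components + mean    # back to the original dimension
    return float(np.mean(np.abs(recovered - features) / np.abs(features)))


# Synthetic stand-in for a (n_clips, 32) feature matrix (assumption:
# real features would come from the network with include_top=False).
rng = np.random.default_rng(0)
features = rng.uniform(0.5, 1.5, size=(1000, 32))

for k in (16, 24, 32):
    print(k, pca_relative_error(features, k))
```

With real features, the error at a given dimension tells you how much is lost by the reduction; keeping all 32 components should give an error of essentially zero.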
Probably the 256-dim features are either redundant (in which case you can reduce them effectively with PCA), or they simply contain more information than the 32-dim ones (e.g., features at different hierarchical levels). If dimension size does not matter, it's worth choosing the 256-dim ones.
Set include_top=False to get a feature extractor that outputs the activations of the second-to-last layer of the network.