Release; pre-trained convnet for music auto-tagging


12 thoughts on “Release; pre-trained convnet for music auto-tagging”

      1. In a previous post you mentioned that the model was trained on ~29 s audio clips from the OMS dataset. Is that correct? If so, I couldn’t find the code for it. Thanks!


  1. Hi,

    I downloaded the GTZAN music genre dataset from
    I converted the GTZAN dataset from a 22050 Hz to a 16000 Hz sampling rate using sox (e.g., sox inputfile.wav -b16 -r16000 out.wav).
    When I run the example tagging script on audio files from the GTZAN/rock directory, most of the predictions come out as jazz.
    What am I doing wrong? (I’m using the CRNN with Theano.)
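The sox call changes the sample rate by the ratio 16000/22050. As a quick sanity check of what that conversion means, the sketch below reduces the ratio to the smallest integer up/down factors (the form that polyphase resamplers such as SciPy's `scipy.signal.resample_poly(x, up, down)` take) and confirms that a clip keeps its duration. The helper names are hypothetical, and the sketch only does the arithmetic, not the actual filtering.

```python
from math import gcd

def resample_factors(src_rate: int, dst_rate: int) -> tuple[int, int]:
    """Smallest integers (up, down) such that dst_rate / src_rate == up / down."""
    g = gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g

def resampled_length(n_samples: int, src_rate: int, dst_rate: int) -> int:
    """Number of samples after resampling, rounding any fractional sample up."""
    up, down = resample_factors(src_rate, dst_rate)
    return (n_samples * up + down - 1) // down

print(resample_factors(22050, 16000))              # (320, 441)
print(resampled_length(30 * 22050, 22050, 16000))  # 480000, i.e. still 30 s at 16 kHz
```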



    1. I’d recommend using it as a feature extractor and training a classifier on top of it, rather than using its output as is.
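A minimal sketch of that suggestion, assuming you have already run the pre-trained convnet over each clip and stacked one feature vector per clip into a NumPy array (feature extraction itself is not shown; `train_softmax_classifier` and `predict` are hypothetical helper names). The classifier head here is plain multinomial logistic regression trained by gradient descent; any other standard classifier, such as an SVM, could be swapped in.

```python
import numpy as np

def train_softmax_classifier(features, labels, n_classes, lr=0.1, epochs=500, l2=1e-3):
    """Fit a multinomial logistic-regression head on fixed (frozen) convnet features.

    features: (n_clips, n_dims) array, one row per audio clip; assumed to be
              produced by running the pre-trained convnet on each clip.
    labels:   (n_clips,) int array of genre indices in [0, n_classes).
    """
    n, d = features.shape
    # Standardize each feature dimension so plain gradient descent is well behaved.
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8
    x = (features - mean) / std
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    y = np.eye(n_classes)[labels]  # one-hot targets
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability for softmax
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = probs - y  # per-clip gradient of cross-entropy w.r.t. the logits
        w -= lr * (x.T @ grad / n + l2 * w)
        b -= lr * grad.mean(axis=0)
    return mean, std, w, b

def predict(model, features):
    """Return the predicted class index for each row of `features`."""
    mean, std, w, b = model
    return np.argmax(((features - mean) / std) @ w + b, axis=1)
```

To evaluate it fairly, fit it on one portion of GTZAN and report accuracy only on clips it was not trained on.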


    2. So you’re recommending that I train a new model on my own training data and then test it on the GTZAN dataset?
      Why do the uploaded pre-trained weights give wrong results on GTZAN?


  2. Yes, I tested that approach with a similar network, and it gets you about 70-80% accuracy. The problem is not GTZAN. The current CRNN weights are somewhat odd, although they make sense under the AUC evaluation scheme (AUC measures how well clips are ranked for each tag, not top-K prediction). I’m planning to update them.
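To illustrate how weights can look fine under AUC yet give odd top-1 predictions, here is a toy sketch with made-up scores for three hypothetical tags (not actual CRNN outputs). Adding a constant offset to one tag's scores leaves that tag's ROC-AUC unchanged, because AUC depends only on how clips are ranked within each tag, yet it can make that tag win the argmax for every clip, much like the "mostly jazz" symptom in the question.

```python
import numpy as np

def roc_auc(scores, positives):
    """ROC-AUC for one tag: the probability that a randomly chosen positive
    clip scores higher than a randomly chosen negative clip (ties count half)."""
    pos = scores[positives]
    neg = scores[~positives]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def mean_auc(scores, labels):
    """Average one-vs-rest ROC-AUC over all tags (columns of `scores`)."""
    return np.mean([roc_auc(scores[:, k], labels == k) for k in range(scores.shape[1])])

def top1_accuracy(scores, labels):
    """Fraction of clips whose highest-scoring tag is the true label."""
    return np.mean(scores.argmax(axis=1) == labels)

# Toy data: 6 clips, 3 hypothetical tags (0 = rock, 1 = jazz, 2 = pop).
labels = np.array([0, 0, 1, 1, 2, 2])
scores = np.array([
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
])

# Shift every "jazz" score up by a constant: the ranking inside that column is unchanged.
biased = scores.copy()
biased[:, 1] += 1.0

print(mean_auc(scores, labels), top1_accuracy(scores, labels))  # 1.0 1.0
print(mean_auc(biased, labels), top1_accuracy(biased, labels))  # 1.0 0.3333333333333333
```

The per-tag ranking is perfect in both cases, so mean AUC stays at 1.0, while top-1 accuracy drops because every clip is now assigned to the shifted tag.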

