Release; pre-trained convnet for music auto-tagging

Aug 06 2016 / February 10, 2017

Published by keunwoochoi

12 Comments

Dinesh Vadhia says:

January 9, 2017 at 11:17 am

Hi! I don’t see how the pre-trained One Million Song model was trained. Thanks

1. keunwoochoi says:
  
  January 9, 2017 at 3:32 pm
  
  Hi, what do you mean by ‘how’?
  
  1. Dinesh Vadhia says:
    
    January 9, 2017 at 4:40 pm
    
    A previous post said that the model was trained on ~29s audio clips from the OMS dataset. Is that correct? If so, I couldn’t find the training code. Thanks!
    
  2. keunwoochoi says:
    
    January 10, 2017 at 9:45 am
    
    Training is not part of the code. I used the MSD dataset.
    
  3. Dinesh Vadhia says:
    
    January 10, 2017 at 10:17 am
    
    Ah, ok. That is what I was wondering, i.e. how you trained the model using the MSD dataset.
    
  4. keunwoochoi says:
    
    January 10, 2017 at 10:22 am
    
    Could you elaborate? I still don’t get the point of your question.
    
  5. Dinesh Vadhia says:
    
    January 10, 2017 at 1:46 pm
    
    I will send an email; otherwise we will go round and round.
    
K R Srinidhi says:

January 11, 2017 at 7:25 am

Hi,

I downloaded GTZAN Music genre dataset from http://marsyasweb.appspot.com/download/data_sets/?_sm_au_=i7HSSSWqdVMd13T7.
I converted the GTZAN dataset from a 22050 Hz to a 16000 Hz sampling rate using sox (e.g. sox inputfile.wav -b16 -r16000 out.wav).
When I ran the example tagging script on audio files from the GTZAN/rock directory, most of the predictions came out as jazz.
What am I doing wrong? (I am using the CRNN with Theano.)

regards
Srinidhi

K R Srinidhi says:

January 11, 2017 at 9:32 am

Hi,

I downloaded GTZAN Music genre dataset from http://marsyasweb.appspot.com/download/data_sets/?_sm_au_=i7HSSSWqdVMd13T7.
I converted the GTZAN dataset from a 22050 Hz to a 12000 Hz sampling rate using sox (e.g. sox inputfile.wav -b16 -r12000 out.wav).
When I ran the example tagging script on audio files from the GTZAN/rock directory, most of the predictions came out as jazz.
What am I doing wrong?

regards
Srinidhi

1. keunwoochoi says:
  
  January 11, 2017 at 2:28 pm
  
  I’d recommend using it as a feature extractor and adding a classifier on top of it, rather than using the result as it is.
  
2. K R Srinidhi says:
  
  January 11, 2017 at 4:41 pm
  
  So you recommend that I train a new model on my own training data and then test it against the GTZAN dataset?
  Why are the uploaded pretrained weights giving wrong results with the GTZAN dataset?
  Thanks
  Srinidhi
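
The suggestion in this thread (treat the pre-trained network as a feature extractor and train a separate classifier on top) could be sketched roughly as follows. This is only an illustration under stated assumptions: `extract_features` is a hypothetical stand-in for reading activations from the pre-trained network, the feature vectors are made up, and a nearest-centroid classifier is used purely because it fits in a few lines. None of this is taken from the released code.

```python
from collections import defaultdict


def extract_features(track):
    # HYPOTHETICAL stand-in: in practice this would return activations of a
    # late layer of the pre-trained network for the given audio clip.
    return track["features"]


def fit_centroids(tracks, labels):
    """Average the feature vectors of each genre (nearest-centroid training)."""
    sums, counts = {}, defaultdict(int)
    for track, label in zip(tracks, labels):
        feat = extract_features(track)
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        sums[label] = [a + b for a, b in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}


def predict(centroids, track):
    """Return the genre whose centroid is closest in squared Euclidean distance."""
    feat = extract_features(track)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feat, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Made-up 2-dimensional features, for illustration only.
train = [
    {"features": [0.9, 0.1]},
    {"features": [0.8, 0.2]},
    {"features": [0.2, 0.9]},
    {"features": [0.1, 0.7]},
]
labels = ["rock", "rock", "jazz", "jazz"]

centroids = fit_centroids(train, labels)
prediction = predict(centroids, {"features": [0.7, 0.3]})
print(prediction)
```

The classifier here is trained on labelled genre data, so its decision no longer depends on how the raw tag scores of the pre-trained model happen to be scaled.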
  
keunwoochoi says:

January 11, 2017 at 4:45 pm

Yes, I tested it with a similar network, and it will get you 70-80% accuracy. The problem is not GTZAN. The current CRNN weights are kind of weird, although they make sense under an AUC evaluation scheme (AUC is not about top-K prediction). I’m planning to update them.
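
To illustrate the remark that AUC is not about top-K prediction, here is a minimal sketch with made-up scores (they are not outputs of the actual CRNN). Per-tag AUC only asks whether, for a given tag, positive tracks are ranked above negative ones. It says nothing about how scores compare across tags, so a tag whose scores are systematically higher can win every top-1 prediction even when every per-tag AUC is perfect.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


tags = ["rock", "jazz"]
true_genre = ["rock", "rock", "jazz", "jazz"]

# Made-up tag scores: "jazz" scores are higher than "rock" scores on every track.
scores = [
    {"rock": 0.30, "jazz": 0.60},
    {"rock": 0.35, "jazz": 0.55},
    {"rock": 0.10, "jazz": 0.90},
    {"rock": 0.05, "jazz": 0.80},
]

# Per-tag AUC: rank tracks within each tag separately.
auc = {
    tag: roc_auc([s[tag] for s in scores], [int(g == tag) for g in true_genre])
    for tag in tags
}

# Top-1 prediction: compare scores across tags within each track.
top1 = [max(s, key=s.get) for s in scores]
top1_accuracy = sum(p == g for p, g in zip(top1, true_genre)) / len(true_genre)

print(auc, top1, top1_accuracy)
```

In this toy case both tags reach a per-tag AUC of 1.0, yet every track is labelled jazz by top-1, which resembles the rock-predicted-as-jazz behaviour reported above.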
