Hi! I can’t see how the pre-trained One Million Song model was trained. Thanks
Hi, what do you mean by ‘how’?
A previous post said that the model was trained on ~29 s of audio per track from the OMS dataset. Is that correct? If so, I couldn’t find the training code. Thanks!
Training is not part of the code. I used the MSD dataset.
Ah, OK. That is what I was wondering, i.e. how you trained the model using the MSD dataset.
Could you elaborate? I still don’t get the point of your question.
I’ll send an email; otherwise we’ll go round and round.
Hi,
I downloaded the GTZAN music genre dataset from http://marsyasweb.appspot.com/download/data_sets/?_sm_au_=i7HSSSWqdVMd13T7.
I converted the GTZAN dataset from a 22050 Hz to a 16000 Hz sampling rate using sox. (ex: sox inputfile.wav -b16 -r16000 out.wav)
When I ran the example tagging script on audio files from the GTZAN/rock directory, most of them were predicted as jazz.
What am I doing wrong? (Using CRNN with Theano)
Regards,
Srinidhi
Hi,
I downloaded the GTZAN music genre dataset from http://marsyasweb.appspot.com/download/data_sets/?_sm_au_=i7HSSSWqdVMd13T7.
I converted the GTZAN dataset from a 22050 Hz to a 12000 Hz sampling rate using sox. (ex: sox inputfile.wav -b16 -r12000 out.wav)
When I ran the example tagging script on audio files from the GTZAN/rock directory, most of them were predicted as jazz.
What am I doing wrong?
Regards,
Srinidhi
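Before looking at the model, it can help to confirm that the sox conversion actually produced the expected format. The following is a minimal sketch using only Python's standard `wave` module. The helper names `wav_info` and `check_rate` are made up for illustration, and the expected rate is an assumption passed in by the caller, not a value confirmed in this thread.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        bits = wf.getsampwidth() * 8
        channels = wf.getnchannels()
        duration = wf.getnframes() / float(rate)
    return rate, bits, channels, duration


def check_rate(path, expected_rate):
    """True if the file's sample rate equals expected_rate (a caller-supplied assumption)."""
    return wav_info(path)[0] == expected_rate
```

For example, `check_rate("out.wav", 12000)` would report whether the converted file really is at 12000 Hz; the printed duration also shows whether the clip is long enough for the input length the script expects.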
I’d recommend using it as a feature extractor and adding a classifier on top of it, rather than using the output as it is.
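To make the "feature extractor plus classifier" idea concrete, here is a rough sketch, not the author's code. It assumes each clip has already been turned into a fixed-length feature vector by the pretrained network (the toy 2-D vectors below are made up), and it uses a simple nearest-centroid classifier written in plain Python as a stand-in for whatever classifier one would really use (e.g. an SVM or logistic regression).

```python
from collections import defaultdict


def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) for each genre label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


# Hypothetical features: in practice these would be activations taken from
# the pretrained network for labelled GTZAN training clips.
train_features = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
train_labels = ["rock", "rock", "jazz", "jazz"]

centroids = fit_centroids(train_features, train_labels)
prediction = predict(centroids, [0.85, 0.15])
```

The point of the design is that the genre decision is learned from GTZAN labels, so the pretrained tag outputs never have to be compared against each other directly.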
So you recommend that I train a new model on my own training data and then test it against the GTZAN dataset?
Why are the uploaded pretrained weights giving wrong results on the GTZAN dataset?
Thanks,
Srinidhi
Yes, I tested it with a similar network. It will get you 70-80% accuracy. It is not a problem with GTZAN. The current CRNN weights are kinda weird, though they make sense under the AUC evaluation scheme. (AUC is not about top-K prediction.) I’m planning to update them.
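A small made-up example may help show why good AUC does not imply good top-K prediction. Per-tag AUC only asks whether, for one tag, the tracks that have the tag score higher than the tracks that don't; it never compares scores across tags. In the sketch below (all numbers are invented for illustration), the "jazz" output is always larger than the "rock" output, so every track is predicted as jazz, yet both tags have a perfect AUC.

```python
def auc(scores, positives):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


tags = ["rock", "jazz"]

# Invented tag scores for four tracks: the first two are rock, the last two jazz.
scores = [
    {"rock": 0.30, "jazz": 0.40},
    {"rock": 0.25, "jazz": 0.35},
    {"rock": 0.05, "jazz": 0.80},
    {"rock": 0.10, "jazz": 0.90},
]
truth = ["rock", "rock", "jazz", "jazz"]

# AUC is computed per tag, across tracks.
per_tag_auc = {
    t: auc([s[t] for s in scores], [y == t for y in truth]) for t in tags
}

# Top-1 prediction compares tags within each track.
top1 = [max(s, key=s.get) for s in scores]
top1_accuracy = sum(p == y for p, y in zip(top1, truth)) / float(len(truth))
```

Here `per_tag_auc` is 1.0 for both tags while `top1` is "jazz" for every track, giving a top-1 accuracy of 0.5. This is one plausible reading of the rock-as-jazz results described above, not a confirmed diagnosis of the released weights.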