There are many small reasons for this. Well, having my own python package that can be installed with pip command is cool, right? It’s one of the cool kids’ requirements. I’ve learned a lot by doing it. But, most importantly, it makes the user experience so nice to do research with more than a bunch of audio files.
Let’s say you’re doing some classification with not-so-small music dataset. Like FMA-medium which is roughly 30GB after uncompressing.
Ok, decoding mp3 takes long so it should be done before.
But actually to what should I convert them? Spectrogram? Melspectrogram? With what parameter? Alright, I’ll try
hop=512, 256,… hm.. and..
Congrats, just do it and store them which would take up like 2 * 2 * 2 * 30 = 240GB. (Yeah it’s very loose estimation.)
A day or two later.
Okay they are ready! Now let’s run the model.
Result: [512, 256, 96] worked the best.
Wut, then is 96 better? or maybe I gotta try with 72? 48?
Ok then I can remove the others… should I? But it took me a day and what if I need it again later..
👏👏👏 YES IT GOES ON AND ON! What if you can just do this?
model = Sequential() # A mel-spectrogram layer model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=input_shape, border_mode='same', sr=sr, n_mels=128, fmin=0.0, fmax=sr/2, power=1.0, return_decibel_melgram=False, trainable_fb=False, trainable_kernel=False, name='trainable_stft'))
Hey hey, but it would add more computation on real time. How much is it?
for 16384 audio samples (30 seconds each), and the convnet looks like this. (157k parameters, 5-layer)
- The experiment uses a dummy data arrays which has no overhead to feed.
- We tested it on four gpu models and the charts above is normalised. But what really matters is the absolute time.
- If your model is very simple, the proportion of time that Kapre uses would increases.
- Vice versa,
- which is good because if you got a large dataset → pre-computing and storing spectrograms are annoying → with large dataset you probably would like to use large network to take advantage of the dataset → yeah, so among the overall training time, the overhead that Kapre introduce becomes trivial.