My ISMIR 2021 Submission and its reviews

As I did two years ago with my DrummerNet paper, I’m open-sourcing the reviews my submission received. I’m doing it again because, back when I had no paper at ISMIR, I was always very curious about how ISMIR reviewing is done. By not making this information available, I…

slightly better research code – avoid hard-coded values

Imagine you need to crop the first 10 seconds of a waveform. This can be improved like this. Of course, it does the same thing. But this is better because... Now you know the meaning of the magic number 160000. And this means that... Now ANYONE would know the meaning of 160000. Because…
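The code from the original post is not included in this excerpt, so here is a minimal sketch of the idea rather than the author's snippet. It assumes a 16 kHz sample rate, since 160000 samples over 10 seconds works out to 16000 samples per second. The names `SAMPLE_RATE`, `CROP_DURATION_SEC`, and `crop_start` are hypothetical.

```python
# Hard-coded version: nobody can tell where 160000 comes from.
#   cropped = waveform[:160000]

SAMPLE_RATE = 16000      # Hz; assumed, because 160000 / 10 s = 16000
CROP_DURATION_SEC = 10   # seconds to keep from the start


def crop_start(waveform, duration_sec=CROP_DURATION_SEC, sample_rate=SAMPLE_RATE):
    """Return the first `duration_sec` seconds of a 1-D waveform.

    Works on anything that supports slicing (list, NumPy array, ...).
    """
    num_samples = int(duration_sec * sample_rate)
    return waveform[:num_samples]
```

With the constants named, changing the sample rate or the crop length is a one-line edit, and the product `duration_sec * sample_rate` documents itself.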

Some choices I’ve made and why

Only occasionally, though, I’ve been asked those classic questions like “So how did you start your career?”, “What motivated you to start a PhD course?”, etc., and somehow I ended up promising that I’d write a post about it. So, here we go. Disclaimer: I’ll only be straightforward, simple, and dumb. Bachelor: EE My tutor…

Q&A: How to transcribe rap songs

… I want to understand what they are rapping about … I want to ask if it is indeed possible to transcribe rap songs? I have vocals extracted from the songs and tried to use Google speech2text API for it but the results look very random and bad. I am given the impression that transcribing…

ICASSP 2020 papers and summaries

Let me reuse my tweets 🙂 https://t.co/ZABBXEDS1c "Improving Universal Sound Separation Using Sound Classification". Used a pre-trained net to extract an embedding that conditions a separation model. Nice work! Turned out it's the same first author (@ETzinis) as the paper above. (Keunwoo Choi, @keunwoochoi, May 18, 2020) https://t.co/hgfCBMnRSU The structure of separate formant mask…

ICLR 2020 – Invited talk by Michael Jordan

Michael Jordan first took some time to talk about his idea of the next generation of ML, which would follow the existing ML applications in the real world. He calls it “Markets” – beyond backend (fraud detection, search), human side (RecSys, social media), pattern recognition (now – speech recognition,…

Tensorflow – parse tfrecords, tf.io.VarLenFeature(tf.string), etc

Sometimes your labels might be something like ['text1', 'test2'] for each example. Say, it’s an image dataset and each image carries labels for the multiple objects in it.
Creating dataset
The TensorFlow documentation shows that we’ll need to use tf.train.BytesList().
    def _bytes_feature(values):
        values = [v for v in values if v is not None]  # i…
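The helper in the excerpt is cut off, so the following is a sketch of one way the round trip could look, not the post's actual code. The UTF-8 encoding step, the feature key `"labels"`, and the functions `serialize_example`, `parse_example`, and `roundtrip` are assumptions made for illustration. The TensorFlow calls used are `tf.train.BytesList`, `tf.io.TFRecordWriter`, `tf.io.VarLenFeature`, `tf.io.parse_single_example`, and `tf.sparse.to_dense`.

```python
import os
import tempfile

import tensorflow as tf


def _bytes_feature(values):
    """Wrap a list of strings in a tf.train.Feature holding a BytesList."""
    values = [v for v in values if v is not None]
    encoded = [v.encode("utf-8") for v in values]  # assumption: labels are str
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=encoded))


def serialize_example(labels):
    """Serialize one example whose only feature is a variable-length label list."""
    features = tf.train.Features(feature={"labels": _bytes_feature(labels)})
    return tf.train.Example(features=features).SerializeToString()


def parse_example(serialized):
    """Parse one record; VarLenFeature yields a SparseTensor, densified here."""
    spec = {"labels": tf.io.VarLenFeature(tf.string)}
    parsed = tf.io.parse_single_example(serialized, spec)
    return tf.sparse.to_dense(parsed["labels"], default_value="")


def roundtrip(label_lists):
    """Write label lists to a temporary TFRecord file and read them back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "labels.tfrecord")
        with tf.io.TFRecordWriter(path) as writer:
            for labels in label_lists:
                writer.write(serialize_example(labels))
        dataset = tf.data.TFRecordDataset(path).map(parse_example)
        return [[v.decode("utf-8") for v in t.numpy()] for t in dataset]
```

Because each example can hold a different number of labels, the parsed tensors have different lengths; batching them would need something like `padded_batch` rather than a plain `batch`.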

Once seemingly memory leakage with Tensorflow

I had this OOM problem, which seemed pretty random until I compared the per-second memory usage against the TensorFlow log. The problem was not a memory leak. The process died at the transition between evaluation and training. I’m quite sure this happens because the dataloader from the previous set (e.g., training) is still there in…
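The post does not say how the per-second memory usage was recorded. One possible approach, sketched below, is a background thread that timestamps a memory reading at a fixed interval, so the readings can be lined up with the timestamps in the training log. `MemoryLogger` and `peak_rss` are hypothetical names. The sketch uses the standard-library `resource` module, which is Unix-only and reports peak (not current) resident memory, in kilobytes on Linux and bytes on macOS.

```python
import resource
import threading
import time


def peak_rss():
    """Peak resident set size of this process (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


class MemoryLogger:
    """Collect (timestamp, memory) samples in a background thread."""

    def __init__(self, interval_sec=1.0, sample_fn=peak_rss):
        self.interval_sec = interval_sec
        self.sample_fn = sample_fn
        self.samples = []
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop_event.is_set():
            self.samples.append((time.time(), self.sample_fn()))
            # wait() returns early once the stop event is set
            self._stop_event.wait(self.interval_sec)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop_event.set()
        self._thread.join()
```

Wrapping the training and evaluation loop in `with MemoryLogger() as mem:` and then printing `mem.samples` next to the log timestamps would show whether memory jumps at a specific phase change rather than growing steadily.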