Food for thought for a music researcher, from the NIPS 2015 tutorial

NIPS’2015 Tutorial Geoff Hinton, Yoshua Bengio & Yann …

What are ConvNets good for

  • Signals that come to you in the form of (multidimensional) arrays
  • Signals that have strong local correlations
  • Signals where features can appear anywhere
  • Signals in which objects are invariant to translations and distortion

So, does a spectrogram fit these criteria?

  • Multidimensional arrays

Yeah.

  • Signals that have strong local correlations

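Not always. For example, the harmonics of a single note are strongly correlated with each other, yet they lie far apart on the frequency axis.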
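To make the "not always" concrete, here is a minimal sketch on a synthetic signal. Everything in it is an assumption chosen for illustration (the sample rate, the 250 Hz tone with one overtone, the shared loudness envelope, the noise level); none of it comes from the tutorial. It compares how strongly the fundamental's bin co-varies over time with a bin a few steps away versus the bin of its second harmonic, which is twice as far away.

```python
import numpy as np

# Assumed toy setup: all values are illustrative choices, not from the tutorial.
sr, n_fft, hop = 8000, 256, 128
f0 = 250.0  # lands exactly on bin 8 (bin width = sr / n_fft = 31.25 Hz)

rng = np.random.default_rng(0)
t = np.arange(2 * sr) / sr
# Slow loudness change shared by both partials of the note.
envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t)
tone = envelope * (np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t))
signal = tone + 0.1 * rng.standard_normal(t.size)

# Magnitude spectrogram from a plain framed FFT with a Hann window.
window = np.hanning(n_fft)
starts = range(0, signal.size - n_fft + 1, hop)
frames = np.stack([signal[s:s + n_fft] * window for s in starts])
spec = np.abs(np.fft.rfft(frames, axis=1))  # shape: (time, frequency)


def bin_correlation(a, b):
    """Pearson correlation over time between two frequency bins."""
    return np.corrcoef(spec[:, a], spec[:, b])[0, 1]


fundamental, gap, second_harmonic = 8, 12, 16
corr_near = bin_correlation(fundamental, gap)             # 4 bins away, mostly noise
corr_far = bin_correlation(fundamental, second_harmonic)  # 8 bins away, same note

print(f"bin {fundamental} vs bin {gap}: {corr_near:+.2f}")
print(f"bin {fundamental} vs bin {second_harmonic}: {corr_far:+.2f}")
```

In this toy case the more distant bin is the more correlated one, which is the opposite of what a small local filter along the frequency axis assumes. Real recordings are messier, so treat this as an intuition pump rather than a measurement.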

  • Signals where features can appear anywhere

Yes, although perceptually relevant features tend to appear in the low or mid frequency bands.

  • Signals in which objects are invariant to translations and distortion

For high-level labels such as mood tags, somewhat true, somewhat false. For features like chords, no: shifting a chord along the frequency axis changes which chord it is. (That’s why people use DNNs on the CQT for chord recognition.)
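One way to see why translation matters for chords is a small numeric sketch. It assumes idealised notes with exactly integer-multiple partials and equal temperament; the helper names `partials` and `to_semitones` are made up for this example. On a linear frequency axis, transposing a note stretches its harmonic pattern. On a log-frequency axis such as the CQT's, the same transposition is a rigid shift, and that shift is exactly the pitch information a chord label depends on.

```python
import numpy as np

harmonics = np.arange(1, 6)  # first five partials of an idealised harmonic note


def partials(f0):
    """Frequencies (Hz) of the partials of a note with fundamental f0."""
    return f0 * harmonics


def to_semitones(f):
    """Position on a log-frequency axis, measured in semitones."""
    return 12 * np.log2(f)


c4 = 261.63              # approximate frequency of C4 in Hz
e4 = c4 * 2 ** (4 / 12)  # four semitones higher in equal temperament

# Linear axis: the gaps between partials grow with pitch (pattern is stretched).
linear_gap_c4 = np.diff(partials(c4))
linear_gap_e4 = np.diff(partials(e4))

# Log axis: the gaps are identical for both notes (pattern is only shifted).
log_gap_c4 = np.diff(to_semitones(partials(c4)))
log_gap_e4 = np.diff(to_semitones(partials(e4)))

# How far each partial moved along the log axis when going from C4 to E4.
shift = to_semitones(partials(e4)) - to_semitones(partials(c4))

print("linear gaps equal:", np.allclose(linear_gap_c4, linear_gap_e4))
print("log gaps equal:   ", np.allclose(log_gap_c4, log_gap_e4))
print("shift per partial (semitones):", np.round(shift, 6))
```

A detector that is invariant to this shift would respond identically to C and to E, which is fine for "is there a harmonic note here?" but wrong for "which chord is this?".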

We need to think about other representations that satisfy these criteria better.
