pat; PsychoAcoustics Toolbox

1. Quick Intro

I just started developing this toolbox because I don’t like the idea of using spectrograms without any perceptual weighting in machine learning, deep learning, or MIR.

The name is pretty ambitious. I hope to fill the box with valuable things!

For now, it helps you compute precise weights for the energy → loudness conversion. The weights can be applied to a decibel-scale version (or log(magnitude**2)) of any time-frequency representation.
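To make the "applied to a decibel scale" part concrete, here is a minimal Python sketch. It is not the toolbox's actual code: the function names `power_to_db` and `apply_weights_db`, the `floor` value, and the `(n_freq, n_frames)` array layout are all assumptions made for illustration. The only idea it relies on is that, on a dB scale, a per-frequency weight is simply added to each bin.

```python
import numpy as np

def power_to_db(power, ref=1.0, floor=1e-10):
    """Convert a power (magnitude**2) spectrogram to decibels.

    `floor` is a hypothetical lower bound that avoids log10(0).
    """
    power = np.asarray(power, dtype=float)
    return 10.0 * np.log10(np.maximum(power, floor) / ref)

def apply_weights_db(spec_db, weights_db):
    """Add one weight (in dB) per frequency bin to every frame.

    spec_db:    array of shape (n_freq, n_frames), in dB
    weights_db: array of shape (n_freq,), in dB
    """
    weights_db = np.asarray(weights_db, dtype=float)
    return np.asarray(spec_db, dtype=float) + weights_db[:, np.newaxis]
```

The weights here are fixed per frequency; a level-dependent weighting would instead produce one weight per TF bin, with the same shape as the spectrogram.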

2. Comparison


This is the before/after of perceptual weighting. You can see that the energy is reduced in the low-frequency bands, while the range around a few kHz is boosted. This is a more accurate representation of what we actually hear. In other words, just imagine putting a pinna and an ear canal in front of the data as a pre-processing step.

My code also includes the A-, B-, C-, and D-weightings, which are a much simpler way to do the same thing.
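For reference, the A-weighting curve has a well-known closed form. The sketch below uses the standard analog approximation with the usual constants (20.6, 107.7, 737.9, and 12194 Hz, plus a 2.00 dB offset so the curve is about 0 dB at 1 kHz). The function name `a_weighting_db` is made up here, and this is not necessarily how the toolbox implements it.

```python
import numpy as np

def a_weighting_db(freq_hz):
    """A-weighting gain in dB for the given frequencies (Hz).

    Standard closed-form approximation; returns -inf at 0 Hz.
    """
    f2 = np.asarray(freq_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # Suppress the warning raised by log10(0) when a frequency is exactly 0 Hz.
    with np.errstate(divide="ignore"):
        return 20.0 * np.log10(num / den) + 2.00

# Example: weights for a few bin centre frequencies (hypothetical values).
freqs = np.array([100.0, 1000.0, 4000.0])
weights_db = a_weighting_db(freqs)
```

Note that the curve depends on frequency only, so the same weights are used no matter how loud or quiet a bin is.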


3. Equal Loudness Contour

Then what is the difference between those {A,B,C,D}-weightings and my perceptual weighting?

The human auditory system is extremely nonlinear. Therefore, the compensation should depend not only on the frequency (as those simple weightings do), but also on the energy level of each TF bin.


(Image from Wikipedia)

That is the reason we need many contours: a single contour can’t cover the whole range!
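To illustrate what "depends on the level too" means in code, here is a deliberately crude Python sketch. It does not use the real equal-loudness contours. Instead it blends two fixed curves as stand-ins: A-weighting (historically based on a roughly 40-phon contour) for quiet bins and C-weighting (roughly 100 phon) for loud bins. The function names, the linear blend, the 40/100 dB anchors, and the assumption that the spectrogram is calibrated in dB SPL are all mine, not the toolbox's.

```python
import numpy as np

def a_weighting_db(freq_hz):
    """A-weighting gain in dB (standard closed form). Frequencies must be > 0."""
    f2 = np.asarray(freq_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * np.log10(num / den) + 2.00

def c_weighting_db(freq_hz):
    """C-weighting gain in dB (standard closed form). Frequencies must be > 0."""
    f2 = np.asarray(freq_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2
    den = (f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2)
    return 20.0 * np.log10(num / den) + 0.06

def level_dependent_weights_db(spec_db_spl, freqs_hz, low_db=40.0, high_db=100.0):
    """One weight per TF bin, chosen by both frequency and bin level.

    spec_db_spl: (n_freq, n_frames) spectrogram, assumed calibrated in dB SPL
    freqs_hz:    (n_freq,) positive bin centre frequencies
    Bins at or below `low_db` get the A curve, bins at or above `high_db`
    get the C curve, and bins in between get a linear blend (hypothetical rule).
    """
    spec = np.asarray(spec_db_spl, dtype=float)
    a = a_weighting_db(freqs_hz)[:, np.newaxis]
    c = c_weighting_db(freqs_hz)[:, np.newaxis]
    t = np.clip((spec - low_db) / (high_db - low_db), 0.0, 1.0)
    return (1.0 - t) * a + t * c
```

With this rule a quiet 100 Hz bin is attenuated much more than a loud one, which is the qualitative behaviour the contours show. A faithful version would interpolate between the actual contours instead of these two stand-in curves.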

Please read the Wikipedia article, and if you’re interested in the details of ISO 226 and how the contours are approximated, please read this paper!

4. The name, pat

You may say the name is a bit too ambitious?

Yes it’s true… let’s see what’s going on here.

