Paper is out; Towards Playlist Generation Algorithms Using RNNs Trained on Within-Track Transitions

on arXiv (or pdf directly) today. The paper will be presented at the SOAP 2016 Workshop (Workshop on Surprise, Opposition, and Obstruction in Adaptive and Personalized Systems), held in conjunction with UMAP 2016, the 24th Conference on User Modeling, Adaptation and Personalization.


We introduce a novel playlist generation algorithm that focuses on the quality of transitions using a recurrent neural network (RNN). The proposed model assumes that optimal transitions between tracks can be modelled and predicted by internal transitions within music tracks. We introduce modelling sequences of high-level music descriptors using RNNs and discuss an experiment involving different similarity functions, where the sequences are provided by a musical structural analysis algorithm. Qualitative observations show that the proposed approach can effectively model transitions of music tracks in playlists.


  • Because I would like to build a system that creates playlists with good transitions
    • (where ‘good’ is a bit ambiguous though),
  • I would like to use audio content,
    • but existing datasets usually do not come with audio.
  • Then, how about using internal (within-track) transitions as if they were track-to-track transitions?
    • That way I can use all my mp3s…


The goal is playlists that have both consistency and fluctuation (in this paper, ‘good’ means a proper balance of consistency and fluctuation, or (almost) equivalently, similarity and serendipity).

Proposed system


  1. For each file in the training set, get the segments of the song (I used MSAF with Foote (2000)) and extract features from each segment (I used my ConvNet)
  2. Train RNNs on the sequences of features, one per segment (Keras is used)
  3. When a seed song is given
    1. get its segments
    2. extract features of the segments
    3. predict a feature vector from the seed song’s feature sequence
    4. the feature sequence can be concatenated with those of the following songs
  4. Find the song that is most similar to the predicted feature vector
    1. I compared the predicted feature vector with the feature vector of the first segment of each candidate song
      1. but we can think of other strategies, e.g. the average of a song’s feature vectors
    2. I tested the l2 norm, cosine distance, and DCG. DCG worked well, probably the best. Surprisingly, cosine distance seems worse than l2; I’m trying to understand why.
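The prediction step (3.3) can be sketched as a simple recurrent net rolled over the segment-feature sequence. This is a minimal, untrained Elman RNN in NumPy, not the Keras model used in the paper; the feature dimension, hidden size, and all names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 32  # assumed feature and hidden sizes (not the paper's values)

# randomly initialised weights; in the real system these come from training
Wxh = rng.normal(scale=0.1, size=(D, H))
Whh = rng.normal(scale=0.1, size=(H, H))
Why = rng.normal(scale=0.1, size=(H, D))
bh, by = np.zeros(H), np.zeros(D)

def predict_next_feature(segment_features):
    """Roll the RNN over the seed song's segment features (T, D)
    and return a predicted feature vector for the next segment (D,)."""
    h = np.zeros(H)
    for x in segment_features:
        h = np.tanh(x @ Wxh + h @ Whh + bh)  # hidden-state update
    return h @ Why + by                      # readout: next segment's features

seed_segments = rng.normal(size=(5, D))      # 5 segments of a hypothetical seed song
pred = predict_next_feature(seed_segments)
print(pred.shape)                            # (16,)
```

The same rollout also covers 3.4: appending the segments of the chosen next song to `seed_segments` and rolling again extends the playlist one track at a time.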
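Step 4 (retrieval) can be sketched as follows. I show only the l2 and cosine variants; the candidate dictionary, vectors, and function names are made up for illustration, and the "first segment of each song" strategy is the one described above.

```python
import numpy as np

def l2_distance(a, b):
    # Euclidean distance between two feature vectors
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means more similar
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def most_similar_song(predicted, first_segment_features, distance=l2_distance):
    """Return the song id whose first-segment feature vector is closest
    to the RNN-predicted vector.

    predicted: (D,) predicted feature vector
    first_segment_features: dict {song_id: (D,) first-segment feature vector}
    """
    return min(first_segment_features,
               key=lambda sid: distance(predicted, first_segment_features[sid]))

# toy usage with made-up 4-dimensional features
predicted = np.array([1.0, 0.0, 0.5, 0.0])
candidates = {
    'song_a': np.array([0.9, 0.1, 0.4, 0.0]),
    'song_b': np.array([-1.0, 0.2, 0.0, 1.0]),
}
print(most_similar_song(predicted, candidates))                    # song_a
print(most_similar_song(predicted, candidates, cosine_distance))   # song_a
```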

Experiment, result, discussion

This is the weakest part of my research. I simply couldn’t evaluate it properly. I’ve listened to the results myself, but I am super biased. Still, we can look at the visualisation of the feature vectors.


Strong blue (vertical) lines indicate consistent, positive features; strong red lines indicate consistent, negative features. These lines seem related to consistency. Blinking, alternating red and blue lines indicate fluctuating features and seem related to fluctuation.

More details on arXiv | pdf.

Towards Playlist Generation Algorithms using RNNs Trained on Within-Track Transitions
Keunwoo Choi, George Fazekas, Mark Sandler,
SOAP Workshop (Workshop on Surprise, Opposition, and Obstruction in Adaptive and Personalized Systems), Halifax, NS, Canada, 2016

