Update 20 Apr 2016: Check out the paper on arXiv (PDF)

Summary

A word-RNN (LSTM) in Keras, trained on wordified text representations of Metallica's drum MIDI files, which came from midiatabase.com.

MIDI files of Metallica tracks come from midiatabase.com.
The LSTM model comes from Keras.
Read the MIDI files with python-midi.
Convert them to a text file (corpus) by my rules, which are
- (Temporal) quantisation
- Simplification/omitting some notes
- 'Words' made of binary numbers
Train an LSTM model on the corpus and generate new text by predicting words.
Convert the words in the text file back to MIDI according to the rules above.
Listen!

A quick look at things (copied-and-pasted from my previous post)

LSTM

LSTM (Long Short-Term Memory) is a type of RNN. It is known to learn sequences effectively.

RNN

RNN (Recurrent Neural Network) is a type of deep learning neural network. See this post by WildML for further understanding.

Keras

Keras is a deep learning framework that runs on Theano or TensorFlow. I used Theano as the backend, but this shouldn't affect the output.

Another quick look at things

Metallica

Metallica is an American heavy metal band formed in Los Angeles, California. Metallica was formed in 1981 when vocalist/guitarist James Hetfield responded to an advertisement posted by drummer Lars Ulrich in a local newspaper. Wikipedia

Why Metallica?

Because I found quite a large number of Metallica MIDI tracks, more than for any other artist so far. Also, the drum tracks in Metallica songs are relatively consistent. The simplification in the preprocessing also makes sense for Metallica drum tracks, according to…… my brain.

Preprocessing: How to model a rhythm as text?

LSTM models are basically about time-series modelling, i.e. 1-D data. In my previous work, it was so straightforward to model a chord progression as text that I didn't even mention 'modelling'. I converted the MIDI tracks into symbolic 1-D data so that they could be handled as text.

A drum track is not 1-D

No, it isn't, as you can see above. This is a so-called piano-roll view. The y-axis is pitch, the x-axis is time, and each pitch represents a different part of the drum kit. Here, blue: kick, green: snare, yellow-or-olive-or-whatever: (open) hi-hats, and red: crash cymbals. Yes, this is a piano-roll view of the drum track of Master of Puppets, from 0:28 in this live performance:

It is not 1-D because drummers use their arms and legs simultaneously.

First, Quantisation and Simplification

Quantisation moves notes onto certain timings, and only those timings (there is a nice explanation on mididrumfiles.com). It's just a rounding function on the time axis. So I quantised the MIDI files to 16th notes, assuming Lars Ulrich doesn't play anything finer. This introduces some errors, especially where there are triplets.
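To make the "rounding function on the time axis" concrete, here is a minimal sketch. It assumes note onsets are available as absolute tick positions and that the file's resolution (ticks per quarter note) is known; quantise_ticks is a hypothetical helper, not the original preprocessing code. It also shows why triplets suffer: eighth-note triplets at 0, 160 and 320 ticks (resolution 480) land on 16th steps 0, 1 and 3, i.e. unevenly.

```python
import math


def quantise_ticks(onset_ticks, resolution):
    """Round absolute onset times (in ticks) to the nearest 16th-note step.

    resolution: ticks per quarter note, as stored in the MIDI header (assumed known).
    Returns the index of the 16th-note grid step for each onset.
    """
    ticks_per_16th = resolution / 4.0  # a quarter note holds four 16th notes
    # Round half up (Python's round() would round halves to the even step).
    return [int(math.floor(t / ticks_per_16th + 0.5)) for t in onset_ticks]


if __name__ == "__main__":
    print(quantise_ticks([0, 118, 245, 355, 480], 480))  # slightly early/late hits snap to the grid
    print(quantise_ticks([0, 160, 320], 480))            # eighth-note triplets get distorted
```

Rounding half up (rather than Python's default half-to-even) keeps a note that sits exactly between two steps from jumping in different directions depending on the step index.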

A further simplification was to limit the types of notes to a kick, a snare, open hi-hats, closed hi-hats, three tom-toms, a crash, and a ride: 9 notes in total. Using the General MIDI drum map, this is expressed as below:

allowed_pitch = [36, 38, 42, 46, 41, 45, 48, 51, 49]  # 46: open HH

drum_conversion = {35: 36,  # acoustic bass drum -> bass drum (36)
                   37: 38, 40: 38,  # 37: side stick, 38: acoustic snare, 40: electric snare
                   43: 41,  # 41: low floor tom, 43: high floor tom
                   47: 45,  # 45: low tom, 47: low-mid tom
                   50: 48,  # 50: high tom, 48: hi-mid tom
                   44: 42,  # 42: closed HH, 44: pedal HH
                   57: 49,  # 57: crash 2, 49: crash 1
                   59: 51, 53: 51, 55: 51,  # 59: ride 2, 51: ride 1, 53: ride bell, 55: splash
                   52: 49  # 52: China cymbal
                   }
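A minimal sketch of how the two tables above might be applied to the pitches sounding at one time step. The function name simplify_pitches is hypothetical, and it assumes that any pitch which is neither allowed nor listed in drum_conversion (e.g. 39, hand clap) is simply dropped.

```python
# Same tables as above, repeated so this sketch runs on its own.
allowed_pitch = [36, 38, 42, 46, 41, 45, 48, 51, 49]
drum_conversion = {35: 36, 37: 38, 40: 38, 43: 41, 47: 45, 50: 48,
                   44: 42, 57: 49, 59: 51, 53: 51, 55: 51, 52: 49}


def simplify_pitches(pitches):
    """Map General MIDI drum pitches onto the 9 kept notes; drop everything else."""
    kept = set()
    for p in pitches:
        p = drum_conversion.get(p, p)  # merge similar drums, e.g. 40 (electric snare) -> 38
        if p in allowed_pitch:         # assumption: unlisted pitches are discarded
            kept.add(p)
    return kept
```

Returning a set means two source notes that collapse onto the same target (say, 37 and 40 both becoming 38) count as a single hit.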

My encoding scheme (midi→text)

The basic idea is to represent, with a single word, whether each of the notes is played or not (True or False) at a given time step. The words are rather self-explanatory:

‘000000000’ : nothing played
‘100000000’ : kick is played
'100000001' : kick and crash played
'010100000' : snare and open-HH played

Obviously, 1 is True and 0 is False, each at the dedicated position of its drum note, with 'BAR' added to mark the boundaries between measures.
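A minimal sketch of the encoding, assuming the digit order follows allowed_pitch (kick, snare, closed HH, open HH, three toms, ride, crash). That ordering is an assumption, but it is consistent with the examples above, where kick is the first digit, snare the second, open HH the fourth and crash the last. encode_step and encode_bar are hypothetical names, and each time step is assumed to be given as the set of (already simplified) pitches played at that 16th note.

```python
allowed_pitch = [36, 38, 42, 46, 41, 45, 48, 51, 49]  # assumed digit order


def encode_step(pitches):
    """Return a '0b'-prefixed 9-digit word with '1' where that allowed pitch is played."""
    return '0b' + ''.join('1' if p in pitches else '0' for p in allowed_pitch)


def encode_bar(steps):
    """Encode one bar (16 time steps, each a set of pitches) and close it with 'BAR'."""
    return [encode_step(s) for s in steps] + ['BAR']


if __name__ == "__main__":
    print(encode_step({36, 49}))  # kick + crash
    print(encode_step({38, 46}))  # snare + open HH
```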

The corpus looks like this:

0b010000000 0b010000000 0b000000000 0b010000000 0b010000000 0b000001000 0b000000000 0b000001000 0b010000000 0b010000000 0b000000000 0b010000000 0b010000000 0b000001000 0b000000000 0b000001000 BAR 0b010000000 0b010000000 0b000000000 0b010000000 0b010000000 0b000001000 0b000000000 0b000001000 0b010000000 0b000000000 0b000000000 0b000001000 0b000000000 0b000001000 0b000001000 0b000000000 BAR 0b100000001 0b000000000 0b000000000 0b000000000 0b010000001 0b000000000 0b000000000 0b000000000 0b100000001 0b000000000 0b000000000 0b000000000 0b010000001 0b000000000 0b000000000 0b000000000 BAR …

where the prefix 0b is added to indicate that each word is a kind of binary number.
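Going the other way (words back to notes, before writing MIDI) can be sketched as follows. Again this assumes the digit order follows allowed_pitch; decode_word and corpus_to_steps are hypothetical helpers, and turning the resulting steps into actual MIDI events with python-midi is not shown here.

```python
allowed_pitch = [36, 38, 42, 46, 41, 45, 48, 51, 49]  # assumed digit order


def decode_word(word):
    """Turn a (possibly '0b'-prefixed) 9-digit word into the set of pitches it switches on."""
    bits = word[2:] if word.startswith('0b') else word
    return {p for p, b in zip(allowed_pitch, bits) if b == '1'}


def corpus_to_steps(text):
    """Split a corpus string into a flat list of per-16th-note pitch sets, skipping 'BAR'."""
    return [decode_word(w) for w in text.split() if w != 'BAR']


if __name__ == "__main__":
    print(corpus_to_steps('0b100000001 0b000000000 0b010000000 BAR'))
```

Each list index is a 16th-note step, so multiplying it by the ticks per 16th note would give an onset time for the MIDI file.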

LSTM Structure

I applied a word-RNN here. Some numbers:

60 songs for training data
Number of words: 2,141,692 (including a 'BAR' after every 16 words)
Number of unique words (vocabulary size): 119
- 119 out of 2**9 == 512 possibilities.
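Counting these numbers for a corpus string is straightforward. The sketch below runs on a tiny made-up snippet, not the real corpus, and corpus_stats is a hypothetical helper. It also builds a word-to-index table, which a word-RNN needs to one-hot encode its inputs.

```python
from collections import Counter


def corpus_stats(text):
    """Return (total number of words, number of unique words, per-word counts)."""
    words = text.split()
    counts = Counter(words)
    return len(words), len(counts), counts


sample_corpus = '0b010000000 0b010000000 0b000000000 BAR 0b100000001 BAR'
n_words, n_unique, counts = corpus_stats(sample_corpus)
word_to_index = {w: i for i, w in enumerate(sorted(counts))}  # vocabulary -> integer id

# Nine binary digits allow at most 2 ** 9 = 512 distinct drum words (plus 'BAR').
max_drum_words = 2 ** 9
```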

The code is the same as I used for LSTM Realbook:

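from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

# maxlen: number of words per input sequence; num_chars: vocabulary size
# (here, the number of unique words). Both are defined elsewhere (not shown).
model = Sequential()
model.add(LSTM(512, return_sequences=True, input_shape=(maxlen, num_chars)))  # pass the whole sequence on
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))  # last LSTM returns only its final output
model.add(Dropout(0.2))
model.add(Dense(num_chars))
model.add(Activation('softmax'))  # probability distribution over the next word

model.compile(loss='categorical_crossentropy', optimizer='adam')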
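The model above expects one-hot inputs of shape (maxlen, num_chars) and a one-hot next word as the target. A common way to build these, in the style of Keras's text-generation example, is a sliding window over the word sequence. This is a minimal sketch under that assumption: make_training_data, maxlen and step are illustrative, and the actual window length used here is not stated in this post.

```python
import numpy as np


def make_training_data(words, word_to_index, maxlen, step=1):
    """One-hot windows of `maxlen` words (X) and the word that follows each window (y)."""
    starts = range(0, len(words) - maxlen, step)
    num_chars = len(word_to_index)  # vocabulary size, matching the Dense layer above
    X = np.zeros((len(starts), maxlen, num_chars), dtype=np.float32)
    y = np.zeros((len(starts), num_chars), dtype=np.float32)
    for i, s in enumerate(starts):
        for t, w in enumerate(words[s:s + maxlen]):
            X[i, t, word_to_index[w]] = 1.0
        y[i, word_to_index[words[s + maxlen]]] = 1.0  # target: the next word
    return X, y
```

With X and y built this way, training would be a plain model.fit(X, y, ...) call; a larger step gives fewer, less overlapping windows.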

The results

After the first iteration, it had not yet learned the structure of bars and 16th notes.

0b000000000 0b010100000 0b100000000 0b100010011 0b000000000 0b000000000 0b100001010 0b100000000 0b000000000 0b010000000 0b000000000 0b101000000 0b000011000 0b000000000 0b001001101 BAR
0b000000000 0b000000000 0b001000000 0b100000000 0b100000001 0b010000000 0b000100000 0b000000000 0b000001100 0b000000000 0b000000000 0b101000000 0b000000000 0b011000000 0b001000000 0b000000000 BAR
0b100000000 0b000000000 0b000001000 BAR
0b000000000 0b100100000 0b010100000 0b100100000 0b010100000 0b101000000 BAR
0b000000000 0b000000000 0b000000000 0b100000010 0b000000000 0b010100000 0b000000000 0b101000000 0b000000000 0b101000000 0b000100000 0b011000000 0b100000001 0b000000000 0b000000000 0b100100000 0b011000000 0b000000000 0b001000000 0b101000000 0b000000000 0b010000001 0b010000000 0b101000000 0b100100000 0b100000000 0b100000000 0b000000000 0b000000000 0b101000000 0b010010000 0b000000000 0b101000000 0b101000000 0b000000000 0b000000000 0b000000000 0b101000000 0b000000000 0b000011000 0b000000000 0b000000000 0b100100000 0b000010000 0b100100000 0b000000000 0b100000000 0b001000000 0b000000000 0b101000000 0b000000000 0b000000000 0b000000000 0b000000000 BAR
0b000000000 0b101000000 0b100000001 0b000000000 0b000000000 0b000000000 0b000100000 0b000000000 0b000000000 0b000000000 0b000000000 BAR 0b000000000 0b000000000 0b000000000 0b100000000 BAR
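One quick way to check whether the model has picked up the bar structure is to count the words between consecutive 'BAR' tokens; in the training corpus every bar has 16. A minimal sketch (bar_lengths is a hypothetical helper); applied to the third and fourth lines of the output above, it gives bars of 3 and 6 words.

```python
def bar_lengths(generated):
    """Count the words in each bar; a well-formed bar has 16.

    Words after the final 'BAR' (an unfinished bar) are not counted.
    """
    lengths, current = [], 0
    for w in generated.split():
        if w == 'BAR':
            lengths.append(current)
            current = 0
        else:
            current += 1
    return lengths


if __name__ == "__main__":
    out = ('0b100000000 0b000000000 0b000001000 BAR '
           '0b000000000 0b100100000 0b010100000 0b100100000 0b010100000 0b101000000 BAR')
    print(bar_lengths(out))
```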

After 45 iterations, it looks more structured. (I added line breaks for readability.)

After 60 iterations, it became a bit more structured.

Okay, this is boring. Let's listen to the rhythm.

First, here is a score for this track:

It looks like a proper drum score. You can listen to it below:

I ran 60 iterations and generated tracks with diversity parameters of [0.5, 0.8, 1.0, 1.25, 1.5]. I'll present 10 tracks: each of the five diversity values × the results after the 30th and the 60th iteration.
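The diversity parameter is assumed here to be the temperature-style reweighting used in the `sample` helper of Keras's text-generation example: the softmax output is sharpened (diversity < 1) or flattened (diversity > 1) before a word is drawn. The sketch below is written with NumPy's Generator API rather than copied from that example, and the function name sample is illustrative.

```python
import numpy as np


def sample(preds, diversity=1.0, rng=None):
    """Draw a word index from softmax output `preds`, reweighted by `diversity`."""
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(preds, dtype=np.float64)
    logits = np.log(preds + 1e-12) / diversity  # small epsilon avoids log(0)
    logits -= logits.max()                      # numerical stability before exp
    probs = np.exp(logits)
    probs /= probs.sum()                        # renormalise to a distribution
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preds = [0.1, 0.7, 0.2]  # toy softmax output over a 3-word vocabulary
    for diversity in [0.5, 0.8, 1.0, 1.25, 1.5]:
        print(diversity, [sample(preds, diversity, rng) for _ in range(10)])
```

Low values almost always pick the most likely word (repetitive, "boring" tracks), while values above 1 give rare drum words more chance, which fits the more virtuosic feel at 1.25 and 1.5.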

With diversity values of 1.50 and 1.25, the tracks sound a bit too virtuosic. Tracks 3, 8, 9, and 10 are boring but probably make more sense.

Tracks 8 and 9 are interesting: to some extent, they have regular patterns of kick, snare, and hi-hats plus crash cymbals.

Do they sound like Lars Ulrich?… Perhaps not yet.

Discussion

It looks like I have to fine-tune the diversity parameter to get more reasonable drum tracks.
The proposed encoding scheme, based on nine binary digits, seems to make sense.
Training on different segments would probably lead to a whole, complete track with a proper beginning and ending.
I don't think it deeply understands the structure, i.e. the regular patterns of kick, snare, and hi-hats and their meaning (except in tracks 8 and 9). At least, that didn't come across to me that easily.
It would also be cool to have more data from other bands for some pun fun, e.g. Rage Against the Long Short-Term Memory.
What I really want to do is try this with jazz drum tracks so that I could combine them with LSTM Realbook. Are there any good (and hopefully free) resources for that?

Code & dataset

The dataset is now shared in my repo. I will share the code soon.

22 Comments

WendsdayDaintyFood says:

February 23, 2016 at 2:40 pm

I stumbled upon to visit your post.
Well I don’t describe myself as an machine learning professionalist, but I think no one can deny that you’ve made such a great thing, which shows what machine learning can do in our real life with a very fun way.


1. keunwoochoi says:
  
  February 23, 2016 at 2:55 pm
  
  I feel flattered, thanks for your kind comment!
  
  
G says:

March 23, 2016 at 12:40 pm

Very cool, I’ve learned a lot! Keep rocking!


1. keunwoochoi says:
  
  March 23, 2016 at 4:41 pm
  
  Thanks and happy to hear that it helped. And yeah, keep rocking!!
  
  
Anonymous says:

March 31, 2016 at 3:43 pm

The encoding scheme you choose is a cool idea! And the LSTM certainly has learned the right patterns. Nice work!


Pingback: LSTMetallica:… « Another Word For It
paulo olveira says:

April 16, 2016 at 3:29 pm

Hey this is very cool! It would be awesome to create a tool using this method for generating drum tracks based on the bands someone likes.


1. keunwoochoi says:
  
  April 16, 2016 at 11:06 pm
  
  Thanks! The Drummer track in Logic X is already very cool, but more work is coming soon 🙂
  
  
Pingback: Paper is out: Text-based LSTM networks for Automatic Music Composition | Keunwoo Choi
Pingback: vonnik comments on "Heavy Metal and Natural Language Processing"
Pingback: Generando pistas de batería con redes LSTM aprendiendo de Metallica
Thamer says:

January 24, 2017 at 4:45 pm

Can you provide the midi->text conversion code?


1. keunwoochoi says:
  
  January 24, 2017 at 4:48 pm
  
  Sorry, I can’t because I don’t have the right.
  
  
  1. Thamer says:
    
    January 24, 2017 at 5:14 pm
    
    Thank you.
    I really enjoyed your work and would like to contribute with it and use it as a basis for new research (citation is your job).
    Could you just tell if you created the algorithm or if you used a ready tool?
    
keunwoochoi says:

January 24, 2017 at 5:42 pm

Oh, excuse me, I got it confused with the chord progression work. I coded it myself, using this file: https://github.com/keunwoochoi/LSTMetallica/blob/master/drum_note_processor.py
Sadly, I can't find the script I used. I'm afraid I removed it while cleaning up the folder. (I don't wanna believe it... I never delete code, and I just found this out.)

In the file above, there are two classes: one for a note and one for a note list (Note_List). The methods of Note_List are ordered as they're supposed to be used: add_note, then quantise the notes, simplify the drum events, and return them as text. I hope it helps.


jonh129847 says:

August 2, 2017 at 10:26 am

Nice work. I am little confused on how to generated text from midi files. Can you help?


lkasjdflaskjd says:

May 28, 2018 at 2:10 am

I would not listen to ROCKMUSIC or any music of the devil!! I would pray about all video games, movies, and tv shows and not watch anything of the world.


Pingback: Drum Patterns from Latent Space – Data Science Austria
Pingback: Neural Networks Generated Lamb of God Drum Tracks – Data Science Austria
Pingback: RNN – EnGear
Pingback: 인공지능 – EnGear
Pingback: 인공지능 관련 자료 – EnGear

LSTMetallica: Generating drum tracks by learning the drum tracks of 60 Metallica songs
