Large-Scale Pattern Discovery in Music

This is a quick overview of my Ph.D. thesis. It tries to answer two questions: what is it about, and is it worth your time to read it? You can get the PDF here.

Quick background: I defended my thesis in January 2013. The work was done in collaboration mostly with my adviser, Dan Ellis, at LabROSA, Columbia University. A list of my publications can be found here.

My thesis is split into two parts: 1) the Million Song Dataset (MSD), and 2) large-scale cover song recognition using the dataset. The MSD is a very useful resource, and the thesis gives a good and complete overview. That being said, you might be better off starting with my post: the MSD in 250 words, the MSD website, and the original paper.

The second part can be summarized as: we have this awesome resource (the MSD), so what cool stuff can we do with it? Tons of tasks can be performed on the dataset (tagging, metadata analysis, year prediction, recommendation, etc.), but we decided to focus on cover song recognition. Why?

  • This task has never been studied on a large scale (more than a few thousand songs).
  • It needs a second wind: MIREX 2011 results did not present any improvement, and the task wasn’t run in 2012.
  • It is a difficult problem: a lot can change between covers! Thus, a good cover song recognition solution should be helpful for other tasks as well (e.g., segmentation).

Note that if you do any work on cover song recognition, make sure to start with Serrà’s thesis (PDF)! It is the reference work.

We mostly start from scratch on this task. The main reason is that we have a new dataset (18K covers out of 1M songs) that comes only with The Echo Nest chroma features, which were not used in previous systems. Therefore, our main goals are: 1) showing that the task can be tackled at that scale, and 2) providing lessons learned and a reference point for other researchers. Our first solution is inspired by the Shazam algorithm for audio fingerprinting. It was presented at WASPAA ’11 (PDF), and nothing really new was added in the thesis.

Our second solution is of more interest. The idea is to take the magnitude of the 2D Fourier transform of a chromagram (2DFTM). This higher-level feature was first introduced by Marolt (PDF). Our experiments show that it works much better than our fingerprinting-like solution, and those first results were presented at ISMIR ’12 (PDF). Two songs that have similar 2DFTMs are likely covers, and the feature's dimensionality can be reduced with PCA without sacrificing much accuracy.
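To make the idea concrete, here is a minimal sketch in Python with NumPy. It is not the thesis code: the function names (twod_ftm, fit_pca, project), the patch size (12 x 16), the number of PCA components, and the random data standing in for real chroma patches are all assumptions made for illustration. It computes the 2DFTM of each patch, reduces it with PCA, and looks up the nearest neighbor of a transposed copy of the first patch.

```python
import numpy as np

def twod_ftm(chroma_patch):
    """Magnitude of the 2D Fourier transform of a (12, n_beats) chroma patch."""
    return np.abs(np.fft.fft2(chroma_patch))

def fit_pca(features, n_components):
    """Return the mean and the top principal directions of row-vector features."""
    mean = features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Project row-vector features onto the principal directions."""
    return (features - mean) @ components.T

# Synthetic stand-ins for chroma patches (assumption: 20 songs, 12 x 16 each).
rng = np.random.default_rng(0)
patches = rng.random((20, 12, 16))
features = np.stack([twod_ftm(p).ravel() for p in patches])  # shape (20, 192)

mean, components = fit_pca(features, n_components=5)
reduced = project(features, mean, components)                # shape (20, 5)

# A toy "cover": song 0 transposed by 3 semitones (circular shift of the pitch axis).
cover = np.roll(patches[0], 3, axis=0)
cover_reduced = project(twod_ftm(cover).ravel()[None, :], mean, components)

# Nearest neighbor in the reduced space, by Euclidean distance.
distances = np.linalg.norm(reduced - cover_reduced, axis=1)
nearest = int(np.argmin(distances))
```

In this toy setup the transposed copy has exactly the same 2DFTM as the original, because the magnitude of the Fourier transform ignores circular shifts, so the nearest neighbor is song 0. Real covers differ in many more ways than key, so treat this as an illustration of the feature, not of the full system.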

In the thesis, we further analyze this feature. In particular, we show that:

  • As expected, it is more robust to small time offsets than regular chroma features.
  • The phase does not seem to add value to the magnitude as a feature, or at least we did not find a proper way to include it.
  • Encoding a set of patches using a simple distribution (mean and variance for each bin) does not seem to work better in practice than taking the median across all patches.
  • Computing a set of 2DFTMs per song (instead of one), with different numbers of beats per frame for the underlying beat-aligned chromagram, can improve results at the cost of more data to handle.
  • Our original normalization, before PCA, was wrong; z-scoring the bins helps.
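A few of these points can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the thesis implementation: the function names, the patch length of 8 beats, and the random "chromagrams" are all hypothetical. It summarizes each song by the per-bin median of its patch 2DFTMs, z-scores each bin across songs (the step that would come before PCA), and checks that a circular time shift of a patch leaves its 2DFTM unchanged.

```python
import numpy as np

def patch_2dftms(chroma, patch_len):
    """2DFTM of every overlapping patch of patch_len beats, one flattened row per patch."""
    n_patches = chroma.shape[1] - patch_len + 1
    return np.stack([
        np.abs(np.fft.fft2(chroma[:, i:i + patch_len])).ravel()
        for i in range(n_patches)
    ])

def song_feature(chroma, patch_len=8):
    """Summarize a song by the per-bin median over all of its patches."""
    return np.median(patch_2dftms(chroma, patch_len), axis=0)

def zscore_bins(features, eps=1e-12):
    """Z-score each bin (column) across songs; eps guards against constant bins."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

# Synthetic stand-ins for beat-aligned chromagrams (assumption: 5 songs, 40 beats each).
rng = np.random.default_rng(0)
songs = [rng.random((12, 40)) for _ in range(5)]
features = np.stack([song_feature(s) for s in songs])  # shape (5, 96)
normalized = zscore_bins(features)

# Circularly shifting a patch in time does not change its 2DFTM.
patch = songs[0][:, :8]
shifted = np.roll(patch, 2, axis=1)
same = np.allclose(np.abs(np.fft.fft2(patch)), np.abs(np.fft.fft2(shifted)))
```

The invariance shown here is exact only for circular shifts. A real time offset brings new beats into the patch, so in practice the feature is robust to small offsets without being strictly invariant.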

So, should you read the thesis? If you are working on large-scale cover song recognition, probably: it will give you a nice reference and help you implement our solution (we can provide some of the code, too). Otherwise, if you are still interested in our work, my “regular publications” (available here) are probably shorter and more to the point.


The Million Song Dataset in 250 words


As I am finishing my PhD developing and working on the Million Song Dataset (MSD), I thought it would be interesting to try to summarize the project. The goal is to give a quick grasp of what the MSD is and what can be done with it. For more information, visit the MSD website or read the original paper.


The MSD is a very large collection of music data aimed at researchers. It was created by LabROSA and The Echo Nest in 2011. The goals of the MSD include: 1) encouraging music technologists to work on a commercial-like scale, 2) creating a reference dataset for evaluating research, and 3) helping new researchers get started in MIR.

The core of the MSD is information about one million songs gathered from The Echo Nest API. It includes identifiers (artist name, albums, titles, Musicbrainz IDs, …), audio features (loudness, timbre, pitches, beats, …) and relationship data (similar artists, artist tags).

Other organizations have joined the project: SecondHandSongs for identifying cover songs, musiXmatch to provide lyrics, for song-level tags and similarity, and The Echo Nest again for user data. Other audio features for 30s snippets were computed by Austrian researchers. All these collections are matched to the 1M songs, making research involving connected data easy. For instance, McFee et al. investigated what information can be used to make playlists. Serrà et al. looked at audio features over time. We also organized a very large, open music recommendation contest.

A lot of the MSD information is gathered or computed automatically (e.g., audio features) and not by human experts. This implies a certain level of noise and errors. However, that is unavoidable when working at this scale, and the size largely makes up for it. Real music data for 1M songs: start exploring!

First post – Welcome to my blog!

Hello world!

As I’m leaving academia for good (here is my old website), it is time to start fresh with a brand new website. It is here to stay (the URL and hosting do not depend on anyone but me), it has a great blog (thanks WordPress!), and it can be expanded at will!

This blog will mostly let me talk about technologies I’m investigating for my work as a data scientist. That said, I love NYC (especially the restaurants) and Montreal, I have a soft spot for music technology, which I studied in grad school, I think the Canadiens are the best NHL team ever, and I might comment on those too!

Finally, if something I post is useful to you, don’t hesitate to send me an email! I’m always interested to hear about people working on similar problems. For instance, I try to share my code (often Python) whenever it makes sense.