A Musically Motivated Mid-Level Representation For Pitch Estimation And Musical Audio Source Separation

Introduction

This page presents some results and media related to the article "A musically motivated mid-level representation for pitch estimation and musical audio source separation", J.-L. Durrieu, B. David and G. Richard, IEEE Journal of Selected Topics in Signal Processing, Music Signal Processing, October 2011 (first submission 29th Sept. 2010, revised 2nd Feb. 2011), Vol. 5 (6), pp. 1180-1191.

Annotation files

We have annotated the 5 songs from the development database of the SiSEC 2010 "Professionally Produced Music Recordings" evaluation campaign. The annotation for each file is the melody, evaluated on frames of 46.44 ms (2048 samples at 44100 Hz), with one frame every 5.8 ms (256 samples). We gathered them in this archive. In each file, each row contains two values: the first is the time-stamp (s) and the second is the fundamental frequency (Hz) of the corresponding frame.
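As a minimal sketch of how such an annotation file could be read, the Python/NumPy snippet below checks the frame and hop durations quoted above and loads two-column rows of time-stamp and fundamental frequency. The function names, the whitespace-separated layout and the sample values are assumptions made for illustration; they are not taken from the archive.

```python
import io
import numpy as np

FS = 44100      # sampling rate (Hz), as stated in the text
WINDOW = 2048   # frame length (samples), as stated in the text
HOP = 256       # hop between frames (samples), as stated in the text

def frame_times_ms(fs=FS, window=WINDOW, hop=HOP):
    """Return (frame duration, hop duration) in milliseconds."""
    return 1000.0 * window / fs, 1000.0 * hop / fs

def load_annotation(source):
    """Read rows of 'time-stamp (s)  F0 (Hz)' into two 1-D arrays.

    Assumption: whitespace-separated plain text, one frame per row.
    `source` can be a file name or a file-like object.
    """
    data = np.loadtxt(source, ndmin=2)
    return data[:, 0], data[:, 1]

# Made-up stand-in for an annotation file (values are illustrative only)
example = io.StringIO("0.000000 219.0\n0.005805 220.0\n0.011610 221.5\n")
times, f0 = load_annotation(example)
frame_ms, hop_ms = frame_times_ms()
```

If the files in the archive use another separator, the `delimiter` argument of `np.loadtxt` would need to be set accordingly.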

Source code

BSSEval.zip: an archive containing a Python/NumPy/Cython implementation of BSSEval. These scripts were used to evaluate our algorithms. We have also tested them on some examples, and the results appear to agree with those of the original Matlab implementation for a delay parameter equal to 0 (no delay allowed). For higher delays, our implementation seems to be rather unstable.
pitchEval.py: the scripts we used to evaluate the melody estimation.
separateLeadStereo.zip: the programs and scripts implementing the proposed systems: melody estimation, VIMM and VUIMM to separate the lead instrument from the accompaniment.
[Experimental] f0salience: a Vamp plug-in which implements the salience function proposed in our article, mainly intended for use with Sonic Visualiser. In the Git repository, you will find compiled versions for Windows 32 bits (.dll), Linux 32 and 64 bits (.so) and Mac OS X 10.6 32 bits (.dylib). On Windows 64 bits, you can also use the 32-bit version; see the README file for details. Note that the compiled Linux libraries do not seem to work, but it should be fairly easy to adapt the makefiles to your environment. The plug-in may still contain errors, and it does not exactly implement the algorithms explained in the article, although the representation obtained in Sonic Visualiser roughly shows what can be expected. When using it, Sonic Visualiser may seem to "freeze" for a rather long time, depending on the chosen parameters: the initialisation can take a while because it generates the basis (dictionary) matrices.
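To illustrate the zero-delay case mentioned for BSSEval above, here is a minimal sketch of a signal-to-distortion ratio (SDR) in which the only allowed distortion of the target is a gain. It is not the code from BSSEval.zip: the function name `sdr_no_delay` is hypothetical, and mono signals of equal length are assumed.

```python
import numpy as np

def sdr_no_delay(estimate, target):
    """SDR in dB when no delay is allowed (only a gain on the target).

    Hypothetical helper for illustration; assumes 1-D signals of equal
    length and a non-zero target.
    """
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    # Orthogonal projection of the estimate onto the true target signal
    gain = (estimate @ target) / (target @ target)
    s_target = gain * target
    # Everything not explained by the scaled target counts as distortion
    distortion = estimate - s_target
    return 10.0 * np.log10(np.sum(s_target ** 2) / np.sum(distortion ** 2))

# Toy signals: the estimate is twice the target plus a small orthogonal error
target = np.array([1.0, 0.0, 1.0, 0.0])
error = np.array([0.0, 1.0, 0.0, 1.0])
sdr_db = sdr_no_delay(2.0 * target + 0.2 * error, target)
```

In this toy case the scaled target has energy 8 and the distortion has energy 0.08, so the ratio is 100. Allowing non-zero delays replaces the single gain by a short filter, which this sketch does not cover.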

Sound examples

The five songs used for the experiments are given in the following table, with typical separation results. The systems are the V(U)IMM systems, with 50 iterations. You can also access the files directly here.

	Original	VIMM	VUIMM
Bearlin
Bearlin Vocals
Bearlin Music
Tamy
Tamy Vocals
Tamy Music
Another
Another Vocals
Another Music
Fort
Fort Vocals
Fort Music
Ultimate
Ultimate Vocals
Ultimate Music

License

All the original WAV files can be found on the website for the evaluation campaign SiSEC2010. Here is an excerpt of the license section you can read on that page:
"All audio files are distributed under the terms different licenses, as listed below for each recording:

Tamy - Que Pena Tanto Faz: Creative Commons Attribution Noncommercial (3.0)
Bearlin - Roads: Read License
Glen Philips - The Spirit of Shackleton Creative Commons Attribution 3.0
Another Dreamer - The Ones We Love Creative Commons Attribution-NonCommercial 1.0
Fort Minor - Remember the Name Creative Commons Attribution-NonCommercial 2.5
Ultimate NZ Tour Creative Commons Attribution-Noncommercial-ShareAlike 3.0

All the former test and development data (test1 and dev1) are from MTG MASS database by M. Vinyes."

Evolution of estimated $H^{F_0}$

The following video (older color version here) shows the evolution of the matrix $H^{F_0}$ during the first parameter estimation, prior to the estimation of the melody, for the excerpt by J. Pastorius, "Three views of a secret". Time is on the abscissa (in samples), while the ordinate corresponds to a logarithmic fundamental frequency scale.
It is interesting to see how the contributions of the different instruments, namely the trumpets, separated by one octave, and the harmonica, become clearer and clearer over the iterations. In particular, it becomes visible that the pitch of the upper-octave trumpet varies more, especially with the effect at 0.09s, while the attacks of the harmonica are clearly behind those of the trumpet.

Jean-Louis Durrieu

Last modified: Fri Sep 13 11:06:28 CEST 2013