Resources on Music and (Science or Mathematics or Computing)

General Interest Books (mostly music and psychology or cognitive science)


Web Sites on Digital Signal Processing (DSP) and Music Information Retrieval (MIR)

Web Sites on Musical Acoustics



News about Audio Technology:

Audio Programming and General Studies


Sonic Visualiser: Free high-end tool for various audio-processing tasks. Plugins:

Pure Data Tutorial:

Surveys and Review Articles and Books on MIR

Python Programming Resources on Audio


Music and Mathematics/Physics

Specialist Books


Related BU Courses

Courses, Centers, and Web Pages at Other Universities

Learned Societies





Ableton [Link]

Spectrasonics [Link]

Celemony [Link]


Web Pages and Resources on Specific Topics

Sound Samples:

Freesound: HTML

Julius O. Smith's Online Books: (HTML)

Mathematics of the DFT: HTML

Physical Audio Signal Processing for Virtual Musical Instruments and Audio Effects: HTML

Spectral Audio Signal Processing: HTML
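A minimal illustration of the transform these books develop in depth (a direct O(N²) evaluation of the DFT definition, checked against NumPy's FFT; the `dft` function is just a sketch for this page, not from the books):

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of the DFT definition:
    X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # twiddle-factor matrix
    return W @ x

# Check against NumPy's FFT on a short sinusoid.
sr, N = 8000, 64
x = np.sin(2 * np.pi * 440 * np.arange(N) / sr)
X = dft(x)
```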

Source Separation:

Kernel Additive Modeling:


Keynote talks from workshop on Timbre: HTML

Time and Pitch Scaling:

PSOLA and Vocoders:

PSOLA Technique for voice synthesis: PDF

Voice Conversion using PSOLA: PDF
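The core mechanism behind PSOLA is overlap-add of windowed grains. A plain fixed-hop OLA time-stretch sketch (PSOLA proper places the grains pitch-synchronously, which this toy version does not):

```python
import numpy as np

def ola_stretch(x, rate, frame=1024, hop_syn=256):
    """Time-stretch by overlap-add: read frames at hop_ana = rate * hop_syn,
    write them at hop_syn. rate > 1 speeds up, rate < 1 slows down."""
    hop_ana = int(round(rate * hop_syn))
    win = np.hanning(frame)
    n_frames = max(1, (len(x) - frame) // hop_ana + 1)
    out = np.zeros(hop_syn * (n_frames - 1) + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = x[i * hop_ana : i * hop_ana + frame]
        if len(seg) < frame:
            break
        out[i * hop_syn : i * hop_syn + frame] += win * seg
        norm[i * hop_syn : i * hop_syn + frame] += win
    return out / np.maximum(norm, 1e-8)  # compensate for window overlap

# Stretch one second of a sine to roughly twice its length.
sr = 8000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
y = ola_stretch(x, rate=0.5)
```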

Pitch Determination:

SIFT algorithm for F0 determination: PDF

Nigel Redmon's Blogs on Audio:

Audio Dither Explained: HTML


Interesting lab on using convolution on audio files: HTML

Master's Thesis on Speech Synthesis: PDF

Article on using MIR to measure cultural change in pop music: HTML



Miscellaneous Posts from the MUSIC-IR mailing list with useful information and links:




Hi all,

I am pleased to announce that a new version of aubio, 0.4.0, has been released.

aubio is a library of functions to perform audio feature extractions such as:

 - note onset detection
 - pitch detection
 - beat tracking
 - MFCC computation
 - spectral descriptors

The core library, written in C, focuses on speed and portability. It is known
to run on most modern operating systems, including Linux, Mac OS X, Windows,
Android, and iOS.

A new python module, rewritten from scratch, gives access to the features of
aubio's core library from within Python. Tightly integrated with Python NumPy,
the aubio python module provides an efficient way to extract features from
audio streams as well as to design new algorithms.
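As a rough illustration of the kind of feature aubio extracts, here is a spectral-flux onset novelty curve in plain NumPy. This is a generic textbook method sketched for this page, not aubio's own implementation or API:

```python
import numpy as np

def spectral_flux(x, frame=1024, hop=512):
    """Onset novelty via spectral flux: the half-wave-rectified increase
    in magnitude spectrum from one frame to the next."""
    win = np.hanning(frame)
    frames = [x[i:i + frame] * win for i in range(0, len(x) - frame, hop)]
    mags = np.abs(np.fft.rfft(frames, axis=1))
    diff = np.diff(mags, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)

# A click in the middle of silence produces a clear peak near the click.
sr = 8000
x = np.zeros(sr)
x[sr // 2] = 1.0
novelty = spectral_flux(x)
```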

To find out more about this release:

Post announcing aubio 0.4.0:

ChangeLog for aubio 0.4.0:

Source tarball, signature and digests:

API Documentation:

Binary builds for Mac OS X, Windows, and iOS:

Merry hacking and best wishes to all,



Hi all,

After several requests over the last month or so, I am pleased to (finally) announce, on behalf of Erik Schmidt, Philippe Hamel, and myself, that the slides of our ISMIR tutorial on deep learning are now being hosted at MARL@NYU.

At this site, you will also find a programming walkthrough in Python I've recently put together, demonstrating how to render some of these ideas to practice. There are currently two examples with full source code to get those interested up and running as soon as possible:
While not as thorough as the Theano tutorials, the goal of this resource is to provide a different perspective on the same concepts and tools, in the context of music-minded tasks. If you simply want to see what the code looks like and skip my commentary, feel free to clone the code repository and get coding.

Additionally, to help make this whole process a little easier, we're providing two new development datasets:
While these were designed to be used with the programming tutorial, you may find them useful for sanity checking algorithms or other machine learning problems. For more information, consult the documentation in the respective archive.

Lastly, if anyone runs into any issues, please do not hesitate to contact me (

I was asked sometime ago by the IEEE Signal Processing Society to record a 30 minute presentation
with an overview of Music Information Retrieval with emphasis on digital signal processing aspects
and it is finally out. The resulting presentation can be found at:

There is also an excellent, more focused tutorial by Meinard Mueller on Chroma Features at the same site.

Apologies for the self-promotion but I thought you might find it a useful resource
when needing a quick overview introduction to MIR.

best regards,
George Tzanetakis



Hello all, and Happy New Year.

I’d like to announce the revised and expanded edition of the “Big MAT Book,” a 12-part course sequence in multimedia engineering and audio software and hardware.

The web site for the free downloads (course slides, readers and example code) is,

Here’s the updated description,

and here’s the introduction,

Introduction to the Series “Courseware for Audio & Multimedia Engineering”

Multimedia engineering is a broad and complex topic. It is also one of the fastest-growing and most valuable fields of research and development within electronic technology. The book before you is an anthology of curriculum materials developed over the space of 12 years at the University of California, Santa Barbara for students in UCSB’s Graduate Program in Media Arts and Technology.

The Big MAT Book consists of the presentation slides for twelve ten-week courses, amounting to over 600 hours of presentation time. For each of the twelve courses, the presentation slides are accompanied by the tables of contents of the course readers, and an overview of the example code archives. These resources are available for download from the HeavenEverywhere web site (see

The multimedia engineering courses included here cover theory and practice, hardware and software, visual and audio media, and arts as well as entertainment applications. Some of the courses (the first two chapters) are required of all MAT graduate students, and thus must target less-technical and also non-audio-centric students. The bulk of this material, though, consists of elective courses that have somewhat higher-level prerequisites and assume basic knowledge of acoustics and some (minimal) programming experience in mainstream programming languages.

The Big MAT Book courses borrow liberally from R&D publications by my friends and colleagues, especially Roger Dannenberg, Julius O. Smith, D. Gareth Loy, F. R. Moore, Perry Cook, Adrian Freed, George Tzanetakis, Ross Bencina and Dan Overholt. I want also to express my deepest thanks to my MAT and Music Dept. colleagues JoAnn Kuchera-Morin, Curtis Roads, Clarence Barlow, Matthew Wright and Matthew Turk, and to the many students who helped these courses evolve, either as course participants or teaching assistants.



On Wed, Jun 18, 2014 at 11:53 AM, Jamie Bullock <> wrote:


Does anyone know of a free or commercial dataset of audio files with corresponding labelled onset locations for each file?

Ideally, I am looking for monophonic audio (one instrument or sound-source per file).




CPJKU has posted a large collection of annotations here:
The audio for some of the annotations is freely available (and linked in the repo's README); for others, you'll need to contact the authors to try to get the audio.

Hi Jamie,

you can find a few onset datasets listed here:



Hi everyone!

After Antoine and Romain answered, I figured I could complete the picture with two other works, done at or closely related to our former lab (, based on NMF, mainly focusing (and successfully) on vocals removal:

* Benoit Fuentes, Roland Badeau, and Gaël Richard, "Blind Harmonic Adaptive Decomposition Applied to Supervised Source Separation", 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania, August 27-31, 2012, pp. 2654-2658.

* J.-L. Durrieu and J.-Ph. Thiran, "Musical Audio Source Separation Based on User-Selected F0 Track", the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), March 12-15, 2012, Tel-Aviv, Israel.

(Trained eyes will notice the subtle anonymous reviewer's trick to place some of his less recommendable publications in the discussion. A less subtle reviewer would even have added that the first reference is not to be considered too seriously because the acronym is BHADASSS, but that would just be pure jealousy, really!)

In practice, if you are rich and have a legal copy of Matlab, I guess Benoit's work will help you a lot. My own work requires a bit more effort on the installation side (although it should be rather OK under Windows and Linux, less so under Mac OS X - even if that's where I programmed it!), but it's Python and Qt, so it's free. Processing huge files will be a problem, but if you cut your audio file into 20/30 s excerpts and process them one after the other, you should get the task done pretty easily.
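For readers unfamiliar with the NMF machinery these methods build on, here is a toy factorization with Lee-Seung multiplicative updates. This illustrates the generic technique only, not the BHAD or F0-informed separation models cited above:

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V ~= W @ H using Lee-Seung
    multiplicative updates that minimize squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# An exactly rank-2 nonnegative matrix should be recovered almost perfectly.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In audio separation, V would be a magnitude spectrogram; the columns of W act as spectral templates and the rows of H as their activations over time.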

My webpage also features some more links for command-line python programs, so you can batch process more files, if needed: or

Hope this helps!

Best regards,





In the course of my work on pitch tracking / melody extraction I've always found it useful to synthesize the extracted pitch sequence for (e.g.) qualitative evaluation.

To this end, I've written a little python script called MeloSynth that reads in a pitch sequence (represented as two columns: timestamps and f0 values) and saves a synthesized version to disk.

The script supports batch processing of folders and provides some basic options such as setting the number of (harmonic) sinusoids to use, the sampling frequency, and choosing whether or not to synthesize negative frequencies (a convention in melody extraction for indicating potentially unvoiced frames).
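The underlying idea can be sketched in a few lines of NumPy (this is a hypothetical `synth_pitch` written for this page, not MeloSynth's actual code): accumulate phase from the interpolated f0 track and sum a few harmonic sinusoids, muting frames where f0 is non-positive, following the negative-frequency convention mentioned above.

```python
import numpy as np

def synth_pitch(times, f0s, sr=8000, n_harmonics=3):
    """Synthesize a pitch track as a sum of harmonic sinusoids.
    times: timestamps in seconds; f0s: Hz, values <= 0 mean unvoiced."""
    n = int(times[-1] * sr)
    t = np.arange(n) / sr
    f0 = np.interp(t, times, f0s)
    voiced = f0 > 0
    # Accumulate instantaneous phase so frequency changes stay click-free.
    phase = 2 * np.pi * np.cumsum(np.where(voiced, f0, 0.0)) / sr
    y = np.zeros(n)
    for h in range(1, n_harmonics + 1):
        y += np.sin(h * phase) / h  # simple 1/h harmonic rolloff
    return y * voiced / n_harmonics

# A constant 220 Hz track, one second long.
times = np.array([0.0, 1.0])
f0s = np.array([220.0, 220.0])
y = synth_pitch(times, f0s)
```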

In case anyone finds it useful, it's up on GitHub now (with some instructions):

You need to have python installed to run the script, but you don't need to know python to use it.

I should also mention that the latest version of Sonic Visualiser can now sonify f0 curves (hurrah!), so if you just want to listen to the output of a vamp plugin (such as melodia, wink) you can now do so directly in SV. For non-vamp algorithms (or if you need to save the synthesis to disk / batch process a folder) the script should do the trick.




Lots of good stuff here:

On Wed, Jan 7, 2015 at 3:28 PM, li luo <> wrote:
> Dear community list,
> we are trying to evaluate our program on pitch estimation, and want to use
> some database for this work. However, for the musical instrument signal, we
> only found the database provided by the University of Iowa Electronic Music
> Studios. So
> does anyone know some other available databases for musical instrument signals
> with labeled reference pitches for the purpose of pitch tracking estimation?
> Thanks for your help!
> Best,
> Li Luo
> University of Duisburg-Essen, Campus Duisburg
> Department of Communication Technologies
> --


Dear Li,


You can also find a task focused summary with additional datasets here:


There are some datasets in Alexander Lerch’s list that are not explicitly tagged as pitch estimation datasets. The MAPS dataset for example is also intended for multipitch estimation and contains a lot of annotated single pitch tracks.





  Alexander Schindler

  Music Information Retrieval
  Institute of Software Technology and Interactive Systems
  Vienna University of Technology




Dear melody lovers,

I’m pleased to announce the release of pYIN version 1.1 with
known pitch tracking and improved note tracking.

pYIN (Probabilistic YIN) is a modification of the well-loved YIN
algorithm for fundamental frequency (F0) estimation in monophonic
audio. This Vamp plugin's main features are

* an implementation of pYIN for pitch tracking and
* a note-tracking algorithm based on the pYIN pitch track

The plugin is written in C++ and can be used by any Vamp host such
as Sonic Visualiser and Sonic Annotator. The melody transcription
program Tony ( uses it by default.
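For orientation, the deterministic YIN core that pYIN builds on can be sketched as follows: a difference function over candidate lags, cumulative mean normalization, and picking the dip below an absolute threshold (pYIN's contribution is to replace the single threshold with a distribution and track candidates with an HMM; this sketch is plain YIN, not the plugin's code):

```python
import numpy as np

def yin_f0(x, sr, fmin=80.0, threshold=0.1):
    """Core YIN steps: difference function, cumulative mean
    normalization, absolute-threshold dip picking."""
    max_lag = int(sr / fmin)
    w = len(x) - max_lag
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2)
                  for tau in range(max_lag)])
    cmnd = np.ones_like(d)  # d'(0) = 1 by definition
    cmnd[1:] = d[1:] * np.arange(1, max_lag) / np.maximum(np.cumsum(d[1:]), 1e-12)
    below = np.where(cmnd[1:] < threshold)[0]
    tau = (below[0] + 1) if len(below) else (cmnd[1:].argmin() + 1)
    # Walk down to the local minimum of the chosen dip.
    while tau + 1 < max_lag and cmnd[tau + 1] < cmnd[tau]:
        tau += 1
    return sr / tau

# A pure 200 Hz tone at 8 kHz has a 40-sample period.
sr = 8000
x = np.sin(2 * np.pi * 200 * np.arange(2048) / sr)
f0 = yin_f0(x, sr)
```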

Associated publications:

1) for pYIN pitch tracking
M. Mauch and S. Dixon, “pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions”, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), 2014.

2) for pYIN note tracking
M. Mauch, C. Cannam, R. Bittner, G. Fazekas, J. Salamon, J. Dai, J. Bello and S. Dixon, “Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency”, in Proceedings of the First International Conference on Technologies for Music Notation and Representation, 2015.

Thanks to Chris Cannam for help with multi-platform builds and release.



I just found this new video by Minute Physics, about why it’s impossible to tune a piano.
You already knew this, but I think the presentation is brilliant and that you would appreciate it.
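The arithmetic behind the video's point fits in three lines: twelve pure 3:2 fifths overshoot seven octaves by the Pythagorean comma, so no fixed 12-note tuning can make every fifth pure.

```python
from fractions import Fraction

# Stack twelve just fifths (3:2), then come back down seven octaves (2:1).
comma = Fraction(3, 2) ** 12 / Fraction(2, 1) ** 7
print(comma)         # 531441/524288
print(float(comma))  # about 1.0136 -- roughly a quarter of a semitone sharp
```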



Dear Colleagues:

It is with great pleasure and excitement that I announce a new, and slightly different, Grand Challenge 2015 User eXperience (GC15UX). Based upon an intriguing collection of jazz recording session metadata donated by the J-DISC (Jazz Discography) team at Columbia, GC15UX:J-DISC provides a unique opportunity for MIR system developers to explore and then utilize networked relationships among performers, composers, songs and recordings. Please do visit the J-DISC source site at: We hope that your brilliant uses of this very cool data can help inspire improvements and new functionalities that could then be incorporated into the important J-DISC site.

J-DISC is a resource for searching and exploring jazz recordings created by the Center for Jazz Studies at Columbia University. It is organized to present complete information on jazz recording sessions, and merge a large corpus of session data into a single easily accessible repository, in a manner that can be easily searched, cross-searched, navigated and cited. In addition to the focus on recording artist/leaders of traditional discography, J-DISC incorporates extensive cultural, geographic, biographical, composer and studio information that can also be easily searched and accessed.

Information about the GC15UX:J-DISC task can be found at:

The J-DISC task dataset contains fully structured and searchable metadata. Key entities in the dataset include person, skill, session, track, composition, and issue. There are 19 tables in the dataset representing various relationships between those entities. The data schema is available below:

The final task that will motivate evaluators will be defined at (or just after) ISMIR 2015. Because this dataset has strong and interesting network information, and no underlying audio, we should craft a task definition that best fits this state of affairs.

The tentative deadline for submission is currently set at 15 January 2016, so you have lots of time to get those creative juices flowing and made manifest into a spiffy new interactive system.

If you have any questions about this truly groovy Grand Challenge, do not hesitate to contact me, J. Stephen Downie at <>


The MIREX team will briefly discuss this new Grand Challenge during the MIREX Plenary meeting on Friday, 30 October. To devote more time to our discussions, we are then proposing an Unconference session on things Grand Challenge, also on Friday, 30 October. Tad Shull, a J-DISC leader and Jazz expert, will be coming to ISMIR to take part in our discussions, both formal and informal.

Cheers and see you soon in Malaga (in reality or in spirit).

Rhythm

Onsets, offsets, inter-onset intervals, etc.
Hierarchical beat structure (GTTM dot-notation-style beat level annotations)
Rhythmic patterns of various window sizes (histogram, transition probabilities)
Accent strength for each time point


Harmony

Key, mode (time-varying)
Raw Chords
Chord labels relative to local key
Pitch class sets
Chord distribution
Chord transition distribution
Modulatory structure


Melody

Pitches relative to tonic
Pitch class distribution over time
Pitch transition distribution
Contour (at different levels of granularity)
Non-harmonic tone annotations
Polyphony / counterpoint (i.e., explicit representation of separate melodic lines and pairwise interval relationships between lines)

Melody/Harmony x Rhythm

Harmonic rhythm
Melodic speed

Temporal Structure

Loudness vs time
Density vs time
Phrase structure (grouping boundaries, slurs)
Large scale form (e.g., A A’ B A’’)
Repetition/Analogy (at all levels of granularity: notes, motifs, sections, etc.)
Development (variations on themes, and a description of operations leading to each variation)
Tension/Release vs time (e.g., see Elizabeth Margulis 2003)


Lyrics

Syllables aligned to notes
Word distribution
Word complexity
Word and phrase IDF

Timbre and Mood

Instrument names (and styles per note, such as string “pizzicato”)
Human annotations (descriptive)
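One of the simplest features in the melody list above, a pitch class distribution, can be computed directly from MIDI note numbers (a hypothetical helper sketched for this outline):

```python
import numpy as np

def pitch_class_distribution(midi_notes):
    """Normalized 12-bin histogram of pitch classes (C=0 ... B=11)."""
    pcs = np.asarray(midi_notes) % 12
    hist = np.bincount(pcs, minlength=12).astype(float)
    return hist / hist.sum()

# C major triad with the root doubled: C4=60, E4=64, G4=67, C5=72.
dist = pitch_class_distribution([60, 64, 67, 72])
```

The pitch transition distribution in the same list is the bigram analogue: a 12x12 count matrix over consecutive pitch-class pairs, normalized per row.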