As with many other kinds of data, the past several decades have witnessed an explosion in the
quantity and variety of music in computer-accessible form. There are primarily two kinds of
“music data” one encounters today: sampled audio files, such as those found on compact discs or
scattered over the web in various formats, and symbolic music representations, which essentially list each note's pitch, onset time, and duration. To draw an analogy, music audio is
to symbolic music as speech audio is to text. In both cases the audio representations capture the
colorful expressive nuances of the performances, but are difficult to “understand” by anything
other than a human listener. On the other hand, in both text and symbolic music the high level
“words” are parsimoniously stored and easily recognized.
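To make the contrast concrete, a symbolic representation can be as simple as a list of (pitch, onset, duration) records. The following minimal Python sketch is illustrative only; the `Note` type and its fields are assumptions for this example, not a standard interchange format:

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int        # MIDI note number, e.g. 60 = middle C
    onset: float      # onset time, here in beats
    duration: float   # duration, here in beats

# A fragment of a symbolic score: the first three notes of a C major scale.
score = [
    Note(pitch=60, onset=0.0, duration=1.0),  # C4
    Note(pitch=62, onset=1.0, duration=1.0),  # D4
    Note(pitch=64, onset=2.0, duration=1.0),  # E4
]
```

A few such records encode what a listener would call three "words" of music, while the corresponding audio would occupy tens of thousands of samples per second.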
We focus here on a form of machine listening known as music score matching, score following, or score alignment, in which we seek a correspondence between a symbolic music representation and
an audio performance of the same music, identifying the onset times of all relevant musical
“events” in the audio—usually notes. There are two different versions of the problem, usually
called “off-line” and “on-line.”
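The text above does not name a particular algorithm, but dynamic time warping (DTW) is one standard way to attack the off-line version: compare a feature sequence derived from the score with a feature sequence derived from the audio, and find the minimum-cost monotonic path through their pairwise-distance matrix. The sketch below is a generic DTW implementation under that assumption, not the authors' specific method; the feature extraction step is left abstract:

```python
import numpy as np

def dtw_align(score_feats: np.ndarray, audio_feats: np.ndarray) -> list[tuple[int, int]]:
    """Align a score feature sequence (one frame per score event, shape (n, d))
    with an audio feature sequence (one frame per analysis window, shape (m, d)).
    Returns the minimum-cost path as (score index, audio index) pairs."""
    n, m = len(score_feats), len(audio_feats)
    # Pairwise Euclidean distance between every score frame and audio frame.
    cost = np.linalg.norm(score_feats[:, None, :] - audio_feats[None, :, :], axis=2)
    # Accumulated cost via the standard three-way DTW recurrence.
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,                 # advance in score only
                acc[i, j - 1] if j > 0 else np.inf,                 # advance in audio only
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,   # advance in both
            )
            acc[i, j] = cost[i, j] + best_prev
    # Backtrack from the final cell to recover the optimal alignment path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0:
            candidates.append((acc[i - 1, j], (i - 1, j)))
        if j > 0:
            candidates.append((acc[i, j - 1], (i, j - 1)))
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
        _, (i, j) = min(candidates, key=lambda c: c[0])
        path.append((i, j))
    path.reverse()
    return path
```

Reading off the first audio frame paired with each score event along the returned path yields estimated onset times. Note that this whole-matrix backtracking pass only fits the off-line setting, where the complete recording is available; the on-line version must instead commit to alignments incrementally as the audio arrives.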