UK Music Informatics and Cognition (MIC) 2016 Workshop
Speakers and Abstracts

Emmanouil Benetos - Queen Mary University of London

Automatic music transcription using matrix decomposition methods

Abstract: Automatic music transcription (AMT) is defined as the process of converting an acoustic music signal into some form of human- or machine-readable musical notation. It is considered a key enabling technology in the field of music information retrieval and can be divided into several subtasks, which include multi-pitch detection, note onset/offset detection, instrument recognition, pitch/timing quantisation, extraction of rhythmic information, and extraction of dynamics and expressive information. However, despite recent advances, AMT remains an open problem, especially when considering multiple-instrument music with high polyphony.

A large part of current AMT research focuses on matrix decomposition methods, which decompose a time-frequency representation of a music signal into a series of note templates and note activations. This has led to music transcription systems that are computationally efficient, robust, and interpretable. In this talk, I will present recent advances in AMT, focusing on proposed systems that can detect multiple pitches and instruments, support tuning changes and frequency modulations, and take into account the temporal evolution of notes. Recent work on integrating music language models with acoustic models for improving AMT performance will also be discussed, along with applications of AMT to fields beyond music informatics, including musicology, music acoustics, music education, and environmental sound analysis.
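As a rough illustration of the underlying decomposition idea (a minimal sketch, not the specific systems described in the talk), the following factorises a magnitude spectrogram V into spectral templates W and their activations H using off-the-shelf non-negative matrix factorisation; the file name, the per-pitch component count, and the threshold are illustrative assumptions.

    # Minimal sketch of spectrogram factorisation for transcription-style analysis.
    # Assumes librosa and scikit-learn are installed; "audio.wav" and the parameter
    # values below are placeholders, not taken from the talk itself.
    import numpy as np
    import librosa
    from sklearn.decomposition import NMF

    # Time-frequency representation: magnitude spectrogram V (frequency bins x time frames)
    y, sr = librosa.load("audio.wav", sr=22050)
    V = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

    # Factorise V ~= W @ H: W holds spectral (note) templates, H their activations over time
    n_notes = 88  # one component per piano pitch, as a simplifying assumption
    model = NMF(n_components=n_notes, init="random", beta_loss="kullback-leibler",
                solver="mu", max_iter=500, random_state=0)
    W = model.fit_transform(V)   # (frequency bins x templates)
    H = model.components_        # (templates x time frames)

    # A crude "piano roll": threshold the activations to decide which notes are active
    piano_roll = H > (0.1 * H.max())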


Andrew McLeod - University of Edinburgh

Symbolic Music Analysis for Music Transcription

Abstract: Advances in automatic music transcription performance have been disappointingly slow, with accuracy still falling well below that of human experts at the task. In this presentation, I argue that this problem cannot be solved satisfactorily without the use of some form of music language model. I will show that the use of Natural Language Processing (NLP) techniques can aid in the creation of this music language model when applied to a symbolic music representation such as MIDI performance data.

I first present a Voice Separation model that works on symbolic music data, both quantised and performed. I also discuss ongoing work on time signature identification based on rhythmic analysis using a lexicalised probabilistic context-free grammar, and argue that such a grammar can capture the long-range dependencies of musical rhythms.
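As a toy illustration of the grammar-based approach (far simpler than the lexicalised PCFG described above, with rules and probabilities invented for this example), the sketch below samples a rhythm from a two-rule probabilistic grammar in which a beat is either realised as a single note or split into two sub-beats.

    # Toy probabilistic context-free grammar over rhythm, for illustration only.
    # The rules, probabilities, and representation are invented and much simpler
    # than the lexicalised PCFG discussed in the talk.
    import random

    # Each rule: nonterminal -> list of (probability, expansion)
    # A Beat either surfaces as a single note or splits into two half-length Beats.
    RULES = {
        "Beat": [
            (0.6, ["note"]),          # terminal: one note filling the beat
            (0.4, ["Beat", "Beat"]),  # binary split into two sub-beats
        ]
    }

    def generate(symbol="Beat", duration=1.0, max_depth=4, depth=0):
        """Sample a rhythm (a list of note durations) from the grammar."""
        if symbol == "note" or depth >= max_depth:
            return [duration]
        r, acc = random.random(), 0.0
        for prob, expansion in RULES[symbol]:
            acc += prob
            if r <= acc:
                break
        if expansion == ["note"]:
            return [duration]
        # Split the current duration evenly among the children
        child_dur = duration / len(expansion)
        rhythm = []
        for child in expansion:
            rhythm.extend(generate(child, child_dur, max_depth, depth + 1))
        return rhythm

    print(generate())  # e.g. [0.5, 0.25, 0.25]: one beat subdivided recursively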


Tillman Weyde - City University London

Big Data in Musicology

Abstract: Digital music libraries and collections are growing quickly and are increasingly made available for research. I argue that the use of large data collections will enable a better understanding of music performance and music in general, which will benefit areas such as music search and recommendation, music archiving and indexing, and music production and education. However, achieving these goals requires developing new technology and new musicological research methods, creating and adapting the necessary technological infrastructure, and finding ways of working within legal limitations. Most of the necessary basic technologies exist, but they need to be brought together and applied to musicology. I will discuss these issues in the context of the Digital Music Lab project, where we developed an infrastructure prototype for Analysing Big Music Data.


Stephanie Shih - University of California, Merced

The cognitive reality of prosodic units in Japanese: evidence from music and language pairing

Abstract: Text-setting is the pairing of language to music, and, like other linguistic art forms such as metrical verse, text-setting has been shown to be systematically patterned. As such, text-setting data provides a rich arena for investigating cognitive structures in both language and music.

This talk presents the case of Japanese prosodic units, which have been a source of heretofore unsettled debate in the study of prosody in language: given that Japanese exhibits dominantly moraic prosodic structure, what is the cognitive status of the syllable in the language? Previous arguments surrounding this debate have drawn upon evidence from linguistic art forms, noting that mora-based segmentation plays a prominent role in Japanese songs: i.e., one mora corresponds to one musical note (Poser 1990; Inaba 1998; Labrune 2012). However, little notice has been taken of the fact that syllable-based segmentation, in which polymoraic syllables are undivided and correspond to single moraic notes, is also prevalent (cf. Tanaka 2000). Here, I present two studies, one corpus-based and one experiment-based, finding that, while moraic text-setting is canonical, syllabic text-setting is also common and accepted in Japanese songs. The results indicate that this variation is conditioned by both linguistic factors (e.g., structure of the lexicon) and external pressures (e.g., translation contexts, information density mismatch, and knowledge of correspondence to foreign loans). The studies provide positive evidence for the cognitive reality of the syllable in Japanese, and demonstrate the utility of text-setting data for probing linguistic (and musical) structure.

(based on joint work with Rebecca L. Starr, National University of Singapore)

Sponsored by ILCC