Apr 12, 2007:

Context and Learning in Multilingual Tone and Pitch Accent Recognition

Prof. Gina-Anne Levow, University of Chicago

Tone and intonation play a crucial role across many languages. However, the use and structure of tone vary widely, ranging from lexical tone, which determines word identity, to pitch accent, which signals information status. In this work, we employ a uniform representation of acoustic features for recognition of Mandarin tone, isiZulu tone, and English pitch accent. The representation captures local tone height and shape as well as contextual coarticulatory and phrasal influences. Using multiclass Support Vector Machines as discriminative classifiers, we achieve competitive rates of tone and pitch accent recognition. We further demonstrate the greater importance of modeling the preceding local context, which yields up to a 24% reduction in error relative to modeling the following context.
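
As a rough illustration of what a representation combining local height and shape with preceding or following context might look like, here is a minimal Python sketch. It is not the feature set used in this work: the function names, the choice of mean pitch for height, the end-minus-start slope for shape, and the neighbor height differences for context are all assumptions made for illustration, and the pitch values are made up.

```python
from statistics import mean


def local_features(contour):
    """Return [height, shape] for one syllable's pitch contour.

    Height is the mean pitch; shape is a crude slope (end minus start,
    divided by the number of frame steps). Both choices are assumptions.
    """
    height = mean(contour)
    if len(contour) > 1:
        slope = (contour[-1] - contour[0]) / (len(contour) - 1)
    else:
        slope = 0.0
    return [height, slope]


def context_features(contours, i, use_preceding=True, use_following=False):
    """Local features for syllable i, plus height differences to neighbors.

    At a phrase edge the missing neighbor is treated as having the same
    height as syllable i, so the difference is zero.
    """
    feats = local_features(contours[i])
    height = feats[0]
    if use_preceding:
        prev_height = mean(contours[i - 1]) if i > 0 else height
        feats.append(height - prev_height)
    if use_following:
        next_height = mean(contours[i + 1]) if i + 1 < len(contours) else height
        feats.append(height - next_height)
    return feats


# Hypothetical pitch values (Hz) for three consecutive syllables.
contours = [
    [200.0, 210.0, 220.0],  # rising
    [220.0, 200.0, 180.0],  # falling
    [180.0, 180.0, 180.0],  # level
]
features = context_features(contours, 1)  # local features + preceding context
```

Vectors of this kind, one per syllable, could then be passed to any multiclass classifier; switching `use_preceding` and `use_following` gives a simple way to compare the two context directions.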

These approaches, however, rely on fully supervised learning methods that employ extensive collections of manually tagged data, obtained at substantial cost in time and money. We therefore next explore two approaches to tone learning that substantially reduce the amount of training data required. We employ both unsupervised clustering and semi-supervised learning to recognize pitch accent and tone, based on the intrinsic structure of the tones in acoustic space. In unsupervised tone and pitch accent clustering experiments, we achieve 75% to 96% of the accuracy rates achieved with large training data sets. For semi-supervised training with only small numbers of labeled examples, accuracies reach 90% to 98% of the levels obtained with hundreds or thousands of labeled examples. These results indicate that the intrinsic structure of tone and pitch accent acoustics can be exploited to reduce the need for costly labeled training data for tone learning and recognition.
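
The figures quoted in this abstract are relative measures: an error reduction is stated relative to a baseline error rate, and the clustering and semi-supervised accuracies are stated as a fraction of the fully supervised accuracy. The short Python sketch below shows how such figures are conventionally computed. The function names are hypothetical and the input numbers are invented for illustration only; they are not results from this work.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error that the new system removes."""
    return (baseline_error - new_error) / baseline_error


def fraction_of_reference(accuracy, reference_accuracy):
    """Accuracy expressed as a fraction of a reference system's accuracy."""
    return accuracy / reference_accuracy


# Invented numbers, for illustration only.
# Error falls from 25% to 20%: 5 points absolute, one fifth of the baseline.
reduction = relative_error_reduction(0.25, 0.20)

# A reduced-data system at 72% accuracy against a supervised system at 80%.
fraction = fraction_of_reference(0.72, 0.80)
```

Read this way, a system that reaches 90% of the supervised level is not 90% accurate in absolute terms; its absolute accuracy depends on how well the supervised reference performs.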