Annotating large speech corpora: building on the experience of Marsec

Authors

  • Gerry Knowles

DOI:

https://doi.org/10.7146/hjlcb.v7i13.25076

Abstract

This paper discusses a methodology for processing large amounts of speech data using database techniques, applying the lessons learned in compiling the Marsec database. The methodology is offered as an alternative to the conventional method of processing the orthographic transcription using only techniques designed for written texts. It is argued that, although past practice might suggest that the first step in processing spoken texts is to make phonemic and prosodic transcriptions, these are not in reality necessary. Given the appropriate organisation of the data, much of the information in conventional transcriptions is predictable, and human expertise is required only to add unpredictable supplementary annotations.

Published

2017-01-04

How to Cite

Knowles, G. (2017). Annotating large speech corpora: building on the experience of Marsec. HERMES - Journal of Language and Communication in Business, 7(13), 87–98. https://doi.org/10.7146/hjlcb.v7i13.25076

Section

Thematic Articles