Annotating large speech corpora: building on the experience of Marsec

Gerry Knowles

doi:10.7146/hjlcb.v7i13.25076

Abstract

This paper discusses a methodology for processing large amounts of speech data using database techniques, applying the lessons learned in compiling the Marsec database. The methodology is offered as an alternative to the conventional method of processing the orthographic transcription using only techniques designed for written texts. Past practice suggests that the first step in processing spoken texts is to make phonemic and prosodic transcriptions, but it is argued that these are not in reality necessary. Given an appropriate organisation of the data, much of the information in conventional transcriptions is predictable, and human expertise is required only to add unpredictable supplementary annotations.

Published

2017-01-04

How to Cite

Knowles, G. (2017). Annotating large speech corpora: building on the experience of Marsec. HERMES - Journal of Language and Communication in Business, 7(13), 87–98. https://doi.org/10.7146/hjlcb.v7i13.25076

Issue

No. 13 (1994)

Section

Thematic Articles

License

Authors who publish with this journal agree to the following terms:

a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
