A corpus based method for a diachronic study of the central vocabulary of New Norwegian
Keywords: monitor corpus, relative frequencies, central vocabulary, New Norwegian, corpus linguistics
Abstract
This article describes how a monitor corpus can be created from existing corpora in which the texts are annotated with basic bibliographical information. The focus is on the core vocabulary and on how to document diachronic change in it. The implementation uses simple, readily available techniques from corpus linguistics. A norm is defined by isolating a small vocabulary that is unlikely to change from text to text: the most frequent words. These words are the building blocks of language, namely function words and some highly frequent verbs and nouns, the words essential to producing grammatical sentences. The relative frequencies of these words show only minor deviations from one text collection to another, and this slight deviation is used to specify the norm: it is the ‘wiggle room’. Sub-corpora are then created from the corpus sections to be compared, and relative frequencies are produced for every word in both subsets. When the relative frequency of a word in one chronological subset deviates from its relative frequency in the other subset by more than the wiggle room, something has happened. A word may be moving into or out of the language, texts in one collection may differ drastically from texts in the other, or an event may have raised an otherwise obscure technical term to the front page.
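The abstract does not spell out the exact deviation measure or how many words make up the core vocabulary, so the following Python sketch fills those gaps with assumptions. It takes deviation to be the ratio of the larger to the smaller relative frequency, and it defines the wiggle room as the largest such ratio observed for the `n_core` most frequent words across every pair of reference collections. The function names (`relative_frequencies`, `deviation`, `wiggle_room`, `flag_changes`) and the toy token lists are hypothetical illustrations, not the authors' implementation; a real study would feed in tokenized texts drawn from the annotated corpora.

```python
import math
from collections import Counter
from itertools import combinations


def relative_frequencies(tokens):
    """Map each word to its count divided by the total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def deviation(f1, f2):
    """Ratio of the larger to the smaller relative frequency (always >= 1).

    Assumed measure: a word missing from one side gets an infinite deviation.
    """
    lo, hi = sorted((f1, f2))
    return math.inf if lo == 0 else hi / lo


def wiggle_room(collections, n_core):
    """Hypothetical norm: the largest deviation seen for the n_core most
    frequent words (pooled over all collections) across every pair of
    reference collections. Needs at least two collections."""
    pooled = Counter()
    for tokens in collections:
        pooled.update(tokens)
    core = [word for word, _ in pooled.most_common(n_core)]
    freqs = [relative_frequencies(tokens) for tokens in collections]
    return max(
        deviation(a.get(word, 0.0), b.get(word, 0.0))
        for a, b in combinations(freqs, 2)
        for word in core
    )


def flag_changes(subset_a, subset_b, room):
    """Return {word: deviation} for every word whose relative frequency in
    one chronological subset deviates from the other by more than room."""
    fa = relative_frequencies(subset_a)
    fb = relative_frequencies(subset_b)
    flagged = {}
    for word in set(fa) | set(fb):
        d = deviation(fa.get(word, 0.0), fb.get(word, 0.0))
        if d > room:
            flagged[word] = d
    return flagged


# Toy, invented token lists standing in for tokenized text collections.
reference = [
    "og i det er ein og i det er".split(),
    "og i det er og i det er og".split(),
]
room = wiggle_room(reference, n_core=4)  # about 1.5, driven by "og"

older = "og i det er hest og i det er hest".split()
newer = "og i det er bil og i det er hest".split()
print(sorted(flag_changes(older, newer, room).items()))
# → [('bil', inf), ('hest', 2.0)]
```

A ratio is used here rather than an absolute difference so that one threshold can be applied to frequent and rare words alike. Under this assumption a word absent from one subset always gets flagged, which matches the "moving in or out of the language" case but would also flag one-off occurrences, so a minimum-count filter would probably be needed on real data.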
Ridings, D., & Grønvik, O. (2012). A corpus based method for a diachronic study of the central vocabulary of New Norwegian. Nordiske Studier I Leksikografi, (11). Retrieved from https://tidsskrift.dk/nsil/article/view/19366
Nordisk Forening for Leksikografi/NSL and the authors.