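Klassifikation af korpustekster, og kvantitative mål for sammensætningen af et almensprogligt korpus (Classification of corpus texts, and quantitative measures for the composition of a general-language corpus)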
Abstract
The paper introduces the design and composition of general-language corpora and considers the problem of statistical representativeness. Determining, checking, and documenting a corpus's composition requires a classification scheme for text types. Building a corpus is treated as an iterative process. The initial design may set overall quantitative targets for a few general, easily distinguishable text types. Once some texts or text samples have been collected and annotated according to a more fine-grained classification scheme, their distribution across more specific classes is measured (e.g. topics within the general class of non-fiction). The design criteria are then adjusted accordingly. The designs of the corpus of The Danish Dictionary and of the Danish corpus that forms part of the European PAROLE project serve as examples.
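The measure-and-adjust step of this iterative process can be illustrated with a small sketch. It assumes the collected samples are annotated with a broad text type and a finer topic, each with a word count. It measures each class's share of the running words and compares that share against the design targets. It then estimates how many more words each class needs to reach a planned corpus size. The target shares, field names (`type`, `topic`, `words`), and function names are hypothetical illustrations, not figures or methods taken from the paper.

```python
from collections import Counter

# Hypothetical design targets: share of running words per broad text type.
# These proportions are illustrative assumptions, not values from the paper.
TARGETS = {"fiction": 0.25, "non-fiction": 0.40, "newspaper": 0.35}


def word_counts(samples, key):
    """Sum the word counts of the samples, grouped by the classification field `key`."""
    counts = Counter()
    for sample in samples:
        counts[sample[key]] += sample["words"]
    return counts


def measure_distribution(samples, key):
    """Return each class's share of the total word count (empty dict if no words)."""
    counts = word_counts(samples, key)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


def deviations(observed, targets):
    """Observed share minus target share for every class named in the design."""
    return {cls: observed.get(cls, 0.0) - share for cls, share in targets.items()}


def words_needed(samples, key, targets, planned_total):
    """Words still to collect per class so that each class reaches its target
    share of a planned corpus size (never negative)."""
    counts = word_counts(samples, key)
    return {
        cls: max(0, round(share * planned_total) - counts.get(cls, 0))
        for cls, share in targets.items()
    }
```

For example, a sample set with 300 words of fiction, 400 of non-fiction, and 300 of newspaper text gives a fiction share of 0.3, which is 0.05 above the assumed target. The newspaper share is 0.05 below its target. Calling `measure_distribution` on only the non-fiction samples with `key="topic"` gives the finer breakdown that may prompt a revision of the design criteria.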
Published
1996-01-01
How to Cite
Norling-Christensen, O. (1996). Klassifikation af korpustekster, og kvantitative mål for sammensætningen af et almensprogligt korpus. LexicoNordica, (3). Retrieved from https://tidsskrift.dk/lexn/article/view/18916
Section
Tematiske bidrag
License
LexicoNordica og forfatterne