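Klassifikation af korpustekster, og kvantitative mål for sammensætningen af et almensprogligt korpus [Classification of corpus texts, and quantitative measures for the composition of a general-language corpus]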

Authors

  • Ole Norling-Christensen

Abstract

This paper introduces the design and composition of general language corpora and considers the problem of statistical representativeness. Determining, checking and documenting the composition of a corpus requires a classification scheme for text types. Corpus building is seen as an iterative process: the original design may set overall quantitative measures for a few easily distinguishable and rather general text types; once some texts or text samples have been collected (these should be annotated according to a more fine-grained classification scheme), the distribution across more specific classes (e.g. topic within a general class of non-fiction) is measured, and the design criteria are adjusted accordingly. The corpus of The Danish Dictionary and the corpus of Danish that forms part of the European PAROLE project are used as examples.

Published

1996-01-01

Citation

Norling-Christensen, O. (1996). Klassifikation af korpustekster, og kvantitative mål for sammensætningen af et almensprogligt korpus. LexicoNordica, (3). Retrieved from https://tidsskrift.dk/lexn/article/view/18916