Klassifikation af korpustekster, og kvantitative mål for sammensætningen af et almensprogligt korpus

Ole Norling-Christensen

doi:10.7146/ln.v0i3.18916

Abstract

The paper introduces the design and composition of general-language corpora and considers the problem of statistical representativeness. Determining, checking and documenting the composition of a corpus requires a classification scheme for text types. Building a corpus is seen as an iterative process: the original design may set overall quantitative measures for a few easily distinguishable and rather general text types; once some texts or text samples have been collected and annotated according to a more fine-grained classification scheme, the distribution over more specific classes (e.g. topic within a general class of non-fiction) is measured, and the design criteria are adjusted accordingly. The designs of the corpus of The Danish Dictionary and of the Danish corpus that forms part of the European PAROLE project are used as examples.
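The measuring step of the iterative process described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the field names (`text_type`, `topic`, `tokens`), the sample texts, and the target shares are all hypothetical, chosen only to show how measured shares per class might be compared with design targets.

```python
from collections import Counter


def class_shares(texts, key):
    """Return each class's share of the total token count for the given key."""
    totals = Counter()
    for text in texts:
        totals[text[key]] += text["tokens"]
    grand_total = sum(totals.values())
    return {label: count / grand_total for label, count in totals.items()}


def deviations(shares, targets):
    """Measured share minus target share, for every class named in the targets."""
    return {label: shares.get(label, 0.0) - target
            for label, target in targets.items()}


# Hypothetical annotated sample; none of these figures come from the paper.
sample = [
    {"text_type": "non-fiction", "topic": "politics", "tokens": 600},
    {"text_type": "non-fiction", "topic": "sport", "tokens": 200},
    {"text_type": "fiction", "topic": "novel", "tokens": 200},
]

# Hypothetical design targets for the coarse text types.
targets = {"non-fiction": 0.75, "fiction": 0.25}

# Coarse distribution, checked against the original design.
by_type = class_shares(sample, "text_type")
gaps = deviations(by_type, targets)

# Finer distribution: topic within the general class of non-fiction.
non_fiction = [t for t in sample if t["text_type"] == "non-fiction"]
by_topic = class_shares(non_fiction, "topic")
```

A positive value in `gaps` would suggest that a class is over-represented relative to the assumed target, which is the kind of signal that could feed an adjustment of the design criteria in the next round of collection.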

Published

1996-01-01

How to Cite

Norling-Christensen, O. (1996). Klassifikation af korpustekster, og kvantitative mål for sammensætningen af et almensprogligt korpus. LexicoNordica, (3). https://doi.org/10.7146/ln.v0i3.18916

Issue

No. 3 (1996): Korpusbasert leksikografi i Norden

Section

Thematic contributions

License

LexicoNordica and the authors
