Too Big or Not Too Big: Establishing the Minimum Size for a Legal Ad Hoc Corpus


  • Miriam Seghiri, Facultad de Filosofía y Letras, Universidad de Málaga



A corpus can be described as “[a] collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis” (Francis 1982). However, the concept of representativeness remains surprisingly imprecise, considering its acceptance as a central characteristic that distinguishes a corpus from any other kind of collection (Seghiri 2008). In fact, there is no general agreement as to what the ideal size of a corpus should be. In practice, “the size of a corpus tends to reflect the ease or difficulty of acquiring the material” (Giouli/Piperidis 2002). This paper addresses that key question. We focus on the complex notions of representativeness and ideal size for ad hoc corpora, from both a theoretical and an applied perspective, and we describe a computer application named ReCor, which is used to verify whether a compiled sample of legal contracts can be considered representative from a quantitative point of view.




