How does a traditional printed dictionary become a web dictionary?
The history of Ordbok över Finlands svenska folkmål (Dictionary of Swedish Dialects in Finland) began in the 1920s, when the collection of dialect data was initiated. Editing started in the early 1960s, and four volumes were published between 1982 and 2007.
When digitization began in 2010, the first two volumes (A–Hu), which existed only on paper, had to be scanned. The scanned text was then converted into XML using the structure indicators in the dictionary articles. The manuscripts for the next two volumes (Hy–Och) were compiled in a word processing program. Today the dictionary is compiled in a dictionary editing program, and in the future new articles will primarily be published online. A pilot covering the interval I–K will appear in 2013.
In our presentation we discuss the difficulties that arise when a traditional printed dictionary is transformed into an online publication. We focus on the features that can be used to parse the dictionary text into structured text files. Important questions include: Which structure indicators are typographical? When do we have to pay attention to the content of the text to get the tagging right?
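The distinction between typographical and content-based structure indicators can be illustrated with a minimal sketch. It is not the project's actual conversion pipeline. It assumes a hypothetical scanned format in which bold is marked as <b>…</b> and italics as <i>…</i>, that bold marks the headword and italics the part of speech, and that region labels are set in plain type and can be recognized only by comparing tokens against a list. The element names, the sample entry, and the abbreviations RegA and RegB are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Typographical indicators (assumed convention, not the dictionary's real one):
# bold marks the headword, italics mark the part of speech.
ENTRY_PATTERN = re.compile(
    r"<b>(?P<headword>[^<]+)</b>\s*"
    r"<i>(?P<pos>[^<]+)</i>\s*"
    r"(?P<rest>.*)",
    re.DOTALL,
)

# Content-based indicator: region labels have no typographic marking,
# so they are identified by lookup in a (hypothetical) list of abbreviations.
KNOWN_REGIONS = {"RegA", "RegB"}


def parse_entry(line: str) -> ET.Element:
    """Turn one typographically marked entry into an XML element."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised entry layout: {line!r}")

    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = match["headword"].strip()
    ET.SubElement(entry, "pos").text = match["pos"].strip()

    definition_words = []
    for token in match["rest"].split():
        bare = token.strip(".,;")  # ignore punctuation when looking up labels
        if bare in KNOWN_REGIONS:
            ET.SubElement(entry, "region").text = bare
        else:
            definition_words.append(token)

    ET.SubElement(entry, "definition").text = " ".join(definition_words)
    return entry


if __name__ == "__main__":
    sample = "<b>exempel</b> <i>s.</i> sample definition. RegA, RegB."
    print(ET.tostring(parse_entry(sample), encoding="unicode"))
```

In this sketch the headword and part of speech are recovered from typography alone, whereas the region labels require inspecting the text itself, which is the kind of case where content has to guide the tagging.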
Our goal is an electronic dictionary with flexible search options that make full use of all elements of the articles. We aim to serve both dialect-speaking members of the general public and researchers in linguistics.
Nordisk Forening for Leksikografi/NSL and the authors.