Electronic Dictionaries viewed from South Africa

The aim of this article is to evaluate currently available electronic dictionaries from a South African perspective for the eleven official languages of South Africa, namely English, Afrikaans and the nine Bantu languages Zulu, Xhosa, Swazi, Ndebele, Northern Sotho, Southern Sotho, Tswana, Tsonga and Venda. A brief discussion of the needs and status quo for English and Afrikaans will be followed by a more detailed discussion of the unique nature and consequent electronic dictionary requirements of the Bantu languages. In the latter category the focus will be on problematic aspects of lemmatisation which can only be solved in the electronic dictionary dimension.


Introduction
Lexicographers increasingly acknowledge the enormous potential of electronic dictionaries (EDs), and enumerations of their virtues have dominated articles on the subject in the past decade. In a state-of-the-art article, De Schryver (2003: 163-187) lists no fewer than 118 advantages of EDs in terms of space and speed, graphics, audio, text corpora, multimedia corpora, accessibility, user-friendliness, etc., and many of these issues are discussed in detail by Prinsloo (2001), Bolinger (1990), Nesi (1999), Atkins (1996), Geeraerts (2000), Dodd (1989) and Harley (2000), to name but a few. The great capacity and speed characteristic of electronic products, combined with enhanced query and data retrieval technology, indeed pave the way to a new generation of dictionaries unimagined in the paper-dictionary era. No attempt will be made to discuss the advantages of electronic dictionaries over paper dictionaries in detail; instead, the typical innovative features listed in (1), which are relevant from a South African perspective, are singled out.
(1) a. Pop-up access
b. Bringing together of related items
c. New routes to the data
d. Less dependency on alphabetical order
e. Fuzzy spelling
f. Intelligent extrapolation of characters keyed in
g. Audible pronunciation
Such typical innovative features will simply be referred to as 'true' or 'real' electronic features.

Electronic dictionaries for English
As far as EDs for English are concerned, the dictionary user in South Africa can benefit from the full range of electronic dictionaries internationally available, such as the Macmillan English Dictionary for Advanced Learners (MED), Oxford English Dictionary, Second Edition (OED on CD-ROM), Oxford English Dictionary (OED Online), Cambridge Advanced Learner's Dictionary Online (CALD), Collins COBUILD on CD-ROM, Merriam-Webster OnLine, etc. These dictionaries can be utilised to their full capacity in terms of true electronic features such as those given in (1). Whether online or on CD-ROM, such dictionaries present a new world of exciting electronic features. The discussion will be limited to a few outstanding features in a single online dictionary, the CALD, and an ED on CD-ROM, the MED.
When MED is launched it immediately opens on a random lemma, which is automatically pronounced in British English, and clickable options for both British and American English are provided. Audible pronunciation is an excellent example of how the ED has superseded the paper dictionary. No phonetic transcription comes close to actually hearing a word being pronounced, especially in the case of problematic phonemes such as the click sounds in Bantu languages. Furthermore, the average dictionary user in South Africa is not familiar with phonetic symbols and the IPA. By adding a self-record function that can be selected from the menu bar, MED offers the ultimate guidance on pronunciation that a dictionary can give, especially to learners of the language. The user's pronunciation can be recorded, played back and compared to the master recordings for British and American English.
When the user starts to type the first character(s) of the required lemma in MED, the software attempts continuous intelligent extrapolation of the characters keyed in. Say, for example, the user wants to look up the meaning of intoxication. Typing i brings up the clickable lemma range i - Iberian, in triggers the range in - inaction, int returns int. - integrity and finally into produces the range into - intoxication, in which the desired lemma can be clicked. Thus only four of the twelve characters, roughly a third, had to be typed.
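The narrowing of the lemma range as characters are keyed in can be sketched in a few lines of Python. This is only an illustration of the general technique (a prefix search over a sorted lemma list), not a description of how MED is implemented; the function name `prefix_range` and the short lemma list are assumptions made for the example.

```python
from bisect import bisect_left

def prefix_range(lemmas, typed):
    """Return all lemmas starting with `typed`; `lemmas` must be sorted."""
    lo = bisect_left(lemmas, typed)
    # "\uffff" sorts after any ordinary character, so this marks the end
    # of the block of lemmas sharing the typed prefix.
    hi = bisect_left(lemmas, typed + "\uffff")
    return lemmas[lo:hi]

# Hypothetical, lower-cased lemma list (not the MED word list).
LEMMAS = sorted([
    "iberian", "in", "inaction", "int.", "integrity",
    "into", "intolerable", "intoxicated", "intoxication",
])

for typed in ("i", "in", "int", "into"):
    print(typed, "->", prefix_range(LEMMAS, typed))
```

Each additional character shrinks the candidate range, so the user can stop typing as soon as the desired lemma is visible.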
In Figure 4 the search is conducted on a 'sounds like' basis.
As for online dictionaries for English, a simple query for bank in the Cambridge Advanced Learner's Dictionary Online returned extensive information neatly organised into 33 clickable items representing senses, homonyms, etc. related to bank.
What is additionally required for English in the South African context, however, are EDs reflecting South African English and, most likely in future, what is called Black South African English. Silva (2004) states that South African English developed into a variety of English by assimilation of words and patterns from other South African languages. Dictionaries, including EDs, for English aimed at the South African market should reflect such borrowings and patterns. A Dictionary of South African English on Historical Principles (Silva 1996) represents a landmark in this regard and is a valuable source for the compilation of a true ED of South African English. Wade (1998) lists a number of typical characteristics of Black South African English such as non-standard verb complementation, embedded questions and pronoun copying. He defines pronoun copying as instances where a noun phrase is followed immediately by a pronoun with the same referent, e.g. the parents, they are supposed to pay ten rands. For non-standard verb complementation he cites examples where make is usually followed by a 'to' infinitive rather than a bare infinitive, as is illustrated in (2).
(2) Non-standard verb complementation (Wade 1998)
a. What makes them to stop that product if there are people who do come to that shop and buy them.
b. So what will we… made you to come and buy.
c. That make the meaning to be different than other countries.
d. ELS makes the second language students to be able to adapt themselves to the university.
True electronic dictionaries versus paper dictionaries on computer that display some electronic features
Sharpe (1995: 48) and Atkins (1996: 515-516) caution against a situation where electronic dictionaries simply use the content of printed dictionaries as their database, thus not utilising the potential of the electronic dictionary to the full.
… dictionaries of the present … may even come to you on a CDROM rather than in book form, but underneath these superficial modernizations lurks the same old dictionary. … Will the dictionary of the future simply blip its little electronic way off into the sunset dazzling its readers with the speed which it dishes up the same old facts on a technicolor screen? It is up to us to take up the real challenge of the computer age, by asking not how the computer can help us to produce old-style dictionaries better, but how it can help us to create something new… (Atkins 1996: 515-516)
Thus, in principle, a clear distinction should be made between EDs which are merely 'paper dictionaries on computer' and 'true electronic dictionaries', which utilise advanced computer technology to offer functions such as those listed in (1) that are not possible in the paper dimension. Electronic dictionaries for Afrikaans and the Bantu languages unfortunately fall to a large extent in the former category, and much development towards the latter is still required.
For Afrikaans, four electronic dictionaries will briefly be evaluated in terms of true electronic features: Elektroniese WAT (the electronic version of the Woordeboek van die Afrikaanse Taal) and Pharos Woordeboeke Dictionaries 5-in-1 on CD-ROM, and the two online dictionaries Travlangs and DDP Freeware.
The Pharos Woordeboeke Dictionaries 5-in-1 offers Pharos' Major Dictionary, Bilingual Phrase Dictionary, New Words, Verklarende Afrikaanse Woordeboek and the Groot Tesourus van Afrikaans on a single CD-ROM. The virtues are maximally highlighted by the publisher as follows:
'Whether you need guidance on spelling, meaning, synonyms, abbreviations, English and Afrikaans usage or translations, these authoritative reference sources can provide the answers. … Searches which would be time-consuming or even impossible with the printed versions can be accomplished quickly and easily in the powerful Logos Library System. … Do global searches across all five books and view the results side by side on your screen. You can find any given word in a matter of seconds. You can cross-reference easily, add your own user notes and copy-and-paste sections into your word-processor documents. Use * and ? wildcards to extend the scope of your search, to find that word on the tip of your tongue or missing from a crossword puzzle, or when you are not sure how to spell a word.' (http://www.nb.co.za/Pharos/phCatalogueDisplay.asp)
Even the font size is adjustable. All this is fine and surely offers added value, but it still does not amount to any significant electronic features. Even the front page, title page, table of contents, etc. are exact images of the paper version. The user might still prefer to use the paper versions instead of 'starting up' the computer simply to look up a few words 'on screen'.
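The * and ? wildcards mentioned in the publisher's description can be illustrated with Python's standard `fnmatch` module, in which * matches any run of characters and ? matches exactly one. This is a sketch under the assumption that the wildcards behave in this conventional way; the actual matching rules of the Logos Library System are not documented here, and the headword list and function name are made up for the example.

```python
from fnmatch import fnmatchcase

def wildcard_search(headwords, pattern):
    """Return the headwords matching a pattern with * and ? wildcards."""
    return [w for w in headwords if fnmatchcase(w, pattern)]

# Small illustrative headword list using Afrikaans words quoted in this article.
HEADWORDS = ["oog", "oogbank", "oëbank", "besonderhede", "algemeen", "afdaal"]

print(wildcard_search(HEADWORDS, "oo*"))     # everything starting with "oo"
print(wildcard_search(HEADWORDS, "*bank"))   # everything ending in "bank"
print(wildcard_search(HEADWORDS, "a??aal"))  # crossword-style: two unknown letters
```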
The Elektroniese WAT also offers certain advanced search functions and a number of cross-references, such as oëbank in (3), which is conveniently hyperlinked to the reference address oogbank that is clickable in the article of oëbank:
(3) Elektroniese WAT
a. oë s.nw. Selde ook, geselstaal, oge. Mv. van oog.
b. oëbank s.nw (ongewoon) Sien OOGBANK: Die oëbank het 'n lys van …
It is good that WAT, unlike some other Afrikaans dictionaries, did lemmatise oë 'eyes', which is an irregular plural of oog 'eye', and gives a cross-reference to oog, where sound and elaborate treatment is offered. However, the reference address oog in the article of oë, even though it is an implicit reference, should be clickable. Since it is not, the user has to scroll manually to oog in some way, which is not much better than paging around in the paper version. In a true electronic dictionary implicit references, and in fact all words, as in the case of MED mentioned above, should be hyperlinked to the relevant lemma.
An excellent feature in the Elektroniese WAT is the 'hitlist' function which generates concordance lines indicating the applicable lemma in each case. In Figure 5, besonderhede 'particulars' is given in context with 5 words of co-text on either side and it indicates that besonderhede occurs in the articles of lemmas such as algemeen 'general', afdaal 'descend', etc.
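A 'hitlist' of this kind is essentially a keyword-in-context (KWIC) concordance over the dictionary articles. The sketch below shows one way such lines could be produced; it is not the WAT implementation, the function name is an assumption, and the article texts are meaningless placeholder tokens rather than real WAT articles.

```python
def concordance(articles, keyword, width=5):
    """Return (lemma, left co-text, hit, right co-text) for each occurrence.

    `articles` maps a lemma to the running text of its article; `width` is
    the number of words of co-text kept on either side of the hit.
    """
    hits = []
    for lemma, text in articles.items():
        tokens = text.split()
        for i, token in enumerate(tokens):
            # Ignore surrounding punctuation and case when comparing.
            if token.strip(".,;:!?'\"()").lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((lemma, left, token, right))
    return hits

# Placeholder article texts; only the lemmas and the keyword come from the article.
ARTICLES = {
    "algemeen": "a b c d e f besonderhede g h i j k l",
    "afdaal": "x y besonderhede, z",
}

for lemma, left, hit, right in concordance(ARTICLES, "besonderhede"):
    print(f"{lemma:>10} | {left:>12} [{hit}] {right}")
```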
Elektroniese WAT overdid protection against copying by not allowing the user to copy and paste even a single word. This nullifies one of the advantages of the electronic dictionary, i.e. that users can copy and paste small sections of an article, or even an entire article, for academic writing purposes. Here MED is a textbook example of how it should be done, namely by allowing the user not only to copy an entire article but also to add the source reference automatically, as in (4).
(4) electronic ... adjective *** using electricity and extremely small electrical parts such as MICROCHIPS and TRANSISTORS: … © Macmillan Publishers Ltd. 2002
Elektroniese WAT also contains numerous untreated lemmas, such as the examples given in Figure 6, which is reminiscent of a paper dictionary on computer. An electronic dictionary should offer treatment, or at least clickable rerouting to the relevant lemma that is treated.

Figure 6: Untreated lemmas in Elektroniese WAT
The fact that WAT, in both paper and electronic format, is currently completed only up to the alphabetical stretch O in itself makes it less attractive than a full A-Z version would have been. Notwithstanding the shortcomings expressed above in terms of real electronic features, Elektroniese WAT remains a valuable source of information for Afrikaans.
Online dictionaries for Afrikaans generally leave much to be desired, since only a limited number of lemmas are offered and treatment is very limited. Consider (5) and (6). Compared to the extensive treatment in CALD (Table 1) and Merriam-Webster OnLine, (5) and (6) contain very limited information, not to mention that in the latter example the name of the target language is consistently misspelt as African instead of Afrikaans.

Electronic dictionaries for Bantu languages: essentials or 'nice-to-haves'?
The fact that compilers of dictionaries for Bantu languages increasingly experiment with electronic and especially online dictionaries is encouraging. Unfortunately, with a few exceptions, these dictionaries still offer little more than their paper counterparts or source dictionaries. Compare the following extract from the online Sesotho sa Leboa (Northern Sotho) – English Dictionary.

Figure 7: Online Sesotho sa Leboa (Northern Sotho) – English Dictionary
For the lemmas apea, buduša, moapei and tlokoma the dictionary offers only a number of translation equivalent paradigms. Thus it offers no true electronic features such as those listed in (1), nor any added value to the paper dictionary it is based upon. However, since the paper version is monodirectional Northern Sotho → English, English words cannot be looked up in it. In the electronic version English lemmas can be looked up, since the software then merely collates, say, all entries containing the translation equivalent cook in (8). This is a rather peculiar way of adding value, but it is significant for the following reasons. Firstly, the only other Northern Sotho dictionary that contains more lemmas, the Groot Noord-Sotho Woordeboek (Ziervogel and Mokgokong 1975), is monodirectional Northern Sotho → English/Afrikaans. Secondly, this dictionary as well as the New English – Northern Sotho Dictionary (Kriel 1985) has been out of print for more than 10 years. Thus the online Sesotho sa Leboa (Northern Sotho) – English Dictionary can be regarded as the biggest available dictionary in the direction English → Northern Sotho, although it is a simulated direction.
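The 'simulated direction' described above amounts to an inverted index: every translation equivalent is mapped back to the lemmas in whose articles it occurs. The following sketch illustrates the idea only; it does not reproduce the online dictionary's software, the function name is an assumption, and the glosses are simplified illustrations rather than the dictionary's actual articles.

```python
from collections import defaultdict

def build_reverse_index(entries):
    """Map each English equivalent to the sorted list of lemmas that use it."""
    index = defaultdict(list)
    for lemma, equivalents in entries.items():
        for equivalent in equivalents:
            index[equivalent.lower()].append(lemma)
    return {equivalent: sorted(lemmas) for equivalent, lemmas in index.items()}

# Simplified, illustrative Northern Sotho -> English entries.
ENTRIES = {
    "apea": ["cook"],
    "moapei": ["cook"],
    "sepela": ["walk"],
}

REVERSE = build_reverse_index(ENTRIES)
print(REVERSE.get("cook"))  # all Northern Sotho lemmas glossed with "cook"
```

Because the index is derived mechanically from the existing articles, an English query returns whole Northern Sotho articles rather than a purpose-written English article, which is why the direction is only simulated.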
For a number of words like sepela, in the second column of Figure 7, audible pronunciation is clickable. Ideally this option should be extended to all lemmas.
The Travlang Worldwide Travel Guides contain useful translation equivalents and phrases and are clickable for pronunciation.
-thenga v. buy; purchase
njenga- prefix foll. by noun like; just as
eThekwini loc. of iTheku in/at/to/from Durban…
There is no doubt that the Bantu languages will benefit from all the innovative true electronic dictionary features such as those mentioned in (1) and illustrated by means of English electronic dictionaries such as MED. The real challenge for Bantu-language EDs, however, lies in a number of problematic lexicographic aspects characteristic of these languages, mainly revolving around lemmatisation problems and very complicated grammatical systems. The core of the lemmatisation problem lies in a complicated derivational system in Bantu, and such difficulties are multiplied if the language has a conjunctive orthography. Verbs in Bantu languages combine with numerous affixes. Van Wyk (1985: 87) calculates that a single verb in Zulu, for example, can have up to 18 x 19 x 6 x 2 = 4,104 combinations. Compare the extract in Table 2 from a set of derivations for the verb sebenza (verbal root = -sebenz-) 'work', generated from the Pretoria Zulu Corpus (PZC), and the typical example in Table 3 of concordance lines for Zulu verbs occurring with the prefixal cluster wayesezo- 'he/she would have'.
Table 2 lists the first 30 occurrences of the alphabetically sorted derivations of the verbal root -sebenz- in PZC. Note that this list does not even go beyond the first section, Aba, in the alphabetical stretch A. Verb stems in Zulu, for example, almost always occur with one or more affixes. Traditionally, Zulu dictionaries follow a stem lemmatisation strategy. This means that the lemma sign for all the words in Table 2, for example, will be -sebenza, and that the words in Table 3 will be lemmatised under the stems indicated in boldface, i.e. fika, thola, qala, sebenza and lahla. The target users of a Zulu dictionary, especially learners of the language, are confronted with such long orthographic words and cannot look them up in Zulu dictionaries unless they know what the stem is. Isolating the stem often requires advanced knowledge of the morphological system of the language, and the problem becomes critical in cases where neither the lexicographer nor the user is able to identify the stem! See Van Wyk (1985) for a detailed discussion.
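The burden that stem lemmatisation places on the user can be made concrete with a toy stem finder. The sketch below knows only the prefixal cluster wayesezo- and the five stems mentioned above, and the word forms are simply concatenations of the two for illustration. It is a deliberately naive assumption-laden example: real Zulu morphology also involves suffixes and sound changes, which is exactly why full morphological analysis is needed in practice. All names in the code are hypothetical.

```python
# Toy data taken from the discussion above; a real system would need a
# full morphological analyser rather than two short lists.
KNOWN_STEMS = {"fika", "thola", "qala", "sebenza", "lahla"}
PREFIX_CLUSTERS = ["wayesezo"]

def find_stem(word):
    """Return (prefix cluster, stem) if the word splits cleanly, else None."""
    # Try longer clusters first so that the longest match wins.
    for prefix in sorted(PREFIX_CLUSTERS, key=len, reverse=True):
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in KNOWN_STEMS:
            return prefix, rest
    return None

print(find_stem("wayesezofika"))     # splits into cluster + stem
print(find_stem("wayesezosebenza"))
print(find_stem("abasebenzi"))       # not covered by this toy rule set
```

In an electronic dictionary this step would run behind the scenes, so that the user types the full orthographic word and is taken directly to the article of the stem.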
Lexicographers have struggled for many decades to solve this problem by means of a variety of lemmatisation strategies. Ziervogel and Mokgokong (1975) took an approach which can be labelled an enter-them-all strategy, according to which they physically tried to enter all derivations of verbs. Consider the derivations they actually lemmatised for the Northern Sotho verb aga 'build', which reflect 16 of the more than 30 possible suffixal clusters/derivation modules. Although this strategy is successful in terms of entering 'all' the derivations, finding the meaning of the word remains a problem for the user, as is illustrated by means of dikagollišano in Table 5. Here the user firstly has to strip the suffixes in order to find the verb stem and its meaning, and then has to 'add' the semantic connotations in a cumulative way in order to find the meaning, thus up to 12 steps in total. In step 12 the user concludes that dikagollišano means 'the processes of causing each other to break down', but this is an artificially constructed meaning and (s)he is still not sure that it is the right conclusion.
A second strategy, employed by Kriel and Van Wyk (1989), can be labelled the regulate-them-in approach. Following this approach, only verb stems are lemmatised and a complicated set of rules is designed and given in the users' guide to the dictionary. In theory it means that all derivations are catered for, but in practice it boils down to exactly the same process as illustrated for dikagollišano in Table 5. Other efforts include so-called left-expanded article structures, where an article displaying a left-expanded structure can still maintain an undisturbed alignment of the lemma sign in the vertical macrostructural ordering, as in Table 6. The Zulu words in Table 6 are thus still lemmatised according to the stem principle, i.e. the root -hamb- in this example, but the full orthographic forms are given with vertical alignment on h-, within the alphabetical stretch H in the dictionary. Although this approach has certain advantages over strict stem lemmatisation, it does not exempt the user from the obligation to identify the stem.
Similar problematic circumstances exist for the lemmatisation of nouns. As in the case of verbs, nouns occur with affixes. Consider a form in which the Zulu noun umuntu 'a human being' is preceded by na- 'and' plus njenga- 'as, like' and the sound change a + u > o has occurred. The user has to know that na- and njenga- should be stripped, that the sound change should be reversed and that the class prefix (u)mu- of the noun should be removed in order to look it up under -ntu, and then has to add the semantic connotations back on, similar to the process in Table 5 for dikagollišano.
Furthermore, apart from the problem of stem identification, singularity and plurality in Bantu are indicated by prefixes. This complicates lemmatisation in alphabetically ordered dictionaries, since it is extremely redundant to lemmatise each noun twice in the dictionary, once in the singular and once in the plural.
A variety of lemmatisation strategies have been attempted for nouns, such as stem lemmatisation, lemmatising singular forms supplemented by rules given in the front matter on how to convert plural to singular, lemmatising both singular and plural forms, lemmatising on the third letter of the word in an attempt to avoid the noun prefix, etc. All these strategies have major disadvantages and are discussed in great detail in Prinsloo and De Schryver (1999) and Prinsloo (2000a and 2000b).
As a final example of a major lexicographic problem, this time on the level of complicated grammatical structures, the lemmatisation of copulatives in Northern Sotho can be cited. The English words is, am, are and be literally have hundreds of equivalents in Northern Sotho. Consider (9) as a tiny extract from the rules determining the formation of copulatives (Poulos and Louwrens 1994: 320-326) and Table 8 as an example-driven table of copulative forms.
In Table 8 no fewer than 34 copulative forms for 3 different copulative relations are given, covering only class 1. Multiplied by the roughly 20 different sets of concords for persons and classes in Table 1, this means roughly 34 x 3 x 20 = 2,040 possible candidates for lemmatisation of the dynamic copulative. In a good Northern Sotho dictionary the lexicographer tries to utilise all available strategies and structures maximally, such as sound treatment in dictionary articles, cross-references to the back matter and even cross-references to outside sources such as grammar books, in order to assist the user to understand this complicated issue in Northern Sotho.

se -SC -be + CB; ga -se -SC2 -a -ba + CB; ga -SC2 -a -ba + CB
Classes: ga -se -CP -be + CB; ga -se -SC2 -a -ba + CB1; ga -SC2 -a -ba -CB
Participial
pos. 1st and 2nd person: SC -bilê + CB
Classes: CP -bilê + CB
neg. 1st and 2nd person: SC -sa -ba + CB
Classes: CP -sa -ba + CB
One cannot but conclude that the lemmatisation of especially nouns, verbs and copulatives cannot be solved for Bantu languages in the paper dimension, especially if an accessible, user-friendly dictionary for inexperienced learners of the language is the objective. The question is how these lemmatisation problems in respect of e.g. verbs, nouns and complicated linguistic systems like the copulative can be solved. The solution lies in the electronic dictionary dimension: utilising a combination of especially the electronic features listed in (1), i.e. pop-up access, bringing together of related items, new routes to the data, less dependency on alphabetical order, intelligent extrapolation, etc., can be the answer.
In practical terms, detailed morphological analysis and parsing of nouns and verbs, annotated corpora, huge frequency lists, etc. will be the required building blocks. Hundreds of thousands of words will have to be hyperlinked to their lemma signs in order to allow intelligent extrapolation, as has been illustrated above for intoxication in MED. Stratified/layered pop-up boxes will have to be built for complicated grammatical systems, as well as a complicated network of cross-references. Consider Figures 9-11 for typical suggested solutions for the lemmatisation of nouns, verbs and copulatives respectively.
In the case of nouns, the noun class system could be presented in an innovative but simple way. In Figure 9 the user looks up the word serurubele and finds the translation equivalents 'butterfly, moth'. If (s)he now puts the cursor on structure in the information bar, a text box opens, not only reflecting the total scope of the noun class system but also putting the word itself in its appropriate position in the noun class system, namely class seven. In the bottom right box, typical occurrences of the lemma and its derivations in idioms and proverbs can be studied.
Keep in mind that all this is achieved by simply moving the mouse over different sections of the navigation bar. Thus, information boxes only appear if the user wants to see them.
[Figure 11, suggested layered treatment of the copulative. Recoverable panel text: B Descriptive copulative: the relation is one of description, i.e. complement describes subject. C Associative copulative: the relation is one of association, i.e. subject is associated with complement. Indicative, identifying: 1ps (Nna) ke morutiši / ga ke morutiši; +prog. Monna e sa le morutiši / ga e sa le morutiši; 2-18 ... Each panel offers a 'Click here for Complete Table' option.]
For the copulative, layered, clickable options should be provided, thus presenting the user with digestible sections while outlining the full scope of the complicated system.
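The principle that extra information only appears when the user asks for it can be expressed as a simple data structure: a basic article plus named panels that are opened on demand. The sketch below is an illustration under stated assumptions, not a proposed implementation; the class name `Entry`, its fields and the panel label are hypothetical, and only the lemma, its equivalents and its noun class come from the serurubele example above.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """A dictionary article with optional panels that stay hidden until requested."""
    lemma: str
    equivalents: list
    panels: dict = field(default_factory=dict)

    def basic_view(self) -> str:
        """What the user sees immediately after the look-up."""
        return f"{self.lemma}: {', '.join(self.equivalents)}"

    def open_panel(self, name: str) -> str:
        """Return the content of one panel, or an empty string if absent."""
        return self.panels.get(name, "")

entry = Entry(
    lemma="serurubele",
    equivalents=["butterfly", "moth"],
    panels={"structure": "noun class 7"},
)

print(entry.basic_view())
print(entry.open_panel("structure"))  # shown only when the user points at 'structure'
```

The same layering could hold the copulative tables, with each 'complete table' stored as a further panel behind the summary the user sees first.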

Conclusion
This article has attempted to give a perspective on electronic dictionaries from a South African point of view. As far as English is concerned, one could conclude that South African users have the advantage of the availability of sophisticated, internationally developed EDs, both on CD-ROM and online, and that future developments should focus on extending the same level of sophistication to EDs catering for South African English and also for Black South African English. For Afrikaans, progress has been made towards the compilation of true electronic dictionaries, and it is expected that a new generation of Afrikaans EDs will include more advanced true electronic dictionary features. For the Bantu languages, interest in the compilation of electronic dictionaries is picking up, and the fact that successful information retrieval is so heavily dependent on the electronic dimension provides extra motivation for the compilation of EDs for these languages. The rate of development of EDs will also be influenced by external factors, both internationally and locally. It remains to be seen how fast the presumed gradual swing from paper dictionary to electronic dictionary, often advocated in publications on EDs, will take place. In an African context the development and use of EDs will also be influenced by the rate of development of a dictionary culture, computational skills and access to computers and the internet. In the long run it is reasonable to expect that in South Africa, too, the electronic dictionary will overshadow the paper dictionary in the same way as the computer has superseded the typewriter.