The Ideal Number of Lemmas in an Ideal Accounting Dictionary

Lemma lacunas in dictionaries are a traditional focus area for lexicographers, but the opposite problem, which we choose to call lemma flooding, has received very little attention. Studying this flooding could save lexicographers from spending thousands of hours producing dictionary entries which nobody reads. In Bergenholtz/Norddahl (2012) we showed that in a dictionary with 111,000 entries only about 33% of all dictionary articles were consulted during a three-year period with 18 million dictionary consultations. We examined nine possible reasons why a given word might not be of interest to users and consequently could be left out in order to avoid lemma flooding. We tried to demonstrate that while it is not possible to avoid lemma flooding completely, implementing a relatively simple rule could minimize it. In reality, however, the results were quite disappointing, because there were no clear rules or methods to avoid lemma flooding. We now apply the same kind of log file analysis to the English-Danish and the Danish-English Accounting Dictionaries. We find differences between the dictionaries (monolingual for English and Danish and bilingual for English-Danish and Danish-English). We will try to give some explanations, but must admit beforehand that we have not found satisfying explanations which could lead to a plan for future accounting dictionaries or other economic dictionaries and thus avoid the production of never-used dictionary articles.


Introduction
If dictionary users look up a word in a dictionary and very rarely find what they are looking for, they might stop using the dictionary. Critics of a dictionary very often focus on such problems. Therefore, it is no surprise that dictionary authors and researchers focus on how to avoid having any lemma lacunas.
However, very little attention has been given to unnecessary lemmas in dictionaries. From Bergenholtz/Norddahl (2012) we know that more than 66% of all articles in Den Danske Netordbog were never consulted over a 3-year period. More precisely, 37,238 dictionary articles were looked up, while 74,254 dictionary articles were not looked up a single time in 18 million look-ups. Producing unnecessary lemmas is expensive and time consuming, and the time could have been spent on more relevant tasks.
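The percentages quoted above follow directly from the two article counts. A minimal sketch of the arithmetic is given here; the function name `hit_percentage` is our own illustrative choice and is not taken from Bergenholtz/Norddahl (2012).

```python
def hit_percentage(consulted: int, never_consulted: int) -> float:
    """Share of dictionary articles consulted at least once, in percent."""
    total = consulted + never_consulted
    if total == 0:
        raise ValueError("the dictionary has no articles")
    return 100.0 * consulted / total


# Article counts for Den Danske Netordbog as reported in the text.
consulted = 37_238
never_consulted = 74_254

total = consulted + never_consulted
hit = hit_percentage(consulted, never_consulted)
print(f"{total} articles: {hit:.2f}% consulted, {100 - hit:.2f}% never consulted")
```

The same calculation applies to each accounting dictionary once its counts of consulted and never-consulted lemmas are known.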
The lack of relevant lemmas in a dictionary is referred to as lemma lacunas. But we do not have an appropriate word for the many unnecessary lemmas in dictionaries. We propose to use the term lemma flooding (Danish Lemmaoverflod, German Lemmaüberfluß).
For printed dictionaries lemma flooding is a problem for the user because it makes the dictionary big, heavy and more time consuming to consult. The ideal dictionary only contains the information which the user needs and nothing else. In an electronic dictionary, however, it is not a disadvantage for the user that the dictionary contains lemmas which the user does not need. It is only a problem for the lexicographer. Bergenholtz/Johnsen (2005) are sceptical about whether we can ever find any systematic description of never-used dictionary articles:

"Furthermore, it is not possible to discern a distinct pattern on the basis of this examination, e.g. that certain types of words, such as semantic or orthographic variants, are never requested. A systematic description of the requested words compared with the non-requested words is thus not possible, and it remains unclear whether such an investigation would be of practical use to lexicographers." (Bergenholtz/Johnsen 2005: 139f)

The results of Bergenholtz/Norddahl (2012) are similarly disappointing and yield no really good rules to avoid lemma flooding. Both studies focus on general language dictionaries. In this article we will focus on specialized accounting dictionaries to search for a pattern and to see how big the lemma flooding problem is compared to general language dictionaries.

Explanations for lemma flooding
In Bergenholtz/Norddahl (2012) we put forward nine explanations for lemma flooding in general language dictionaries:

No system
The possibility that there is no explanation for why some lemmas are looked up, while others are not (Bergenholtz/Johnsen 2005). This is also a possibility in accounting dictionaries.

Not relevant words
This category is especially relevant for specialized dictionaries, including accounting dictionaries. If an accounting dictionary includes words which would be more suitable in a medical dictionary, this would be considered lemma flooding. We did not find such words in any of the accounting dictionaries that we analysed, so this category cannot explain any lemma flooding occurring in these dictionaries.

Words known by everybody
If the user already knows the answer he will never search for it. This could be true if everyone using accounting dictionaries was an expert. But students and others who do not necessarily know even basic accounting terminology also use accounting dictionaries.

Easily understandable composite
In the Danish language easily understandable composites occur very frequently. Since we found no such pattern in Den Danske Netordbog, it is unlikely that one should exist in English.

Foreign words
We see the same search behaviour with foreign words. Which foreign words are searched for and which are not seems to be random.

Neologism
New words in the language are among the lemmas most searched for. They never seem to be among the zero look-ups. This category is more relevant for avoiding lemma lacunas than for avoiding lemma flooding.

Words not used anymore
Some words almost disappear from a language over time, and this might be a reason why they are not looked up. The accounting dictionaries used for this research do not contain words that are no longer in use.

Words from specialized fields
In a general language dictionary the user might not expect to be able to find lemmas from a specialized field and will therefore go directly to a more relevant dictionary. This category is not relevant for accounting dictionaries, since the lemmas are from a specialized field and should always primarily contain words from the field of accounting.

Infrequent words
In order to produce dictionaries lexicographers often use frequency, which is a poor criterion.
From our data it is clear that this criterion cannot be used for specialized dictionaries either, since relatively common words such as "accepted", "appendix" and "deficiency" have zero look-ups whereas a much more infrequent word such as "amortisation" has 88 look-ups.
We clearly assumed that a much larger part of the lemma stock would be relevant to the user of a specialized dictionary than to the user of a general language dictionary, and also that there might be a difference between monolingual and bilingual dictionaries.

Results
We have examined the log files of the English Accounting Dictionary, the Danish Accounting Dictionary, the English-Danish Accounting Dictionary and the Danish-English Accounting Dictionary at Ordbogen.com. The results are compared with the results of Bergenholtz/Norddahl (2012).
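The kind of log file analysis described here can be sketched as follows. This is a minimal illustration only: the format of the Ordbogen.com log files is not described in this article, so the sketch assumes that the log has already been reduced to a plain list of search strings, and the function name `lookup_stats` and the toy data are hypothetical.

```python
from collections import Counter


def lookup_stats(lemmas, queries):
    """Summarise how a lemma list was used, given the logged search strings."""
    lemma_set = set(lemmas)
    if not lemma_set:
        raise ValueError("the lemma list is empty")
    # Count only queries that match a lemma exactly; anything else
    # (for example a misspelling) is a search without a result.
    counts = Counter(q for q in queries if q in lemma_set)
    return {
        "hit_percentage": 100.0 * len(counts) / len(lemma_set),
        "zero_lookups": sorted(lemma_set - set(counts)),
        "searches_with_result": sum(counts.values()),
        "total_searches": len(queries),
    }


# Toy data for illustration only; these are not figures from the log files.
lemmas = ["accepted", "amortisation", "appendix", "deficiency"]
queries = ["amortisation", "amortisation", "amortisaton", "goodwil"]

stats = lookup_stats(lemmas, queries)
print(stats)
```

In this toy example one of four lemmas is consulted, so the hit percentage is 25%, and two of the four searches return a result. A real analysis would also have to decide how to treat inflected forms and spelling variants, which this sketch ignores.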
In the article on Den Danske Netordbog we did not include the total number of look-ups, and therefore it is not included in the lists below. As the total number of lemmas increases constantly in all the dictionaries, the number of entries given is the number on the date on which we examined the log files for the specific dictionary; the number is even higher today.

As Table 1 shows, the results for the English Accounting Dictionary are only slightly better than those for the general Danish monolingual dictionary. 59.57% of the lemmas have never been consulted, which is quite disappointing for the lexicographers. However, the audience is primarily Danish and might make more spelling errors, which can explain some of the disappointing results.
In a search for a system as to which lemmas the users search for and which they do not, we have generated a list of the 10 most searched for articles and a list of 10 randomly selected articles with zero look-ups.

The most searched for articles are all quite normal words, and it is therefore no surprise that they are looked up. This also underlines that the third possible explanation (words known by everybody) cannot be true for accounting dictionaries.
To see which lemmas the lexicographer could have avoided producing, we need to take a look at which lemmas have zero look-ups. Again, we see very common words in the list. There seems to be no logical distinction between the list with most look-ups and the list with zero look-ups. Of our previous explanations, only the "no system" explanation seems to hold.
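The two kinds of list discussed above can be produced from per-lemma look-up counts roughly as follows. This is a sketch under our own assumptions: the counts are taken to be available as a mapping from lemma to number of look-ups, the function names are hypothetical, and apart from the 88 look-ups for "amortisation" and the three zero look-up words mentioned earlier, the toy entries are invented for illustration.

```python
import random
from collections import Counter


def most_searched(counts, n=10):
    """Return the n lemmas with the most look-ups as (lemma, count) pairs."""
    return Counter(counts).most_common(n)


def sample_zero_lookups(lemmas, counts, n=10, seed=0):
    """Return up to n randomly selected lemmas that were never looked up."""
    zero = sorted(lemma for lemma in lemmas if counts.get(lemma, 0) == 0)
    rng = random.Random(seed)  # fixed seed so that the selection is reproducible
    return rng.sample(zero, min(n, len(zero)))


# Toy data: "goodwill" and "equity" and their counts are invented.
counts = {"amortisation": 88, "goodwill": 40, "equity": 12}
lemmas = ["accepted", "amortisation", "appendix", "deficiency", "equity", "goodwill"]

top = most_searched(counts, n=2)
unused = sample_zero_lookups(lemmas, counts, n=2)
print(top)
print(unused)
```

Fixing the random seed is a design choice that makes the random selection repeatable, so that the same list of zero look-ups can be inspected again later.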
Let us see how this compares to the Danish Accounting Dictionary.

This time, we see that the hit percentage has increased to 64.17%, which is a lot higher than both the English Accounting Dictionary (40.43%) and Den Danske Netordbog (33.40%). One explanation for the difference could be that the audience is Danish. Also, it seems that people more often find what they are looking for in the first place.

Table 5. The 10 most searched for articles in the Danish Accounting Dictionary

In the above list we see the 10 most searched for articles in the Danish Accounting Dictionary, which look somewhat similar to those in the English Accounting Dictionary. Just as in the English Accounting Dictionary, there is no clear pattern as to which lemmas the users search for and which they do not.

So much for the monolingual accounting dictionaries. Now we will take a first look at bilingual dictionaries. We start with the English-Danish Accounting Dictionary.

There is a slight increase in the hit percentage, but it is almost the same as for the monolingual Danish accounting dictionary. Just as for the English Accounting Dictionary, the hit percentage might be influenced by the fact that we have a Danish audience. This can also be seen in the number of searches with a result, which is under 50% of the total number of look-ups. When looking at the list of most searched for lemmas we can see that there are many similarities between the monolingual English accounting dictionary and the bilingual English-Danish accounting dictionary. It does not, however, contribute to finding any good explanation for how to avoid lemma flooding. Randomly selected zero look-ups do not give any suggestions for explaining the lemma flooding in this dictionary either.
The last dictionary we will look at is the Danish-English Accounting Dictionary.

Apart from a different order in the ranking, we can see that 8 of the 10 most searched for lemmas in the monolingual Danish accounting dictionary are identical with those in the Danish-English Accounting Dictionary. This suggests that we have the same lack of logical explanation as before. And as is the case with the other dictionaries, we still cannot find a good explanation by looking at the list of zero look-ups.

Conclusion
The ideal dictionary gives the user the needed information and nothing else. So an accounting dictionary with the ideal number of lemmas would have no lemma lacunas and no lemma flooding. Traditionally, most lexicographers have had only lemma lacunas in mind, and not the vast amount of time and effort wasted on lemma flooding, which in some cases may account for more than 50% of all lemmas. Is it possible to avoid lemma flooding? No! Just as it is not possible to avoid lemma lacunas. Nothing from our research indicates that lemma flooding can be avoided. Nor does it seem that we can currently set up any good rules in order to avoid lemma flooding.
One way to handle the issue was used in Ordbogen.com's Danish-English general language dictionary where in the beginning some degree of lemma lacunas was accepted during a beta pe-

Table 1. English Accounting Dictionary versus Den Danske Netordbog

Table 2. The 10 most searched for articles in the English Accounting Dictionary

Table 3. Randomly selected lemmas with zero look-ups in the English Accounting Dictionary

Table 4. The Danish Accounting Dictionary versus Den Danske Netordbog

Table 6. Randomly selected lemmas with zero look-ups in the Danish Accounting Dictionary

Table 7. English-Danish Accounting Dictionary versus Den Danske Netordbog

Table 8. The 10 most searched for articles in the English-Danish Accounting Dictionary

Table 9. Randomly selected lemmas with zero look-ups in the English-Danish Accounting Dictionary

Table 10. Danish-English Accounting Dictionary versus Den Danske Netordbog

77.32% is the highest hit percentage which we have seen to date. It is more than twice as good as that of Den Danske Netordbog. This is a much more satisfying number for the lexicographer, but we do still have a relatively high number of lemmas which have never been used. The Danish-English language pair combination is also by far the most used combination of all the accounting dictionaries.

Table 11. The 10 most searched for articles in the Danish-English Accounting Dictionary

Table 12. Randomly selected lemmas with zero look-ups in the Danish-English Accounting Dictionary