How Successful is the Mediation of Specialized Knowledge? – The Use of Thinking-aloud Protocols and Log Files of Reverbalization Processes as a Method in Comprehensibility Research

Since the publication of Ericsson and Simon's book Protocol Analysis: Verbal Reports as Data in 1984, thinking aloud has found its way into the exploration of (interlingual) translation processes. To gain deeper insight into translation processes, the method of thinking aloud has been combined with the use of the software TRANSLOG developed by Jakobsen and Schou. This software records (logs) all keystrokes and mouse clicks during writing processes as well as the time intervals between them, without the user noticing the recording. In my paper, I will describe how the method of thinking aloud combined with the use of TRANSLOG can be used to determine the comprehensibility of non-instructive texts. The paper focuses on an experiment in which five subjects were asked to optimize a popular science text using TRANSLOG. During this intralingual translation process, the subjects had to think aloud. The paper concentrates on the method I used and presents what it reveals about the comprehensibility of the popular science text.

Susanne Göpferich, Karl-Franzens-Universität Graz, Austria, susanne.goepferich@uni-graz.at


The challenge: Measuring how successfully knowledge is mediated in non-instructive texts
Whereas reliable results on how successfully knowledge is mediated in instructive texts can be obtained by usability testing (cf. Rubin 1994), the comprehensibility of non-instructive texts is hard to analyze. Methods such as the use of readability formulas take into account only some aspects of what makes a text comprehensible or incomprehensible. The employment of my Karlsruhe comprehensibility concept (Göpferich 2001, 2002, ²2006), which represents an extended and improved version of the comprehensibility concepts by the Hamburg group of psychologists Langer, Schulz von Thun & Tausch (⁵1993) and by Groeben (1982), has proved a reliable instrument in pre-optimizing non-instructive texts (cf. Göpferich ²2006: 154-188), but cannot replace target-group-centered empirical research into text comprehensibility. Methods employed so far in this type of comprehensibility research are cloze procedures, questions on the texts whose comprehensibility is to be determined, and reproductions of such texts. These methods have the disadvantage, however, that they measure either only aspects of the texts' comprehensibility (e.g., the predictability of words and phrases that fill gaps, the comprehensibility of words or passages relevant for answering the questions asked) or merely their rough overall comprehensibility. Furthermore, some of these methods lead to a confusion of the concepts of comprehensibility and retainability (cf. the research summary in Göpferich ²2006: chapter 4).

In this paper, I will present a target-group-centered empirical method for determining the comprehensibility of texts which takes into account every detail of the texts to be analyzed, which is independent of the texts' retainability, and which allows the researcher to detect not only where the texts are incomprehensible, but also where they are hard to understand and where they give rise to misunderstandings or a demand for further information which is not given in the texts. I call this method optimizing reverbalization using thinking aloud and log files.
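For comparison, the cloze procedure mentioned above among the earlier methods can be sketched in a few lines. This is a minimal illustration, not a procedure taken from the studies cited: the deletion interval `n`, the gap string, and the exact-match scoring rule are assumptions chosen for the example, and the function names are hypothetical.

```python
def make_cloze(text, n=5, gap="_____"):
    """Replace every n-th word with a gap; return the gapped text and the deleted words.

    Words are split on whitespace, so punctuation stays attached to the
    deleted word (an assumed simplification for this sketch).
    """
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = gap
    return " ".join(words), answers


def cloze_score(answers, responses):
    """Share of gaps filled with exactly the deleted word (case-insensitive)."""
    if not answers:
        return 0.0
    hits = sum(a.lower() == r.lower() for a, r in zip(answers, responses))
    return hits / len(answers)
```

The sketch makes the limitation discussed above visible: the score reflects only how predictable the deleted words are, not whether the passage as a whole was understood.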
Since the publication of Ericsson and Simon's book Protocol Analysis: Verbal Reports as Data in 1984 (Ericsson/Simon ²1999), thinking aloud has found its way into the exploration of writing processes (cf. recently Schindler 2004) and (interlingual) translation processes (cf. the special edition of Meta edited by Lee-Jahnke 2005). To gain deeper insight into these processes, the method of thinking aloud has been combined with the use of the software TRANSLOG developed by Jakobsen and Schou (1999). This software records (logs) all keystrokes and mouse clicks during writing processes as well as the time intervals between them, without the user noticing the recording.
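The kind of information such a log makes available can be illustrated with a small sketch. The record layout below (`KeyEvent` with a timestamp in milliseconds and a key label) and the five-second pause threshold are assumptions made for this example; they do not reproduce TRANSLOG's actual file format or settings.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    time_ms: int  # hypothetical field: milliseconds since the start of the session
    key: str      # hypothetical field: the key or mouse action logged


def intervals(events):
    """Time gaps (ms) between consecutive logged events."""
    return [b.time_ms - a.time_ms for a, b in zip(events, events[1:])]


def long_pauses(events, threshold_ms=5000):
    """Return (pause length, first key after the pause) for each pause >= threshold."""
    return [
        (b.time_ms - a.time_ms, b.key)
        for a, b in zip(events, events[1:])
        if b.time_ms - a.time_ms >= threshold_ms
    ]
```

Long pauses located in this way could then be aligned with the corresponding stretch of the thinking-aloud protocol to see what the subject was considering before she resumed typing.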
I have adopted the thinking-aloud method combined with the use of TRANSLOG to investigate the comprehensibility of a popular science text on diabetes. For this purpose, five subjects who belong to the target group of the text were given the text and asked to optimize it using TRANSLOG. This process can be considered a type of intralingual translation. During this intralingual translation process, the subjects had to think aloud.
In my paper, I will give a detailed description of the methods employed and present what they reveal about the comprehensibility of the diabetes text. The methods can also be used to investigate the subjects' maxims and strategies in making the text more comprehensible, i.e., their writing and optimizing strategies.¹ Although I will make a few remarks on the latter, too, the focus of this paper will be on the comprehensibility of the popular science text (for results on the subjects' maxims and strategies cf. Göpferich forthcoming).

The methods employed
Five subjects who belong to the target group of the popular science text on diabetes reproduced in Appendix A were asked to reverbalize this text in TRANSLOG in such a way that the result was optimally comprehensible for its target group. The target group is specified as follows on the website where the text to be optimized appears: "These contributions provide basic information on diabetes mellitus for which no prior knowledge is required." (Deutsches Diabetes-Zentrum 2004; my transl.) During the experiment, the subjects had to think aloud ("level 1 verbalizations" according to Ericsson and Simon 1999: 79).

The subjects
All subjects were female and were either students in the degree programme "Translation and Interpreting" at the Department of Translation Studies of the University of Graz, graduates of this programme, or lecturers there. Their mother tongue was German; four of them were Austrians and one (YG) came from Switzerland. A short description of the subjects' educational and professional background is given in Table 1. Table 2 provides information on their physical and psychological condition during the experiment as described by themselves.

¹ By maxims I mean goals of action the subjects strive for at a specific point in the optimization process. The term strategy is used in the sense of Faerch/Kasper (1983: 36), who define strategies as "potentially conscious plans for solving what to an individual presents itself as a problem in reaching a particular communicative goal". This definition is also adopted by Krings (1986: 175). Thus maxims are targets; strategies are potentially conscious paths which are believed to lead to these targets.
Although all subjects belong to the target group of the text to be analyzed, they are not representative of this target group. As can be seen from Table 1, their education and training are certainly above those of the average reader. This means that whatever is incomprehensible for them can also be regarded as incomprehensible for the intended readership in general. Since the subjects were asked to optimize the text for the intended readership (and not only for themselves), they may have optimized sections of the text that they found comprehensible themselves but considered incomprehensible for people with a lower level of education. Their judgement on such elements of the text must be considered speculative. To reduce subjectivity, I have taken into account how many of the five subjects judged elements of the text to be optimized incomprehensible or hard to understand. We must not forget, however, that people with a lower level of education might find additional things incomprehensible that were not criticized or optimized in the experiment.
Selecting subjects with higher education and with at least some experience in translation has the advantage that their meta-linguistic and meta-communicative competence allows them to give a more detailed and precise description of their comprehension problems and optimizing maxims and strategies than persons with no education and training in this field. Thus, their meta-linguistic and meta-communicative competence makes their thinking-aloud protocols more illuminating. Although what these subjects' thinking-aloud protocols reveal about text comprehension may not be representative of the entire target group, it can at least provide questions which can be used to find out how well subjects with a lower level of education have understood certain passages of the text.

The assignment
Each subject had to reverbalize the text in Appendix A in such a way that it would be tailored to the requirements of its intended readership. Passages that the subjects considered perfect could be copied into the target version. Prior to the actual experiment, a trial run with a different text was carried out to acquaint the subjects with the functionality of TRANSLOG (editing functions and TRANSLOG dictionary). Only after all the questions on the software and the test setting had been answered was the actual experiment begun. The assignment was explained to the subjects by the supervisor and also handed out to them in writing (cf. Appendix B). The dictionary entries provided with the source text in TRANSLOG are reproduced in Appendix C. No other material could be used during the experiment.
During the experiment, each subject sat in a quiet room² together with a supervisor. The subjects wore headsets; their verbalizations were recorded with the freeware AUDACITY and exported in MP3 format. The recordings were transcribed according to the GAT conventions for basic transcripts (Selting et al. 1998) and then proofread by at least one other person. The complete transcripts of all five subjects can be downloaded as a PDF file from Göpferich (2005). During the experiment the subjects were not put under time pressure (cf. Table 2). They were informed that what was analyzed in the experiment was not their competence but the comprehensibility of the diabetes text. After the optimization process, the subjects were asked whether they had any questions for a diabetes specialist which had cropped up during the experiment and which they could not answer using the information in the text and the TRANSLOG dictionary. These retrospective interviews were recorded and transcribed too. They can be found at the end of each of the transcripts in Göpferich (2005).

The data
Apart from the information given in Table 1 and Table 2, the experiment provided the following data:
1. the optimized versions of the diabetes text,
2. the log files,
3. the thinking-aloud protocols (TAPs) as well as the protocols of the retrospective interviews (RIPs; cf. Göpferich 2005).

Data analysis
The data were analyzed as follows:
1. Each subject's version was compared to the original text. Passages in which changes had been made were numbered and juxtaposed to their original version in a table. For each change the comprehensibility dimension in which this change occurred according to the Karlsruhe comprehensibility concept³ (Göpferich 2001; 2002; […] optimized itself and the TRANSLOG dictionary), and which they therefore would have liked to ask a specialist were collected. If we start from the assumption that an optimally comprehensible text does not give rise to questions in the reader's mind that it does not answer, these remaining questions are additional reliable indicators of deficiencies with regard to comprehensibility. This also applies to dictionary look-ups.
5. For each element of the original text that had been subject to criticism or questions in the experiment, I determined how many subjects had criticized it or had questions on it (cf. Table 3 in section 4). The more subjects commented on an element, the higher the probability that it really leads to comprehension problems.
6. An optimized version was written in which the criticism of all subjects was taken into account and which answers the questions they had. In this optimized version, only real improvements suggested by the subjects were considered. If a subject formulated a maxim for a specific section of the text without providing a solution fulfilling this maxim, I tried to provide such a solution myself. Deteriorations and 'cosmetic' changes were ignored. Linguistic mistakes made in the source text or by the subjects were corrected in the optimized version (cf. Appendix A).
7. The subjects' maxims and strategies were analyzed and classified. For each subject, the repertoire of maxims she had and of strategies she used was determined. From these results, conclusions for text production didactics can be drawn. A full treatment of the subjects' maxims and strategies, however, is beyond the scope of this paper. I will mention only some of them here; a detailed analysis will be provided in Göpferich (forthcoming).
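The counting in step 5 can be sketched as a simple tally. The function names are hypothetical, and the sample pairs below are only an illustrative excerpt built from three items discussed in this paper (the plural of Zuckerspiegel, the noun Untergang, and the semantic motivation of Altersdiabetes); they are not the full data set behind Table 3.

```python
from collections import defaultdict


def tally(comments):
    """Map each text element to the sorted list of subjects who criticized it.

    `comments` is an iterable of (subject, element) pairs; repeated comments
    by the same subject on the same element are counted once.
    """
    by_element = defaultdict(set)
    for subject, element in comments:
        by_element[element].add(subject)
    return {element: sorted(subjects) for element, subjects in by_element.items()}


def rank(comments):
    """Elements ordered by the number of subjects who criticized them (descending)."""
    counts = [(element, len(subjects)) for element, subjects in tally(comments).items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Illustrative excerpt only (subject code, criticized element)
sample = [
    ("SF", "Zuckerspiegel (plural)"), ("YG", "Zuckerspiegel (plural)"),
    ("NL", "Zuckerspiegel (plural)"), ("EK", "Zuckerspiegel (plural)"),
    ("SF", "Untergang"), ("YG", "Untergang"), ("NL", "Untergang"),
    ("YG", "Altersdiabetes"),
]
```

Calling `rank(sample)` puts the element criticized by the most subjects first, which mirrors the reasoning that broader agreement makes a genuine comprehension problem more likely.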

Results
In the following, I will give a survey of all the elements of the original text that one or more subjects in the experiment considered hard to understand or incomprehensible, as well as of the questions the text gave rise to in the subjects' minds without providing an answer.
For each element, extracts from the TAPs and/or RIPs will be quoted which show that the subject(s) found it difficult and, if applicable, what maxims and strategies they followed to improve the corresponding passage. A distinction is made between a) completely incomprehensible elements of the source text and passages giving rise to questions that are not answered in the text and b) passages which are simply hard to understand.
She solves the problem by adding "im Allgemeinen" (In general, a distinction between two types of diabetes is made.) and referring to gestational diabetes as a "Spezialfall", a special type of diabetes. The log file reveals that she first puts down "zwei Typen" (two types of diabetes), then changes this into three types, and uses "zwei Typen" again in her final version. In the retrospective interview (RIP YG 476-482), she says that she would like to ask a specialist about this because she is still not sure which one is the correct version.
To my mind, the real problem in the text here is that the author informs us about what happens in the bodies of patients with type 1 diabetes and type 2 diabetes, but not about what happens in the bodies of women with gestational diabetes. If this information were given, it would be clear that gestational diabetes is neither a variant of type 1 nor of type 2. In contrast to type 1 diabetes, which occurs when the body produces too little or no insulin, and type 2 diabetes, which occurs when the body cannot use the insulin it produces, gestational diabetes is caused when pregnancy hormones and hormones produced by the placenta lead to such an increase in the blood glucose level that the pregnant woman's pancreas can no longer compensate for this by an increased insulin production. Adding this information together with the coherence-increasing elements in general (im Allgemeinen) and a special type of diabetes (ein Spezialfall der Zuckerkrankheit) eliminates this incomprehensibility.

3. For YG the last sentence in the text, "Dabei ist jedoch das Risiko für die spätere Entwicklung eines Typ 2 oder Typ 1 Diabetes [sic] stark erhöht." (In this case, there is a considerably higher risk of developing a type 2 or type 1 diabetes afterwards.), gives rise to the question whether this refers to the mother or the child: "beim kind oder bei der mutter?" (TAP YG 34). That it refers to the mother can easily be made more explicit here.

4. For JS (cf. TAP JS 493-516) the original text does not make clear whether type 1 diabetes is caused only by a combination of all three factors mentioned in the text (i.e., first, genetic predisposition, second, influences from outside such as certain virus infections, and third, a disorder of the immune system) or whether it may be caused by one or two of these factors alone.
Although the combination of all three factors seems more plausible to her, she combines the factors by or in her optimized version, which shows that she is still not sure. Since the factors that cause the disease are important information for the patient, the text must be made more explicit here.

What makes the text hard to understand
1. Since Diabetes mellitus is a specialized term for which there is the more general and thus more comprehensible expression Zuckerkrankheit in German, YG, SF, and EK feel that this more comprehensible designation should already be added in the title of the article (cf. TAP SF 41-44, cf. also […]). This makes sure that the reader knows what the text is about from the beginning.
EK does not comment on this, but she too makes the relation between Diabetes mellitus and Zuckerkrankheit explicit ("Diabetes mellitus ist der Fachausdruck für Zuckerkrankheit").
3. YG wonders about the term Definition in the title because to her mind the text provides more information on diabetes mellitus than just a definition (cf. TAP YG 344-349), so that the term does not fit. Furthermore, one may object that the term Definition is a hard word for many readers, which is a second reason for taking it out.
(In spite of her objections, YG takes the term over in her final version.) EK deletes the term Definition in her title, but does not comment on this.
4. The subjects YG and NL wonder whether "ist gekennzeichnet durch" is the correct expression in the definition of diabetes mellitus or could be replaced by a simpler verb. Both of them feel that the latter is the case (cf. TAP NL 44-82; TAP YG 568-632).
6. Four of the five subjects (SF, YG, NL, EK) are amazed that the term Zuckerspiegel (sugar level) is used in the plural; they have only heard of der Zuckerspiegel in the singular and wonder whether there are several sugar levels (cf. TAP YG 220-223; cf. also TAP SF 320-329). In fact, there is only one blood sugar level, so that the plural is wrong and must be changed into a singular. Even if there were several blood sugar levels, using the singular would be the option to be preferred in this context because a distinction between different sugar levels is not necessary in the text and wondering about the plural requires memory capacity which will then not be available for processing the central information on diabetes. If a differentiation were relevant, the plural should be introduced explicitly so that the reader need not wonder about it. YG opts for the plural because, as she says in the retrospective interview (RIP YG 498-507), the author of the text is an expert and should know what he is talking about. This is also the reason why SF uses the plural (cf. TAP SF 326-329). NL and EK prefer to use the more common singular (cf. TAP NL 264-268; TAP EK 209-210).
7. In the original version, information which belongs closely together, such as a) the different designations of each type of diabetes, b) the age when type 2 diabetes occurs, c) the beginning and the end of gestational diabetes, and d) the first mention of the destruction of the insulin producing cells and the detailed explanation of how and why they are destroyed, is given at different places in the text. YG brings some of these items together (a and c), however, without commenting on it in her TAP ('information clustering maxim'). EK brings the information under a) together (TAP EK 69-71). NL explicitly states that she wants to bring the information on when type 2 diabetes occurs together (cf. […]). JS comments on d), saying: <<sich selbst beim Tippen diktierend> beruht .hh auf einem (.) mangel an insu .hhhhh (--) infolge (-) einer zerstörung> .hhhhh (2.0) der sogenannten bet (2.0) ah das klingt glaub i komisch, wenn das da hinten so (.) .h der insulin produzierenden zellen (2.0) .hh ja aber wodurch werden die zerstört? (2.0) <<den Ausgangstext lesend> diese zellen gehören zur bauchspeicheldrüse und sind ein bestimmter typ der sogenannten inselzellen. am höchsten ist die neuerkrankungsrate bei kindern,> (2.0) ah=so (.) do unten steht des erst (.) ha? This comment, too, shows that information belonging closely together should be provided together.
8. Without mentioning this explicitly, YG seems to follow a parallelism maxim. This can be seen in her optimized version, where she tries to structure the sections on the three types of diabetes in parallel (age groups and designations, disease itself, causes, effects).⁴ NL follows the parallelism maxim explicitly, stating twice that she wants to structure the information on type 2 diabetes in the same way as that on type 1 diabetes (cf. […]). Parallelism has long been advocated in stylistics. For the reader, it increases the predictability of what comes next in a text and is therefore important for text comprehensibility.

9. In the original text, Beta-Zellen (beta cells) is introduced as the specialized term for Insulin produzierende Zellen (insulin producing cells). YG decides not to eliminate this alternative designation from the text because it might be used in doctor-patient conversations (cf. TAP YG 152-156), but she introduces it only once and then, in contrast to the author of the original text, goes on using the more telling expression Insulin produzierende Zellen, which I consider a good decision.
10. Several of the subjects (SF, YG, NL) wonder about the noun Untergang (decline). YG laughs when she reads it (TAP YG e.g. 186-[…] in folge davon kommt es zu einem unter <<f> untergang der insulin produzierenden zellen> (3.5) also des kann i ma a net gut vorstellen (.) untergang der (---) zellen (3.0) da hab ich gleich diese assoziation dass (.) diese zellen ausschauen wie schiffe und irgendwie attakiert werden und dann .hh sinken; und untergehen. was kann damit gemeint sein?=is da irgendwas im wörterbuch? ((schlägt im Wörterbuch nach)) natürlich nicht. This unintentional foregrounding of linguistic elements takes away memory capacity necessary for processing the content on diabetes and therefore should be avoided and, as NL's TAP shows, be replaced by a semantically more precise formulation. NL says, also untergang des klingt für mich komisch. das führt zu einem verlust an, .h dadurch werden die insulin produzierenden zellen (2.0) geschädigt, oder beschädigt, oder werden sie wirklich zerstört? .hh das würd ich jetz gern noch an fachmann fragen (-) was dann wirklich mit diesen zellen passiert. (3.5) ob einfach die anzahl reduziert wird, oder ob sie beschädigt und daher funktionsunfähig sin, (---) oder ob sie wirklich völlig zerstört werden. das müsste man noch mal fragen.

11. The original text says that type 2 diabetes, also called Altersdiabetes (old-age diabetes), occurs after age 40 in most cases, which is the reason for its designation. In recent years, however, people have been affected by this type of diabetes at an ever earlier age. For YG age 40 is not really old: "<<len> vierzge isch jo (.) no net (.) speziell: (-) alt.>" (TAP YG 404-405). Therefore she feels that the designation is not motivated semantically and wonders whether type 2 diabetes occurred later than age 40 in the past and the age has gone down to 40 only in recent times (cf. TAP YG 252-260). Since YG is the only one of the five subjects who had this problem, this is not taken into account in the optimized version.
She solves the problem by making explicit that the body cannot use the glucose in the increased blood sugar level and therefore has to fall back on its fatty tissue (and protein reserves, which is not mentioned in the original text) instead, which, to my mind, is an excellent solution.
15. For EK the explanation of type 2 diabetes is not clear. She wonders, "was bedeutet ein vermindertes ansprechen?" (TAP EK 307). In the retrospective interview she wants to ask an expert about this: unter der frage was versteht man unter dem .h typ zwei diabetes genau und wodurch wird er hervorgerufen ist jetzt als vage; also das müsst ma noch einen experten fragen und (--) .hh klären; hh was genau der typ zwei diabetes ist.

16. NL wonders why type 1 and type 2 diabetes are not mentioned in the usual order in the last sentence: <<sich selbst beim Tippen diktierend> dIabetes (---) typ> (--) warum is des (1.5) zuerst des typ zwei und dann des typ eins (2.0) warum steht da des (.) zuerst? (3.0) na ja wahrscheinlich weil (-) sobald eine frau im (1.5) gebärfähigen alter is, ist es wahrscheinlicher dass sie erst später an diesem altersdiabetes erkrankt (6.0) größer (4.0) ah, trotzdem find ichs irgendwie seltsam zuerst zwei und dann eins zu schreiben (4.0) ((tippt)) (4.0) ((tippt)) so. (2.0) ah (-) das risiko später an typ eins oder zwei diabetes zu erkrANken gehört da natürlich noch rein (TAP NL 456-465). The unusual order leads to a foregrounding of this information, as can be seen from NL's reflections. If there is a reason for the unusual order, this reason should be given explicitly. In fact, there is a reason: the risk of being affected by type 2 diabetes after a pregnancy is higher than that of being affected by type 1. If no reason is given, the usual order should be used (cf. the solution in the optimized version in Appendix A).

Summary and conclusions
Table 3 gives an overview of the problematic elements in the source text, of how many subjects had problems with them, and of a few maxims and strategies used by the subjects to solve these problems. The criticism summarized in Table 3 can be used to produce a version optimized on an empirical basis. Such a version is juxtaposed to the original version in Appendix A. In this optimized version, all elements criticized have been changed except for three items which were criticized by only one subject and in an unconvincing manner: T-Lymphozyten, Körperzellen, and the semantic motivation of Altersdiabetes.
Ideally, this optimized version should itself become the object of optimizing reverbalization, and the procedure should be repeated iteratively until no further deficiencies can be recognized.
If we compare the insight into text comprehensibility that the method of optimizing reverbalization described here gives us with the results obtainable with readability formulas, cloze procedures, questions on texts whose comprehensibility is to be assessed, or text reproductions (cf. section 1), it becomes obvious that the method described here provides much more differentiated and reliable results. At the same time, however, it is also much more time-consuming.
The subjects' criticism of the diabetes text and their maxims and strategies, of which only a few could be mentioned here, reveal what the 'ingredients' of comprehensibility are in their view. A comparison of these 'ingredients' of comprehensibility with the six dimensions of the Karlsruhe comprehensibility model reveals that there were no items of criticism that could not be attributed to one of the dimensions of the Karlsruhe model. This suggests that the comprehensibility concept underlying the Karlsruhe model matches the intuitive comprehensibility concepts of the subjects in the experiment.
The method described in this article, however, may also be used to refine the Karlsruhe model in the following way: First of all, a larger number of texts has to be analyzed by means of the method described here. Then the items which make the texts incomprehensible or hard to understand have to be classified. Each category of this classification must then be subsumed under one of the six dimensions of the Karlsruhe model, which will then represent the factors which determine text comprehensibility in much more detail. Such a refined Karlsruhe model would then be an even better framework of orientation for producing more comprehensible texts and at the same time provide the criteria for comprehensibility assessments which are less time-consuming than the method described here, but provide more detailed results than the methods employed in the past.

Table 1: The subjects' educational and professional background

Table 3: Deficiencies of the original diabetes text with regard to comprehensibility