Testing the Effort Models’ tightrope hypothesis in simultaneous interpreting - A contribution

Daniel Gile, Université Lumière Lyon 2 (home address: 46, rue d’Alembert, F-92190 Meudon; DGile@compuserve.com)
Hermes, Journal of Linguistics no. 23, 1999

In a sample of 10 professionals interpreting the same source speech in the simultaneous mode, errors and omissions (e/o’s) were found to affect different source-speech segments, and a large proportion of them were made by only a small proportion of the subjects. In a repeat performance, there were some new e/o’s in the second version in segments that the same interpreters had interpreted correctly in the first version. These findings strengthen the Effort Models’ “tightrope hypothesis” that many e/o’s are due not to the intrinsic difficulty of the corresponding source-speech segments, but to the interpreters working close to processing capacity saturation, which makes them vulnerable to even small variations in the available processing capacity for each interpreting component.


The nature of the Effort Models
In the seventies, models based on the information-processing paradigm (Gerver 1975, Moser 1978) were developed to account for the mental operation of simultaneous interpreting. More recently, Setton (1997), Paradis (1994) and Mizuno (1994, 1995) have developed their own models, similarly based on cognitive science. All these are valuable as theoretical constructs insofar as they take on board relevant developments in cognitive psychology, neurolinguistics and linguistics. On the practical side, however, over the past two decades, they have not been subjected to much systematic testing, probably due to the complexity of the mental operations involved (see for instance Lambert 1995, Moser-Mercer 1997:2, Massaro and Shlesinger 1997:14, Frauenfelder and Schriefers 1997:55) and to the lack of institutional, human and financial resources for such exploration.
In the early eighties, a set of models was developed in a different mindset, the idea being not to describe the simultaneous interpreting process, but to account for errors and omissions observed in the performance of simultaneous and consecutive interpreters which could not be easily attributed to deficient linguistic abilities, insufficient extralinguistic knowledge or poor conditions in the delivery of the source text. These 'Effort Models' (see for example Gile 1995, 1997) pool together operational components of interpreting into three 'Efforts', namely:
L: the Listening and analysis Effort
P: the Production Effort (speech production in simultaneous, and note production during the first stage of consecutive, while the interpreter is listening, but not interpreting yet)
M: the short-term Memory Effort, essentially dealing with memory operations from the time a speech segment is heard to the time it is reformulated in the target speech or disappears from memory.
The Effort Models (EM) are models of operational constraints, not architectural models, insofar as they do not postulate a particular mental structure and information-processing flow, as is the case of the other models mentioned above. The underlying idea is that with minimal assumptions about cognitive architecture, it is possible to come up with a set of models with explanatory and predictive potential on the level of actual interpreting performance. Therefore, contrary to a widespread paradigm in cognitive science, their testing and development can focus on their validation as operational tools, rather than on architectural validation and component and/or flow additions and corrections. Moreover, because of their distinct nature and objectives, they are not in direct competition with architectural models, at least as long as architectural models cannot make operational predictions. However, though they have deliberately been designed at a holistic level without going into fine-grained architectural analysis, they are based on cognitive concepts (in particular the concept of limited attentional resources and the assumption of a strong correlation between task difficulty and task-implementation duration), and any cognitive findings invalidating their basic assumptions will have to be taken on board.
The most fundamental architectural assumption in the Effort Models is that in spite of the sharing of some cognitive resources, and in particular long-term memory, there are enough unshared components in interpreting to justify the distinction between the three Efforts, one revolving around comprehension, one around production of speech (in simultaneous) or notes (in the first stage of consecutive), and one around short-term memory operations. The fact that in simultaneous, comprehension is in one language and production in another, and in consecutive, production refers to producing notes, is one element which justifies separation of these two Efforts. The definition of the Memory Effort as a distinct component is less evident, since both sensory memory and working memory come into play both in comprehension and in production (in the latter, for self-monitoring). Thus, it could be argued that a two-element model composed of a comprehension phase and a production phase would be sufficient and more representative of actual cognitive architecture.
The case for a distinct place for the Memory Effort in the model rests on the following:
a. The necessary co-existence in short-term memory (including sensory memory and working memory) of source-speech elements and target-speech elements in simultaneous interpreting, and of source-speech elements and written representations of words and concepts in the first stage of consecutive, is likely to induce in the interpreter's short-term memory some operations, and possibly some architectural links and components (separate stores for the source speech and the target speech, inhibition and activation, links with the mental lexicon in one language or another, etc.), which are not usual in the memory of the non-interpreting listener or speech (or note) producer.
b. In strategic terms, interpreters make specific decisions on their EVS (Ear-Voice Span, or how much they lag behind the speaker) on the basis of memory-capacity limitations beyond the automatic memory operations that occur in speech comprehension and speech production. These limitations and strategies have been a distinct topic of reflection and discussion since the sixties (see inter alia Fukuii and Asano 1961, Oléron and Nanpon 1965, Kade and Cartellieri 1971, Daró and Fabbro 1994, Osaka 1994, Padilla 1995, Gran and Bellini 1996, Chincotta and Underwood 1998).
Besides this architectural assumption, there are three major operational assumptions:
a. Each of the three Efforts has non-automatic components. Therefore, all three require attentional resources.
From cognitive psychology and psycholinguistics research, it is known that speech comprehension and speech production under ordinary conditions include non-automatic components. In simultaneous interpreting, there is no reason to assume that speech comprehension is more automatic than in ordinary conditions, while there are many to assume the opposite. As to speech production, it would become automatic only if it involved automatic word-for-word replacement, which is clearly not the case. As to the short-term Memory Effort, it is non-automatic insofar as it involves storing and retrieving ever-changing information elements (for a more detailed discussion of the non-automatic nature of the three Efforts, see Gile 1995). This first operational assumption can therefore be considered unproblematic.
b. The three Efforts are at least partly competitive, meaning that even if they share resources and may be somewhat cooperative, the net result of their coexistence will usually be an increase in processing capacity requirements (the 'competition hypothesis').
In mathematical terms, this 'competition hypothesis' can be represented in the following way, with the total processing capacity consumption TotC associated with interpreting at any time represented as a 'sum' (not in the pure arithmetic sense) of consumption for L, consumption for M and consumption for P, with further consumption for 'coordination' (C) between the Efforts, that is, the management of capacity allocation between the Efforts:
(1) TotC = C(L) + C(M) + C(P) + C(C)
(2) C(i) > 0, i = L, M, P
(3) TotC ≥ C(i), i = L, M, P
(4) TotC ≥ C(i) + C(j), i, j = L, M, P and i different from j
where
- equation (1) represents the total processing capacity consumption
- inequality (2) means that each of the three Efforts requires some processing capacity
- inequality (3) means that the total capacity consumption is at least equal to that of any single Effort
- inequality (4) means that the total capacity consumption is at least equal to that of any two Efforts performed in conjunction (in other words, adding a third Effort means adding further capacity consumption).
The competition hypothesis is generally accepted intuitively by practitioners, is explicit in many anecdotal accounts of difficulties encountered by interpreters, and has not generated any criticism when presented to cognitive scientists at various interdisciplinary meetings. However, it has not been systematically tested, and it cannot be ruled out that in some specific cases, it does not hold.
c. Most of the time, interpreters work near saturation level (the 'tightrope hypothesis'). The present study is a partial test of this third hypothesis.
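The interplay between the competition hypothesis and the tightrope hypothesis can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the 'sum' in equation (1) is plain arithmetic addition (which the text explicitly says it is not, in the strict sense) and that capacity can be expressed in arbitrary units. The function names, the `available` ceiling and all numeric values are hypothetical and are not part of the Effort Models.

```python
def total_consumption(listening, memory, production, coordination=0.0):
    """Total capacity consumption TotC, ASSUMING plain arithmetic addition."""
    return listening + memory + production + coordination


def is_saturated(tot_c, available):
    """True when capacity requirements exceed the available capacity."""
    return tot_c > available


# Hypothetical values in arbitrary units: the baseline sits close to the ceiling.
available = 10.0
baseline = total_consumption(listening=3.5, memory=2.5, production=3.0,
                             coordination=0.5)

# A small extra demand on the Listening Effort (for example a proper name).
with_trigger = total_consumption(listening=4.5, memory=2.5, production=3.0,
                                 coordination=0.5)

print(baseline, is_saturated(baseline, available))          # 9.5 False
print(with_trigger, is_saturated(with_trigger, available))  # 10.5 True
```

Under these assumptions, an interpreter working well below the ceiling would absorb the same one-unit increase without any overload, which is the contrast the tightrope hypothesis relies on.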

Previous theorizing and testing
On the basis of the Effort Models, some further theorizing was possible. Firstly, the existence of 'problem triggers' was hypothesized, in particular speech segments or tasks requiring heightened attentional resources.
The assumption was that if indeed interpreters work near saturation level, even limited additional attentional requirements could lead to failure. Another hypothesis was that speech segments with low redundancy were also problem triggers, since they had low tolerance of attentional lapses such as might occur because of attentional mismanagement. In a simple interpretation task, Gile (1984) found that there was a high rate of failure in rendering proper names, some with low morphologic redundancy ("Cliff"), and some with heightened attentional requirements ("Pacific Islands Development Commission"). Conversely, low density segments in a speech can lower cognitive pressure. Examples of such low density segments are pauses, on which Barik and Goldman-Eisler focused (see Gerver 1976), but also some language-specific constructions. In Japanese, for instance, Gile (1992) found frequent 'predictable sentence endings' of up to more than 10 syllables in length. In terms of the Effort Models, such a pressure drop over more than one second could lead to specific interpreting strategies when interpreting from Japanese. Unfortunately, no on-line testing of this potential effect could be done at this stage.
A further assumption developed from the EM was that these triggers could generate failures at a distance, when attentional resources were diverted from one Effort to another where 'reinforcement' was necessary, thus 'saving' one speech segment but jeopardizing a subsequent segment in a 'failure sequence' (Gile 1995). In a simple interpretation task with students, Granacher (1996) listed potential failure triggers he identified in the source speech and looked at the target speech rendition of these speech segments and the following segments, trying to detect and explain failures in terms of triggers and 'failure sequences'. In most cases, he did find errors and omissions and concluded that triggers were probably involved. However, the design of the experiment was too loose to allow robust conclusions to be drawn: much of his inference was speculative, and retrospective reports by subjects did not confirm the conclusions explicitly.
The Effort Models also predict higher attentional requirements when working from syntactically different languages. Starting with this assumption, Dawrant (1996) hypothesized that this would lead to specific processing-capacity saving strategies in interpreting. He identified Chinese constructions requiring word-order modification when interpreting into English and compared strategies used in interpreting speeches with one such construction, the coverb structure, to strategies used when translating them rapidly in writing. His findings suggest that indeed, "interpreters seek to avoid the increased working memory load associated with the rearrangement of word order in SI through the use of the Processing Capacity-conserving strategies of anticipation and 'linearity'" (p.84).
In a recent doctoral dissertation, Lamberger-Felber (1998) also tested a number of hypotheses which she derived from the Effort Models regarding different types of errors and omissions (in numbers and proper names, "serious meaning errors", omission of long segments) through a three-condition experimental set-up: when interpreters were given the manuscript of the speech in advance with time for preparation, without time for preparation, and when they were not given the manuscript. Her findings confirmed expectations regarding the usefulness of manuscripts in lowering processing capacity requirements.
From a slightly different angle, the EM for consecutive predicts a disruptive effect of note-taking in untrained interpreters for the following reasons:
- extra processing capacity is involved in deciding what to write and how to write it
- extra processing capacity is involved in controlling the writing operation
- writing generally takes much longer than uttering the same speech segment, hence a lag which is likely to increase the risk of working memory overload (Gile 1995).
In a simple consecutive interpreting experiment, Gile (1991) found that student interpreters who took notes failed in their rendering of proper names (taken here as indicators because of their vulnerability to attentional deficit) more often than students who did not.

The tightrope hypothesis
The present study addresses the third hypothesis mentioned in section 1, one that is more holistically associated with the Effort Models, namely the idea that most of the time, total capacity consumption is close to the interpreter's total available capacity, so that any increase in processing capacity requirements and any instance of mismanagement of cognitive resources by the interpreter can bring about overload or local attentional deficit (in one of the Efforts) and consequent deterioration of the interpreter's output. This 'tightrope hypothesis' is crucial in explaining the high frequency of errors and omissions that can be observed in interpreting even when no particular technical or other difficulties can be identified in the source speech (Gile 1989): if interpreters worked well below saturation level, errors and omissions should occur only when significant difficulties came up in the source speech.
The precise aim of this investigation is to try to establish, in a sample of professionals interpreting a speech, whether there are indeed errors and omissions (e/o's) affecting segments that present no evident intrinsic difficulty. If there are, it is likely that they can be explained in terms of processing capacity deficits such as predicted by the EM.
The underlying rationale of this study is the following: One indication of the existence of such e/o's would be variability in the segments affected in the sample (at the level of words or propositions). If all subjects in the sample fail to reproduce adequately the same ideas or pieces of information, this would suggest the existence of an intrinsic 'interpreting difficulty' of the relevant segments (too specialized, poorly pronounced, delivered too rapidly, too difficult to render in the target language, etc.), even if available descriptive tools are not sensitive enough to identify such difficulty beforehand. If however only a few subjects fail to render them correctly in the target language, this would tend to weaken this explanation and strengthen the hypothesis that processing capacity deficits are involved. An analysis of inter-subject variability in the incorrectly-rendered speech segments in plain interpretation can therefore provide interesting evidence in this respect.
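The inter-subject variability analysis described above amounts to counting, for each affected segment, how many subjects failed on it. The following Python sketch shows one way such a tally could be set up; the subject labels, segment labels and data are hypothetical placeholders, not the results of this study.

```python
from collections import Counter

# Hypothetical data: for each subject, the set of source-speech segments
# rendered with an error or omission (e/o). NOT the study's actual results.
eo_by_subject = {
    "S1": {"seg2", "seg5"},
    "S2": {"seg2"},
    "S3": set(),
    "S4": {"seg7"},
}


def subjects_per_segment(eo_by_subject):
    """Count how many subjects made an e/o on each affected segment."""
    counts = Counter()
    for segments in eo_by_subject.values():
        counts.update(segments)
    return counts


def share_failing(counts, n_subjects):
    """Proportion of the sample that failed on each affected segment."""
    return {seg: n / n_subjects for seg, n in counts.items()}


counts = subjects_per_segment(eo_by_subject)
shares = share_failing(counts, len(eo_by_subject))
for seg in sorted(shares):
    print(seg, counts[seg], shares[seg])
```

In this toy sample no segment is missed by every subject; segments missed by only one or two subjects are the ones that the rationale above treats as weakening the 'intrinsic difficulty' explanation.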
Another indication could come from an exercise in which each subject is asked to interpret the same speech twice in a row. Having become familiar with the source speech during their first interpretation, subjects can be expected to correct in their second version many e/o's committed in their first version. If, notwithstanding this general improvement of interpreting performance from the first to the second target-language version, it were possible to find new e/o's in the second version whereas the same speech segments were interpreted correctly the first time, this would be an even stronger indication that processing capacity deficits are involved. It is difficult to find another explanation: the fact that the segments affected in the second target-language version were interpreted adequately in the first suggests that the interpreters did understand them the first time and do possess the necessary linguistic knowledge and know-how to re-express them in the target language.
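The repeat-performance comparison can be expressed as a set difference between the segments affected in each version. The Python sketch below is only an illustration of that logic; the function names and the segment labels are hypothetical and the data are invented.

```python
def new_eos(first, second):
    """Segments with an e/o in the second version but not in the first."""
    return second - first


def corrected_eos(first, second):
    """Segments with an e/o in the first version that is gone in the second."""
    return first - second


# Hypothetical subject: overall improvement (3 e/o's down to 2), yet one
# segment rendered correctly the first time fails the second time.
first_version = {"seg2", "seg5", "seg9"}
second_version = {"seg5", "seg12"}

print(sorted(new_eos(first_version, second_version)))        # ['seg12']
print(sorted(corrected_eos(first_version, second_version)))  # ['seg2', 'seg9']
```

A non-empty result from `new_eos` alongside a lower overall count is exactly the pattern the paragraph above describes as the stronger indication.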

Method
The source speech was taken from a video recording of a press conference given by George Fisher when his position as Kodak's new Chief Executive Officer was announced. It was interpreted by myself from a video-cassette during a professional assignment, and I asked for and obtained permission to use an audio-tape version for teaching and research purposes (Kodak's cooperation is gratefully acknowledged). The 245-word extract used here (see appendix) is the full answer given by George Fisher to a question put to him by a journalist. It is 1 minute and 40 seconds long, is of a fairly general nature, requires no previous knowledge of the subject and contains only one specialized term, "silver halide".
Subjects are professional conference interpreters who were recruited in the workplace, always during the first half of a simultaneous interpreting working day, and always after they had had time to 'warm up' with one or two turns of interpreting in the booth within the framework of their professional assignment. They were told they would have to interpret from English into French the answer of Kodak's new CEO to a journalist's question during a press conference held when his appointment to this position was announced. They were also told how to translate "silver halide" into French ("halogénure d'argent"). The experiment was carried out in interpretation booths, with the source speech coming out of a portable cassette player over standard headphones and the target speech being recorded on a portable cassette recorder. When they finished interpreting, subjects were asked to start interpreting again.
Ten subjects were recruited over three distinct interpretation assignments of the author's in Paris: all have either French as their A language (roughly their native tongue) and English as a strong B language (non-native, but strong enough to work into it from an A language), or are 'double A' bilinguals.
All had regular working experience of 15 years or longer except one whose experience was 7 years, and all are members of AIIC, the International Association of Conference Interpreters, and work both in the private sector market and for international organizations, in particular OECD and UNESCO. They can therefore be considered qualified professionals.
Target speeches were transcribed, and transcripts were scanned for errors and omissions. This method is not without pitfalls, both because of high inter-rater variability in the perception of what is and what is not an error or omission and because what may be identified as an error or omission in a transcript may be an acceptable rendition in an oral presentation of the speech (as demonstrated in Gile 1999). To avoid these pitfalls, only instances of what appeared to me as flagrant errors or omissions were included in the analysis, and at least two further opinions from other conference interpreters were requested to confirm that the e/o's I identified were also considered e/o's by them. There were no dissenting opinions. Moreover, the likelihood that the e/o's identified by me were also considered e/o's by the subjects themselves is heightened by the fact that all of them but one ("well enough", e/o n°4) were corrected in the second version of the target speech by at least some of the subjects (table 1).
This conservative operational definition of e/o's may have left out many other e/o's. I decided to accept this loss of sensitivity of the tool insofar as it preserved validity by reducing the probability of 'false positives' (mistaking text manipulations considered acceptable by the subjects for e/o's).
The analysis then proceeded by trying to determine:
a. How many subjects in the sample made an e/o for each affected speech segment.
b. Which e/o's were corrected in the second version of the target speech.
However, due to a technical problem, no second version could be recorded for subject D, and a local technical problem made it impossible to check the second version for subject C as regards e/o n°16.

List of e/o's

1. "I'm sure my..."
Subject F: "je suis sûr que c'est possible" ("I am sure it is possible"). Type of e/o: error. Corrected in the subject's second version.

2. "I don't even know these people yet"
Subject D: "je ne connais même pas ces gens" ("I don't even know these people"). Omission of the idea expressed by "yet". No second version.
Subject E: "que je connais bien" ("that I know well"). Error. Corrected partially in the second version ("je connais pas ces gens", "I don't know these people").
Subject F: "je connais" ("I know"). Error. Corrected in the second version.
Subject H: "je ne les connais pas encore" ("I don't know them yet"). Who is "them"? Corrected in the second version.
Subject J: "je ne connais pas ces gens" ("I don't know these people"). Omission of the "yet" idea. Uncorrected in the second version.

3. "Scientists and engineers"
Subject B: "le monde scientifique" ("the scientific world"). Corrected in the second version.
Subject I: "les chercheurs" ("researchers"). Corrected in the second version.

4. "well enough"
Subject E: "que je connais bien" ("that I know well"). Uncorrected in the second version.
Subject F: "je connais" ("I know"). Uncorrected in the second version.

5. "Since I don't know..."
Subject A: "dont on ignore encore la nature..." ("the nature of which is not known yet"). Uncorrected in the second version.

10. "as far as capture goes"
Subject A: omission. Corrected in the second version.
Subject J: "marché captif" ("captive market"). Corrected in the second version.

New e/o's in the second version
The following is a list of e/o's found in the second version of the target speech (in the second line, marked as "b") whereas the relevant speech segments had been correctly interpreted in the first version (first line, marked as "a"):

Subject A:
I know scientists and engineers well enough
a. "je connais suffisamment les scientifiques et les ingénieurs pour savoir qu'ils ne..." ("I know scientists and engineers well enough to know that...")
b. "mais je parle des scientifiques et des ingénieurs qui ne seraient pas très heureux..." ("but I am talking about scientists and engineers who would not be very happy..."; omission of the idea expressed in "well enough")
let's concentrate on that
a. "... le côté imagerie chez Kodak concentrons-nous sur cet aspect..." ("The imaging side of Kodak let us concentrate on this aspect...")
b. "... le domaine imagerie chez Kodak. Il faut reconnaître là que..." ("...the imaging side of Kodak. Here, we must acknowledge that..."; omission of invitation to focus in the second version)

Subject B:
I don't even know these people yet
a. "je ne connais pas encore ces gens là" ("I don't know these people yet")
b. "je ne les connais pas encore" ("I don't know them yet"; who is "them"?)
I know scientists and engineers well enough
a. "mais je connais suffisamment bien..." ("but I know sufficiently well")
b. "mais je connais bien les..." ("but I know well the..."; the idea expressed in "well enough" is missing)
in perhaps ways that are totally different
a. "de les diffuser de manière peut-être tout-à-fait différente..." ("to disseminate them in a way that may be totally different...")
b. "de les distribuer d'une manière tout-à-fait différente" ("to distribute them in a totally different way"; the idea expressed in "perhaps" is missing)

Subject C: (part of the recording is missing; results are taken from the existing part)
I don't even know these people yet
a. "je ne les connais pas encore" ("I don't know them yet")
b. "je ne connais suffisamment bien" ("I do [the French equivalent of "not" is missing, but the sentence is negative nevertheless] know well enough"; missing reference to "people")

Subject F: No instance of new e/o's in the second version

Subject G:
in perhaps ways that are totally different
a. "peut-être d'une manière totalement différente" ("perhaps in a totally different way")
b. "certainement être complètement différentes" ("certainly be completely different"; the idea expressed in "perhaps" is missing)

Subject H:
let's concentrate on that
a. "on va se concentrer là-dessus" ("we are going to concentrate on that")
b. omission
let's concentrate on that
a. "si vous voulez on parlera de ça pour l'instant" ("let us talk about that for a moment, if you will")
b. omission
they're really exciting
a. "c'est tout-à-fait intéressant" ("it is quite exciting")
b. omission

Moreover, among them are interpreters who only made a total of 5 e/o's or fewer each (subjects A, C, E, J) and who are therefore among the 'good performers' in the sample. It may therefore be conservatively assumed that at least for these e/o's, no intrinsic difficulty of the affected source-speech segments is involved (that no specialty-specific or language-specific difficulty is involved will appear clearly to readers from the list of segments and e/o's given above). This strengthens the tightrope hypothesis as explained in section 2.

Quantitative analysis
In the repeat operation, the expected overall performance improvement in the second version of the target texts was confirmed: the number of e/o's decreased for 5 subjects (B, E, F, H, I), remained the same for 2 subjects (A and J), and increased (by one) for one subject (G). More interestingly for the tightrope hypothesis, 6 subjects out of the 9 (66.7%) for whom a second version was at least partially available for analysis (as mentioned earlier, only parts of the recording were available for subject C) made at least one new e/o in the second version of their target text whereas the relevant source-text segment had been better interpreted in the first version. This suggests that the phenomenon is not rare and further strengthens the tightrope hypothesis as explained in section 2.
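As a quick arithmetic check of the figures reported in this paragraph, the tallies can be recomputed in a few lines of Python. The subject labels and counts are those given in the text; the variable names are illustrative only.

```python
# Subjects grouped by the change in their number of e/o's, as reported above.
decreased = ["B", "E", "F", "H", "I"]
unchanged = ["A", "J"]
increased = ["G"]

with_second_version = 9  # all ten subjects except D
with_new_eo = 6          # at least one new e/o in the second version

share = 100 * with_new_eo / with_second_version
print(len(decreased), len(unchanged), len(increased))  # 5 2 1
print(round(share, 1))                                 # 66.7
```

The three groups add up to 8 subjects rather than 9; the text does not place subject C, whose second recording was only partly available, in any of them.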

Discussion and conclusion
As mentioned above, the high detection threshold for e/o definition used here in order to reduce to the largest possible extent the number of 'false positives' means that other phenomena that could have been used to measure cognitive load were not exploited. In particular, no attempt was made to look at borderline cases, at the deterioration of linguistic output quality, or at changes in the prosody or the quality of the interpreter's voice. If the low sensitivity of the tool had made it impossible to obtain convincing findings, more sensitive tools would have had to be designed, and reliability could have become a problem. Fortunately, the tool proved to be sufficient, illustrating the idea that at the beginning of an exploration, it is often possible and even desirable to use primitive tools rather than resort to finer and more fragile tools.
The findings of this study strengthen the case for the tightrope hypothesis and thus give some support to the Effort Models as a conceptual tool to explain the interpreters' cognitive-constraints-based limitations. However, the usefulness of the EM as an operational tool and prospects for further development depend on more precise quantitative measurement possibilities, in particular on on-line measurement of attentional resource consumption during interpreting. When more is known about how close to saturation interpreters work, how much additional capacity is taken up by triggers, and what the exact time-course of failure sequences is, both better testing and more powerful uses can be found for the Effort Models. Meanwhile, I can only agree that the indirect, mostly rather gross methods used so far cannot be said to have led to systematic testing or validation of the models (see Massaro and Shlesinger 1998:43). On the other hand, as illustrated in section 2 (also see Schjoldager 1996 and Sabatini 1998), they have inspired empirical studies focusing on interpreting (as opposed to cognitive models and theories which have inspired empirical studies focusing on cognitive issues). They have thus contributed to enhancing the knowledge base we have on the interpreting product, and may give some credibility to the idea that the usefulness of a concept or model in scientific exploration is not necessarily a function of its degree of sophistication.