Text-oriented research into interpreting - Examples from a case-study

Different methodological tools were applied in a case study on interpretation of read-out speeches: A comparison of the 3 source texts used showed considerable differences between the "objective" text description using quantifiable parameters and the "subjective" evaluation of the same speeches by the 12 participating interpreters. Incorrect rendition or omission of proper names/numbers was reduced by the use of the manuscript in the booth, while overall omissions/errors were highly variable between subjects. In a process-oriented approach, the relationship between use of manuscript, timelag and long omissions was studied and resulted in a series of new questions to be investigated. Finally, lexical variability proved surprisingly high, with only 6.6% of words being used by all interpreters. Strengths and weaknesses of the different methodological tools will be discussed and proposals made for further research in the field.


Introduction
For decades, source and target texts have been the centerpiece of most empirical interpretation research. Considering the fact that interpreting is a complex cognitive activity which requires equally complex scientific tools in research (Kurz 1996: 72ff), the use of the tangible outcome of the interpretation process, the target text, is an obvious choice for research purposes.
The way in which texts are used to gain insights into interpretation has changed over the years. Whereas early works on interpreting relied heavily on introspection and/or the descriptive, non-quantified analysis of individual source and target texts by the authors of these studies (e.g. Paneth 1957, Seleskovitch 1978, Lederer 1980), it is now widely recognized that the empirical value of such approaches is often limited to finding hypotheses to be tested in further studies (Gile 1995a: 50ff, Kalina 1998).

Comparative analysis of texts in interpretation research
A more systematic comparison of different source texts in interpretation (ST), of different interpretations (I) of a single ST and of the relationship between ST and I offers potential insight into a great variety of questions:

Content-oriented research
The comparison of various interpretations of the same ST as to completeness and accuracy has been central to interpretation research since Barik's much debated error typology (Barik 1971). The potential didactic usefulness of comparing different source texts, e.g. concerning the frequency of topics dealt with in specific interpretation settings, has so far not encouraged much work in the field (e.g. Kopczynski 1980: 16f).

Process-oriented research
Both ST and I, and especially the relationship between them, can be used as "process-markers" in the interpretation process: Different interpretation strategies such as anticipation, condensation, deverbalization or timelag can be highlighted through a systematic comparison of input and output.
Although much has been said about the methodological weaknesses of product-based inferences on the interpretation process (e.g. Kalina 1998:127ff), the potential usefulness of this approach for the formulation and testing of hypotheses seems to be generally accepted.

Form-oriented research
Increasing attention has been paid recently to a form-oriented comparison of source text (ST) and target text (TT) in interpreting. Whereas parameters of delivery, such as speed, pauses or intonation, gained the interest of early researchers in the community (e.g. Oléron & Nanpon 1965) and keep playing a prominent role in the quality discussion in interpreting studies (Collados Aís 1998), systematic studies into text cohesion in both ST and TT, density of information, interpretation-specific terminology and lexicometric features are a rather recent phenomenon (e.g. Gallina 1992, Lamberger-Felber 1999). The importance of form-oriented studies is best illustrated by the shift of paradigms they created: As a consequence of the comparison of interpretation involving different language pairs (e.g. French and Japanese, Gile 1992), language specificity of the interpretation process has gained wider acceptance than the deverbalisation paradigm that influenced interpretation curricula all over the world for several decades (Seleskovitch 1978).

Case study on simultaneous interpreting of read-out speeches
In a case study of simultaneous interpretation (SI) of read-out speeches, the three above-mentioned research approaches were used to test several hypotheses on the influence of the interpreters' use of the speaker's manuscript on interpreting performance (for a detailed description of the research project, see Lamberger-Felber 1998: 44ff). In the following, advantages and shortcomings of the different methods used will be discussed based on the results obtained in the different stages of the study.

Experimental design
12 Austrian conference interpreters with at least ten years of professional experience were asked to interpret three read-out speeches of 8-10 minutes' length from English into their A-language German. The interpreters were divided into 3 groups: A, B and C. Each group had to interpret one speech with the manuscript given to them with enough time to prepare it for use in the booth (TV), one speech with the manuscript available for use in the booth but without preparation time (T), and the third speech without ever seeing the speaker's text (O). The three speeches were taken from a conference on small and medium-sized enterprises held in Vienna in 1991 and thoroughly documented, recorded and transcribed by Pöchhacker (Pöchhacker 1992). In order to provide a realistic framework for the experiment, interpreters received exactly the same documentation and information as their colleagues who had worked in the actual conference.

Results
Performance of the interpreters under the different working conditions was compared using different evaluation methods (Lamberger-Felber 1998: 52ff).

Method
In order for the experimental setup to yield the desired results, the three speeches used had to be comparable from an interpreter's point of view. The only available model offering well-defined parameters for text description in IR was Pöchhacker's Text(darbietungs)profil (Pöchhacker 1994:112). Designed as a complementary tool for systematic documentation of both source and target texts in interpretation, the model contains a series of parameters of presentation such as intended audience (ADR), degree of premeditation of the speech (VOR), use of electronic media (MED), speed of delivery (TEM), melodiousness (MEL), dynamics (DYN) and rhythm (RHY) of presentation, voice quality (STI) and clarity of diction (ART), to be graded on a five-point scale (Pöchhacker 1994:110ff).

Figure: Text(darbietungs)profil (Pöchhacker 1994:112, Abb.9)
In Pöchhacker's text-profile, the three speeches chosen for the experiment showed remarkable similarities, with only slight variability in the speed of delivery and in the melodiousness and dynamics of presentation.

Figure 3: Text-profiles of the three speeches SI, HI, SL (Pöchhacker 1992, Band II: Transkriptionen, without page number)
Based on these results, and considering the fact that the topic of all speeches and their degree of technicality were similar, the assumption was made that the speeches could be considered comparable for the purpose of the study. In order to check this comparability against the subjective impression of the participating interpreters, the latter were asked to fill in a questionnaire after interpreting each speech. The questionnaires contained both questions referring to parameters as indicated in Pöchhacker's text-profile and questions about the overall difficulty of the speeches interpreted.

Results
All 12 interpreters considered speech SI to be the easiest of the 3 speeches interpreted, and 75 per cent agreed on speech SL being the most difficult. This ranking is confirmed by the addition of the average difficulty score given for the different parameters (speed, terminology, clarity of diction-accent and overall difficulty) and seems to indicate an overall agreement by the interpreters as to the non-comparability of the speeches.
All the more surprising is the high intersubject variability in the evaluation of individual parameters such as speed, terminology and clarity of diction-accent: With the exception of speech SI, which received similar scores from all interpreters, the scores given for speeches HI and SL vary from 1 (easy) to 4 (difficult) for the parameters "overall difficulty", "terminology" and "accent" and from 2 to 4 for "speed". Even without the generally high scores given by interpreter A4 and the low scores given by C1, this amounts to a high variability in the perception of the different parameters of the speech.

Conclusion
Where Pöchhacker's objective parameters of text-description indicate a high comparability of source texts, the 12 interpreters participating in the experiment find considerable differences when interpreting these texts. A possible conclusion would be that the text profile used lacks parameters relevant for this type of interpretation research: apart from speech presentation, content (degree of technicality, terminology, frequency of problem-triggers) and form (stylistic level, semantic density, idiomatics, sentence length, complexity of grammar, layout of manuscript etc.) of source texts in interpreting are likely to be relevant if relative comparability between texts is to be assured.
The high scoring variability in almost all individual parameters investigated indicates that any objective text-profile will appear unsatisfactory when checked against the subjective evaluation of source texts by the interpreters themselves. An interpreter's personal history (knowledge of regional accents, technical knowledge, language preferences, stylistic preferences, etc.) is likely to have more influence on his/her perception of different source texts and their difficulty than objectively quantifiable parameters would. Therefore, whenever a comparison between interpretations of different source texts is made, both objective and subjective comparability of source texts will have to be taken into account by using more extensive text-profiles and checking them against the opinions of participating interpreters (Lamberger-Felber 1997:231ff).

Method
In order to compare the influence of the use of the speaker's manuscript by the interpreter on interpreting performance, a content-oriented comparison of interpretations was performed using the following parameters:
• names and numbers
• semantic deviations (errors and omissions)
Interpretations of the 3 speeches by all 12 interpreters (36 interpretations of 8-10 mins duration each) were transcribed and compared (for the exact methodology see Lamberger-Felber 1998:72ff).

Names and numbers
For the purpose of this study, proper names and numbers were treated equally without regard to their complexity (e.g. 3 versus 160 million, Nashville vs UN-ECE, etc.). Only complete renderings were considered correct; approximations were therefore counted as errors.

Results
The percentage of correct proper names and numbers was found to be highest for the speech SI (91.7 per cent correct), followed by speech HI (88.9 per cent correct) and SL (67.5 per cent correct). This ranking confirms the order of difficulty established by the interpreters (see 2.2.1.2). A comparison of the performances with text (conditions T + TV) and without text (O) showed a lower average error rate for names and numbers when the speaker's manuscript was used by the interpreters: Interpreters using a prepared manuscript (TV) interpreted 97.6 per cent of all names/numbers correctly, interpreters working under condition T (manuscript, but no preparation time) 91.7 per cent, compared to 85.7 per cent for interpreters working without text (O). The use of the manuscript reduced the average error rate for names/numbers by 67.3 per cent (speech SI), 53.8 per cent (speech HI) and 67.8 per cent (speech SL), respectively.
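The relationship between percentages of correct renderings and a relative reduction in error rate can be illustrated with a minimal sketch. It uses only the overall percentages quoted above for the conditions TV, T and O; the function name is hypothetical, and the per-speech reductions reported in the study were presumably calculated from the underlying counts, so the figures produced here need not match them.

```python
def relative_error_reduction(correct_with_text, correct_without_text):
    """Relative reduction of the error rate, in per cent.

    Both arguments are percentages of correctly rendered names/numbers.
    """
    error_with = 100.0 - correct_with_text
    error_without = 100.0 - correct_without_text
    if error_without == 0:
        raise ValueError("no errors without text; reduction is undefined")
    return 100.0 * (error_without - error_with) / error_without


# Overall percentages of correct names/numbers as quoted in the text.
correct = {"TV": 97.6, "T": 91.7, "O": 85.7}

for condition in ("TV", "T"):
    reduction = relative_error_reduction(correct[condition], correct["O"])
    print(f"{condition} vs O: error rate reduced by {reduction:.1f} per cent")
```

The sketch compares each manuscript condition to condition O separately, whereas the per-speech figures in the text pool both manuscript conditions.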

Figure 5: Names/numbers incorrect per interpreter
Intersubject variability is high under all conditions: Whereas interpreter C4 failed to interpret 5.1 per cent of names/numbers of all 3 speeches correctly, the percentage for A1 is as high as 32.2 per cent.
At the same time, the particular names/numbers affected by interpretation errors vary for the different interpreters: Only 15 per cent of all names/numbers in speech SL, 33.3 per cent in speech HI and 52.4 per cent in speech SI were interpreted correctly by all interpreters, whereas not a single name/number was interpreted incorrectly by more than 65 per cent of interpreters.

Discussion
The generally accepted hypothesis that interpreting performance for names and numbers is better if interpreters have the speaker's manuscript at their disposal is confirmed. At the same time, high intersubject variability concerning both the number of deviations and the names/numbers affected by errors indicates that a more in-depth study on a larger corpus would be necessary if more general conclusions were to be drawn. Questions to be dealt with would include different types of names/numbers, different interpreting strategies in dealing with these, language pairs and interpretation direction, etc. (e.g. Braun & Clarici 1996).

Semantic deviations
For the purpose of the study, the 12 subjects' interpretations of the 3 speeches were evaluated by the author. Each instance of semantic deviation was qualified as error (E) or omission (O). In order to account for the length of the segment affected, E and O were weighted: E/O of more than 3 words of the original speech were counted double. A more sophisticated evaluation method was considered unnecessary at this stage (Lamberger-Felber 1998:86ff).
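The weighting rule described above can be sketched as follows. This is a minimal illustration, not the evaluation procedure actually used: the class and function names are hypothetical, the sample deviations are invented, and it is assumed that the weighted total is normalised per 100 words of the original.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    kind: str          # "E" (error) or "O" (omission)
    source_words: int  # length of the affected segment, in words of the original


def deviation_weight(deviation, long_threshold=3):
    """Deviations affecting more than `long_threshold` source words count double."""
    return 2 if deviation.source_words > long_threshold else 1


def weighted_deviations_per_100_words(deviations, source_length):
    """Weighted number of deviations per 100 words of the original speech."""
    total = sum(deviation_weight(d) for d in deviations)
    return 100.0 * total / source_length


# Invented example: four deviations in a speech of 1000 words.
sample = [
    Deviation("E", 2),  # weight 1
    Deviation("O", 5),  # weight 2
    Deviation("O", 1),  # weight 1
    Deviation("E", 4),  # weight 2
]
print(weighted_deviations_per_100_words(sample, source_length=1000))
```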

Results
The average number of deviations varied considerably between the 3 speeches: Interpretations of speech SL contained an average of 8.94, of speech HI 6.06 and of speech SI 3.07 deviations per 100 words of the original.
At the same time, intersubject variability was very high within the groups A, B and C working under the same conditions (TV, T, O).

Figure 6: Errors/omissions in the subjects' interpretation of 3 speeches
Due to high intersubject variability within groups of only 4 subjects, the calculation of the average number of deviations for each speech under a given working condition (TV, T, O) was considered of little interest.

Discussion
The method chosen for this case study failed to produce results that either confirm or disprove the hypothesis that semantic deviations in interpretations of read-out speeches are more frequent when interpreters do not have the speaker's manuscript.
The reason for this lies in the unexpectedly high variability between the 12 subjects. The group was considered relatively homogeneous when judged by objective parameters such as training, professional experience, membership in a professional association with quality-oriented admission rules (AIIC) and presence in the marketplace, so this variability came as a surprise and calls the whole experimental setup into question: Whereas a sample of 12 interpreters is rather large compared to many empirical studies carried out on SI so far, it may be too small if a subdivision of the sample is necessary for the purpose of the study, e.g. for comparing performance under different conditions. The easiest method, the direct comparison of the performance of one subject under different conditions by changing only one variable (e.g. manuscript available or not available), is incompatible with the nature of interpreting: Original speeches are hardly ever interpreted twice by any given interpreter, and knowledge about the text is certain to increase with each turn and influence the interpreter's performance.
Any methodological setup requiring more than one original has to consider the comparability of source texts, which has proved to be very difficult to determine (see 2.2.1.).
The high variability of performance by the subjects participating in the study suggests that the influence of specific variables (e.g. working conditions) on interpreting can be determined only by either studying a sample big enough to reduce the impact of intersubject variability, or by trying to establish an "individual level of performance" of each interpreter as compared to the average performance of a group. An attempt at the latter was made in this case study: Interpreters were "positioned" within the group by comparing the number of deviations to the average number of deviations for each speech: Individual performances under the different conditions TV, T and O compared to the group average showed that 9 out of 11 subjects (82 per cent) performed better with a prepared manuscript than without a text, 9 out of 12 (75 per cent) produced fewer errors/omissions with an unprepared manuscript than without a text, and for 9 out of 11 (82 per cent) the preparation of the manuscript helped reduce the number of deviations (Lamberger-Felber 1998:93ff).
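The "positioning" of interpreters described above can be sketched as follows, under the assumption that each interpreter's deviation rate for a speech is divided by the group average for that speech and that the resulting ratios are then compared across working conditions. All names and figures in the sketch are invented for illustration, and the exact calculation used in the study may differ.

```python
from statistics import mean

# Invented data: deviations per 100 source words, per speech and interpreter.
deviations = {
    "speech_1": {"A1": 2.0, "B1": 4.0, "C1": 3.0},
    "speech_2": {"A1": 6.0, "B1": 5.0, "C1": 7.0},
}
# Invented working condition of each interpreter for each speech.
conditions = {
    "speech_1": {"A1": "TV", "B1": "O", "C1": "T"},
    "speech_2": {"A1": "O", "B1": "T", "C1": "TV"},
}


def position_interpreters(deviations, conditions):
    """Map interpreter -> {condition: deviation rate relative to the group average}.

    A value below 1.0 means fewer deviations than the group average for that speech.
    """
    positions = {}
    for speech, rates in deviations.items():
        group_average = mean(rates.values())
        for interpreter, rate in rates.items():
            condition = conditions[speech][interpreter]
            positions.setdefault(interpreter, {})[condition] = rate / group_average
    return positions


positions = position_interpreters(deviations, conditions)
for interpreter, by_condition in sorted(positions.items()):
    print(interpreter, {c: round(v, 2) for c, v in by_condition.items()})
```

An interpreter whose relative value under TV or T is lower than under O would count as performing better with the manuscript than without it.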
Whereas these results seem to confirm the hypothesis that availability of the speaker's text increases semantic completeness and correctness of interpretations, they are based on the assumption that intersubject variability can be accounted for by comparing individual performance to group performance. Without further proof on the "stability" of individual interpreting performance (e.g. continuity of strategic choices made by one interpreter, but not by the majority of the group, etc.), the method remains questionable. Sample size and variability in interpreter performance are probably two of the greatest methodological challenges empirical interpretation research will have to face in the near future.

Process-oriented research: timelag and long omissions
Several authors have stressed the additional effort interpreters have to make when interpreting a read-out speech with the help of the speaker's text: Permanent coordination between oral and written input requires additional attention (e.g. Thiery 1981: 121f, Weber 1990: 48, Gile 1995a: 184, Pöchhacker 1997). At the same time, the hypothesis is made that interpreters working with text suffer less from memory restrictions and thus tend to keep a longer timelag, which may eventually result in the omission of longer passages of the original (Gile 1995a:111f). In the present study, the comparison between ST and TT was used as a marker for timelag strategies.

Method
As a first step, interpretations were scanned for long omissions (more than 15 words of the original speech). In a second stage, the timelags of all interpreters were measured at all points where at least one interpreter omitted a long passage of the original (for details, see Lamberger-Felber 1998:116ff).
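The two-step procedure can be illustrated with a minimal sketch that, for each measuring point, compares the timelag of the omitting interpreters with the average timelag of all interpreters at that point. The data structure, the function name and all figures are invented for illustration; the actual measurements are documented in Lamberger-Felber (1998).

```python
from statistics import mean

# Invented measuring points: timelag in seconds per interpreter, plus the
# interpreters who omitted more than 15 source words at that point.
points = [
    {"timelag": {"A1": 3.0, "B1": 8.0, "C1": 4.0}, "omitted_by": {"B1"}},
    {"timelag": {"A1": 6.0, "B1": 5.0, "C1": 2.5}, "omitted_by": {"A1", "C1"}},
    {"timelag": {"A1": 2.0, "B1": 7.0, "C1": 3.0}, "omitted_by": {"B1"}},
]


def share_preceded_by_long_timelag(points):
    """Share of long omissions preceded by an above-average timelag."""
    total = 0
    above_average = 0
    for point in points:
        average_lag = mean(point["timelag"].values())
        for interpreter in point["omitted_by"]:
            total += 1
            if point["timelag"][interpreter] > average_lag:
                above_average += 1
    return above_average / total if total else 0.0


print(share_preceded_by_long_timelag(points))
```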

Figure 8: Long omissions and working condition
Long omissions occurred in the interpretations of all three speeches. The average number of long omissions was highest for speech SL (2.92 omissions), followed by HI (0.5) and SI (0.33). The working conditions do not seem to have an influence on the average number of long omissions (fig. 8).
At the same time, intersubject variability is again very high: Interpreters A3 and C4 never have recourse to long omissions, whereas B1 and C1 show at least one long omission in each interpretation (fig. 9). The timelag of all interpreters was measured at 21 instances in the text where at least one interpreter omitted more than 15 words of the original. Results showed that for only 3 out of 12 interpreters (25 per cent) the use of the speaker's manuscript did not result in a longer average timelag.

Figure 10: Average timelag per interpreter with and without text
75 per cent of long omissions are preceded by a timelag longer than the average measured for that particular instance in the text. In 75 per cent of cases, the longest timelag measured for any one of the measuring points is followed by an omission of more than 15 words, thus showing a positive correlation between length of timelag and number of long omissions (p=0.01).

Discussion
Results seem to indicate a) that the average timelag is longer for SI with text as compared to SI without text and b) that a timelag longer than average indicates a risk of omitting a long passage of the original. At the same time, the use of the speaker's text in the booth did not reduce the average number of long omissions registered. The reasons for this apparent contradiction are likely to be found in the methodology used: First, the limited number of interpreters per group (4) increases the influence of individual performances, which were found to be highly variable. Secondly, the arbitrary definition of "long" omissions as omissions of more than 15 words of the original has an obvious impact on the whole chain of argumentation. Had the limit been 14 words, more long omissions might have been registered, more measuring points established, and as a consequence the average timelag of each interpreter might have been different, etc. The results obtained in this study can thus only serve as a starting point for further analysis using different thresholds and a more complete observation of timelag strategies, including random timelag measurements throughout the text.
Moreover, high variability between interpreters' timelags indicates a further field of interest. The absolute value of the timelags measured is limited due to the fact that measuring took place only at specific points, and no statement can be made as to the (conscious or subconscious) strategic decisions made by the interpreters that led to the measured timelag. However, the fact that average timelags vary between 2.2 and 12.2 seconds gives reason to believe that omissions may not be the only measurable impact of different timelag strategies. Small-scale studies like this one can thus serve as "hypothesis-finding-missions" for further research.
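The sensitivity of the analysis to the cut-off for "long" omissions can be illustrated with a minimal sketch. The omission lengths below are invented, and the function name is hypothetical; the point is merely that lowering the limit from 15 to 14 words can change the number of omissions registered, and with it the set of measuring points.

```python
def count_long_omissions(omission_lengths, threshold):
    """Number of omissions longer than `threshold` words of the original."""
    return sum(1 for length in omission_lengths if length > threshold)


# Invented lengths (in source words) of the omissions found in one interpretation.
omission_lengths = [4, 15, 16, 22, 9, 15, 31]

for threshold in (20, 15, 14, 10):
    count = count_long_omissions(omission_lengths, threshold)
    print(f"more than {threshold} words: {count} long omissions")
```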

Form-oriented research: lexical variability in SI
Although much has been said about interpreting as a highly specialized and creative activity which merits protection by copyright law, little has been done so far to strengthen this argument with empirical data about variability of the interpretation product (e.g. Strolz 1992:177ff).
Furthermore, the importance of terminology for interpreting becomes only too obvious in interpretation classes, and the possibility of providing interpretation students with terminology relevant for interpreting certainly offers an interesting perspective (Gile 1995b: 224).

Method
The 3 interpretations of all 12 subjects were transcribed and converted electronically into word-lists. Words were counted regardless of their grammatical form (e.g. was, would be, will be = is) or number (problems = problem) and listed according to their frequency (Lamberger-Felber 1998:138ff). The lists thus obtained were used to investigate, inter alia, the following:
a) length of interpretations in words
b) lexical variability between interpreters
c) degree of individuality of word use
d) frequently used words
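The word-list comparison can be sketched with a few lines of code. This is a minimal illustration under several assumptions: the word lists are taken to be already reduced to base forms, the English sample words are invented stand-ins for the German target texts, and the function and key names are hypothetical rather than those of the tools used in the study.

```python
from collections import Counter


def lexical_variability(lemmas_by_interpreter):
    """Compare the vocabularies of several interpreters.

    `lemmas_by_interpreter` maps an interpreter to a list of base-form words.
    """
    vocabularies = {name: set(words) for name, words in lemmas_by_interpreter.items()}
    # For each distinct word, count how many interpreters used it at least once.
    users_per_word = Counter(word for vocab in vocabularies.values() for word in vocab)
    n_interpreters = len(vocabularies)
    individuality = {
        name: (sum(1 for w in vocab if users_per_word[w] == 1) / len(vocab)) if vocab else 0.0
        for name, vocab in vocabularies.items()
    }
    return {
        "length_in_words": {name: len(words) for name, words in lemmas_by_interpreter.items()},
        "distinct_words": len(users_per_word),
        "used_by_all": sum(1 for n in users_per_word.values() if n == n_interpreters),
        "used_by_one": sum(1 for n in users_per_word.values() if n == 1),
        "individuality": individuality,
    }


# Invented sample word lists.
sample = {
    "A1": ["the", "enterprise", "be", "small", "problem"],
    "B1": ["the", "firm", "be", "small", "difficulty", "problem"],
    "C1": ["the", "enterprise", "be", "medium", "issue"],
}
result = lexical_variability(sample)
print(result["distinct_words"], result["used_by_all"], result["used_by_one"])
print(result["individuality"])
```

Dividing `used_by_all` and `used_by_one` by `distinct_words` gives the kind of percentages discussed for the full corpus.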

Results
The length of the interpretations varied considerably between the subjects, with the greatest variations for the speech subjectively considered most difficult (speech SL, from 714 to 1120 words), followed by speech HI (624 to 883 words) and speech SI (676 to 855 words). In the case of speech SL, this means that the shortest interpretation is roughly 36 per cent shorter than the longest interpretation.
Results also showed a certain tendency toward longer or shorter interpretations for each of the subjects: Whereas A3 used more words than the speaker in all 3 interpretations, B1 or C1 used at least 10 per cent fewer words than the speakers (Lamberger-Felber 1998:142ff).
In total, the 12 interpreters used 2284 different words in their interpretations of the 3 speeches. Whereas 45.6 per cent of all words were used by only 1 interpreter, the percentage of words used by all interpreters is as low as 6.6 per cent (fig. 11).

Figure 12: Lexical variability between interpreters
The percentage of words used by only one interpreter is again highest for the speech considered most difficult (51.52 per cent, speech SL), followed by speech HI (43.11 per cent) and SI (40.85 per cent).
The individuality of word use also varies considerably between interpreters: Whereas only 9.85 per cent of the vocabulary of A2 consists of individual words not used by any of the other subjects, the percentage of individual words is as high as 15.86 per cent for B1 (fig. 13). The list of words used by all interpreters (150 out of 2284), apart from the expected function-words such as articles, pronouns, conjunctions, prepositions and auxiliary verbs, also includes a series of keywords for the topic (see Annex under 5).

Discussion
The use of very simple statistical tools for the comparison of vocabulary used by the 12 subjects interpreting the same 3 speeches proved a very interesting means to open new research perspectives in interpreting studies. The fact that the length of interpretations of one and the same speech by similarly qualified interpreters can vary by almost 40 per cent leads to a whole series of questions that could be the subject of further investigation: Although in the present study there is a significant correlation between the length of interpretation and the number of semantic omissions from the original (p=0.02; Lamberger-Felber 1998:164ff), this does not account for variations in length of up to 40 per cent. An investigation of different styles of interpretation as to the preferential use of certain structures, words or strategies (e.g. compression vs. explicitation) seems an interesting path to explore.
The same is true for the unexpectedly high intersubject variability in word use and the very limited number of words used by all interpreters. A similar study involving a larger corpus of interpretations in different languages could be used as a strong argument in favour of the creative element in SI. It could also offer insights into the relevance of certain terms and structures for different topics and conference types, which could prove very useful in interpreter training.
Moreover, the frequency of certain word categories (e.g. conjunctions, verbs) or structures could be related to the cohesion of the target text and/or to quality evaluations by listeners.
The different degrees of individuality in word use observed within the group of subjects also open up new research perspectives: Do highly individual interpreters dedicate an extra effort to finding the right word, and if so, does this have an impact on other features of the interpretation (e.g. completeness, timelag, redundancy etc.)? Are lexically rich interpretations considered better by the listeners?
Although very limited in scope (only non-specific word counts), the present study produced a whole series of specific questions to pursue in interpreting studies. A more sophisticated method of lexical statistics including different categories of words and structures would certainly offer new and interesting research perspectives (Lamberger-Felber 1999:189ff).

Conclusion
The present study illustrates a series of text-oriented research methods in SI. The more traditional content-oriented approaches, such as the comparison of errors and omissions in interpretations by 12 interpreters working under different conditions, proved at times methodologically unsatisfactory due to the small number of subjects. The unexpectedly high intersubject variability within a homogeneous group is an interesting result in itself which certainly merits further investigation.
Very promising results could be obtained in fields which so far have raised little interest in the empirical IR community: The comparison of objective text presentation parameters and subjective evaluation of source texts by interpreters showed the need for further studies into the nature of source texts and their relevance for interpreting performance. This is particularly important if experimental setups require the use of different speeches for the investigation of dependent variables.
A wealth of new research paths was produced by a rather simple lexicostatistical evaluation methodology. The development of software adapted specifically to IR needs for the statistical processing of large corpora of both source and target texts could certainly offer new perspectives for text-oriented research into interpreting.