Social Interaction. Video-Based Studies of Human Sociality.

2022 Vol. 5, Issue 2

ISSN: 2446-3620

DOI: 10.7146/si.v5i2.124481

Multimodal Try-marking for Securing Recipient Understanding of Codeswitched Lexical Items in Everyday ELF Conversations

Ivana Leinonen

University of Oulu


In this paper I analyse sequences where multilingual English as lingua franca (ELF) speakers codeswitch single lexical items from their native language in an attempt to resolve a word search or when displaying hesitancy. The main focus of the paper is on try-marking used by the speakers as a technique for securing recipient understanding of the codeswitched items. The analysis shows that try-marking comprises not only rising intonation but also specific embodied resources that highlight and prolong the relevance of recipient response beyond prosodic marking. The analysis also presents a single case in which the speaker invites recipient confirmation of understanding without using rising intonation but by relying mainly on the end positioning of the target word near the transition relevance place in combination with response-mobilising embodied cues. The results add to our knowledge of how participants pre-empt trouble and achieve mutual understanding in linguistically diverse ELF settings.

Keywords: multimodality, conversation analysis, multilingual interaction, video data, English as lingua franca

1. Introduction

In situations where participants rely on a lingua franca as their ‘contact’ language (Firth, 1996), it is not uncommon for participants to also share other languages which they can utilise to achieve various interactional goals (Cogo, 2010; Hülmbauer & Seidlhofer, 2013). When one or more parties, however, have only partial knowledge of the switched-to language, participants may need to adopt specific means to avoid possible trouble that could be caused by their language choice (Klötzl, 2014; Pietikäinen, 2014). In this study, I explore how speakers orient towards securing the other party’s understanding of a codeswitched lexical item by means of try-marking.

Try-marking was originally described by Sacks and Schegloff (1979, p. 19) as a speaker technique for securing recipient recognition (or understanding) of a reference by producing it with a rising final intonation followed by a brief pause that creates a place for recipient response. In second language interactions, try-marking has been regarded as an implicit marker of a word search that can serve various functions depending on the context and sequential position in which it is employed (Duran et al., 2019). Previous studies show, for instance, that try-marking can be used to check recipient understanding of the reference, to invite confirmation of the accuracy of the candidate solution (Koshik & Seo, 2012), as well as to elicit the sought-for lexical item from a more experienced speaker (e.g., Eskildsen, 2018; Kotani, 2017; Kurhila, 2006; Pekarek Doehler & Berger, 2019).

While the majority of studies have observed the use of try-marking in L2 talk between native and non-native speakers of a certain language, few studies have shown how try-marking is used in lingua franca interactions where all participants are non-native speakers of the main language (e.g., Matsumoto & Canagarajah, 2020). In this study, I examine try-marking in video-recorded everyday conversations among participants who rely mainly on English as lingua franca (ELF) but also occasionally mobilise other languages from their multilingual repertoires. I am interested in sequences where participants codeswitch to their native language (that is partially shared by the co-participants) in an attempt to resolve a word search or when displaying hesitancy. How do participants use try-marking to manage participation and recipiency in these interactional moments?

In this paper I address this question by considering multimodal—that is, both verbal and embodied (gaze, facial displays, and gestures)—features of try-marking. The analysis especially highlights the importance of embodied resources in try-marking by demonstrating how they can be used by speakers to accentuate and prolong the relevance of recipient response—confirmation of understanding—beyond prosodic marking. Moreover, I will argue that speakers can also invite recipient confirmation without marking the codeswitched item with rising intonation, by relying mainly on the positioning of the target word near a transition relevance place (i.e., end positioning; see Auer, 1984) in combination with response-mobilising embodied cues (see Section 4.2).

The analysed instances show how participants display their orientation towards the progressivity of interaction by using codeswitching as a multilingual resource for moving on with their actions, while at the same time making recognisable their orientation to intersubjectivity with the use of try-marking for inviting recipiency and pre-empting trouble of understanding. In this context, try-marking makes an implicitly recognisable speaker effort to secure understanding while causing minimal disruption to the progress of talk (Svennevig, 2010). Especially in comparison to direct verbal inquiries about recipient knowledge such as ‘do you know the word X’ (that are found to be less frequent in the data), try-marking provides a subtle and economical way for speakers to index specific codeswitched lexical items as prone to being problematic for selected co-participant(s), thus making their orientation towards the category of ‘linguistic competence’ locally relevant.

2. Data and method

The data set consists of 12 video recordings of naturally occurring face-to-face interactions among friends. The total amount of video data is 22 hours, and it includes both dyadic and multi-party conversations. The video recordings were filmed with the informed consent of the participants whose names were pseudonymised in the transcripts. As the embodied conduct of the participants is central to the analysis presented in this paper, I am grateful for the participants’ permission to let me include the video clips in their original form.

The participants are two Czech and two Slovak immigrants in Finland as well as three native Finns. Even though all participants live permanently in Finland, not all of them can use Finnish as a main medium of communication; therefore, English is used by the participants as lingua franca. All participants, however, occasionally also utilise other languages from their multilingual repertoires: Czech, Slovak, but mostly Finnish—the language that is partially shared by all participants. Most of the codeswitches to Finnish in the data concern single lexical items, specifically content words.

For the purposes of this paper, I have looked for all instances where codeswitched lexical items from the speaker’s native language were produced in such a format that made the recipient confirmation of understanding a relevant next action. Out of a total of 29 cases, there were only five instances in which speakers used explicit verbal inquiries about recipient knowledge—such as ‘(do) you know (the word) X?’—to check whether the recipient is familiar with the codeswitched word. The focus of this paper is on the remaining 24 cases in which recipient confirmation is pursued by the speaker implicitly: with multimodal try-marking.

The instances in the collection comprise Finnish words produced by two Finnish participants exclusively; while the Czech and Slovak participants codeswitch to their native languages as well, they direct these codeswitches almost exclusively to one another. In these cases, the speakers were not observed to try-mark the codeswitched items, which can be explained by the fact that Czech and Slovak are closely related languages, and most Czechs and Slovaks can understand each other’s native languages without any substantial problems. The Finnish participants in the data, on the other hand, have either no or very limited knowledge of Czech and/or Slovak, which may be the reason why the Czech and Slovak participants, in general, refrain from codeswitching to their native languages when addressing the Finnish co-participants.

The selected examples come from four video recordings that were filmed at the home of Martin and Tereza (a Czech and a Slovak participant respectively) who are meeting their Finnish friends Aku and Jenni. In the analysed instances, Aku and Jenni use single words from Finnish (their L1) when addressing Martin and/or Tereza, for whom Finnish is a second language. While Tereza is a rather competent speaker of Finnish who uses the language on an almost daily basis (for example at work), her partner Martin—who is present in all analysed instances—has only a basic knowledge of Finnish and relies mainly on English in his daily interactions with Finns.

The methodological framework adopted for this study is one of multimodal conversation analysis (Mondada, 2013). As previous CA research has shown, the smooth coordination of turns at talk depends on the constant mobilisation and reciprocal interpretation of a wide range of multimodal resources, such as lexis, syntax, prosody, gaze, gestures, and body movements employed by the participants to produce both intelligible and accountable actions (Deppermann, 2013; Mondada, 2014a). The examples were transcribed using the conventions developed by Gail Jefferson (2004) for speech conduct and the conventions developed by Lorenza Mondada (2014a; 2018) for the transcription of visuo-spatial modalities.

3. Try-marking, word search, and codeswitching

The main focus of this study is on try-marking used by speakers as a means to secure recipient understanding of a codeswitched lexical item. Try-marking was originally described by Sacks and Schegloff (1979) as a technique used by speakers to elicit recipient recognition of a person reference. By producing the reference in a try-marked format—i.e., with rising final intonation and followed by a brief pause creating a space for the recipient to respond—speakers momentarily disrupt the progressivity of their ongoing turn in order to check whether the recipient recognises the referred-to person (Sacks & Schegloff, 1979, pp. 18–19). Following recipient confirmation, speakers typically resume their turn at talk, while the lack of confirmation can prolong the suspension of speaker turn-in-progress as speakers tend to expand the sequence (e.g., with descriptions or clarifications) to offer additional information (Heritage, 2007; Sacks & Schegloff, 1979; Schegloff, 1996; Svennevig, 2010). Try-marking, however, is not restricted to person references but can be used with any reference that needs to be secured. For instance, Kitzinger and Mandelbaum (2013) show that with try-marking, speakers can invite recipient confirmation of the understanding of words that concern a specific area of expertise.

In research dealing with second language (L2) interactions in particular, try-marking has been regarded as an implicit marker of a word search (e.g., Brouwer, 2003; Eskildsen, 2011, 2018; Kurhila, 2006). Previous CA research on word search sequences in various settings shows that speaker lexical (or grammatical) difficulty is signalled to the recipient in both verbal and nonverbal ways (e.g., Goodwin & Goodwin, 1986; Koshik & Seo, 2012; Kurhila, 2006; Schegloff et al., 1977). The speaker’s ongoing turn typically gets interrupted with non-lexical perturbations such as sound stretches, cut-offs, hesitation markers (uhs, ehs), and pauses (Schegloff et al., 1977). Speakers can also make their trouble explicitly known to the recipient with the use of self-directed or other-directed remarks and questions such as ‘I don’t know what it is’ (see Section 4.1, Extract 2) or ‘how do you say?’ (Brouwer, 2003; Kurhila, 2006). The searching activity is often also displayed visually, with the speaker mobilising a so-called ‘thinking face’ (Goodwin & Goodwin, 1986). Moreover, the orientation of the speaker’s gaze can signal to the recipient whether the speaker is engaged in solitary search (averted gaze) or whether recipient co-participation is invited (gaze on the recipient) (Goodwin & Goodwin, 1986).

In a word search sequence, try-marking can have various functions depending on the context and sequential position in which it is employed (Duran et al., 2019). Several studies focusing on ‘learning in the wild’ situations note that L2 speakers recurrently use try-marking to elicit the sought-for lexical item from a more experienced speaker (e.g., Eskildsen, 2018; Kotani, 2017; Pekarek Doehler & Berger, 2019). In a language learning context, Koshik and Seo (2012) observe that in addition to eliciting a claim of understanding, learners can also use try-marking to pursue information about the accuracy of a particular content of talk. In addition to rising intonation, Koshik and Seo (2012) note that learners can also display their uncertainty visually with frowning.

Try-marking can also co-occur with codeswitching, which can serve as an additional resource used by multilingual speakers in word search situations (e.g., Duran et al., 2019; Greer, 2013; Kurhila, 2006). For instance, in data concerning conversations among native (NSs) and nonnative speakers (NNSs) of Finnish, Kurhila (2006) observed that NNSs inserted prosodically marked lexical items from their L1 or other additional language as a means to resolve lexical difficulties. Kurhila (2006, p. 111) shows that NNSs ‘exhibited hesitancy’ in the initiation phase of the search and shifted their previously averted gaze back to the recipient as they uttered the codeswitched lexical item, which was recurrently treated by the NSs as an invitation to participate in the search, e.g., by commenting on the word or translating it.

In research focusing on English as lingua franca conversations, codeswitching has been observed to fulfil such functions as filling a vocabulary gap, specifying an addressee, or signalling cultural background (e.g., Cogo, 2009; Kalocsai, 2013; Klimpfinger, 2010; Mauranen, 2013). Regarding how codeswitching is employed by ELF users, previous studies have mostly adopted Poplack’s (1987, 2004) distinction between flagged (or signalled) and unflagged (or smooth) codeswitching (e.g., Cogo, 2009; Hynninen et al., 2017; Kalocsai, 2013). According to Poplack (2004, p. 593), contrary to smooth codeswitches, ‘flagged switches are marked at the discourse level by repetition, metalinguistic commentary, and other means of drawing attention to the switch’. By focusing in more detail on flagging in an academic ELF context, Hynninen et al. (2017) note that the amount of lexico-syntactic flagging around the codeswitched element displays the speakers’ orientation to the acceptability/intelligibility of the switch for the recipient. However, the observations made by Hynninen et al. (2017) are based on transcripts of corpus data that do not include the participants’ visual conduct or intonation, and therefore the instances of prosodic try-marking as well as the role of co-occurring embodied resources (e.g., gaze and gestures) were not taken into consideration.

By drawing on video recordings of face-to-face ELF conversations and using multimodal conversation analysis, my aim is to investigate how participants in my data index codeswitched lexical items as prone to being problematic for selected co-participant(s) through an assemblage of verbal and embodied resources. While previous conversation-analytic research has explored in detail the relevance of such resources as gaze and gestures for the organisation of turn-taking and repair in monolingual interactions, little is known about how these semiotic resources contribute to participants’ co-construction of understanding in such linguistically diverse contexts as those of English as lingua franca (Matsumoto & Canagarajah, 2020). The microanalytic investigation presented in this study contributes to our knowledge about the workings of ELF interactions by illustrating how participants use a well-known technique of try-marking as a means to pre-empt trouble in moments when shared linguistic knowledge cannot be taken for granted.

4. Analysis

The analytic part of the paper includes four examples that are representative of the collection. Extracts 1–3 (Section 4.1) illustrate the use of multimodal try-marking in word search sequences; in these cases speakers use both rising intonation and embodied resources (gaze, ‘raised eyebrows’ facial display, and gestural hold) to mark the codeswitched word as a provisional solution and to invite recipient confirmation of understanding. Extract 4 (Section 4.2) shows a single case in which the speaker uses try-marking to simply invite recipient confirmation of understanding (i.e., not in a word search). In this case the speaker does not use rising intonation but relies mainly on embodied resources and the end positioning of the codeswitched lexical item.

4.1 Gaze, raised eyebrows, gestural hold, and rising intonation as multimodal features of try-marking

The first example depicts part of a discussion between Jenni and Martin. A 2-year-old child and Martin’s partner, Tereza, are also present. Martin has previously stated that his colleagues from the university he formerly attended are the best in their field. He believes this is because the university’s students had to do a lot of studying on their own and ‘use their own heads’ to figure out the learning materials (lines 1–5). Jenni, however, makes a different point by stating that kontaktiopetus (‘contact teaching’) is the best type of teaching because students can ‘have a dialogue with the teacher’ (lines 6–12).

Extract 1. Kontaktiopetus (‘contact teaching’)

Before Jenni produces the Finnish compound noun kontaktiopetus, her turn is interrupted with non-lexical perturbations (cut-offs, pauses, and a hesitation marker, line 6) indicating that Jenni is engaged in a word search (Schegloff et al., 1977). The aspirated pronunciation of the plosive consonant [k] at the beginning of the cut-off ‘con-’ is not typical of Finnish (Suomi et al., 2008) and suggests that Jenni might be pursuing an English word. The fact that Jenni’s gaze is directed away from Martin (at the toy in her hands, line 6, Figure 1) indicates that at this point her search is a solitary one (Goodwin & Goodwin, 1986). In line 7, Jenni attempts to resolve her lexical trouble by using a word from her native language, Finnish. The final rising intonation marks the codeswitched word as a trial that needs to be confirmed by the recipient before it can be established as a valid reference.

The analysis of the speaker’s multimodal conduct, however, shows that rising intonation is not the only resource used by the speaker to invite recipient confirmation. As Jenni utters the Finnish word, she also moves her gaze towards Martin, thus selecting him as the next speaker (Lerner, 2003), and employs a facial display with raised eyebrows (line 7, Figure 2) that is held throughout the ensuing response relevance place (line 8) and until the recipient responds. Martin’s confirmation is slightly delayed; actually, Jenni has already resumed her TCU (line 9) as Martin produces the confirmation token in overlap (‘mm hm’, line 10). Unfortunately, as Martin’s face is not visible on camera, we cannot take into consideration any possible effect of his gaze and/or facial expression on Jenni’s turn resumption. However, the fact that Jenni does not repair the Finnish word right away could indicate that she does not treat Martin’s lack of immediate confirmation as problematic at this point. One reason for this may be that Jenni’s turn is designed to give a reason for why she finds kontaktiopetus to be the best (see lines 9 and 12); therefore, the nature of the Finnish word gets (to a certain extent) clarified in the subsequent talk.

In this example we can see that both speaker gaze and ‘raised eyebrows’ facial display have a specific function in speaker try-marking of the codeswitched lexical item. Previous research focusing on the role of gaze in social interaction has shown that participants can use gaze to achieve various interactional goals that have previously been ascribed only to syntax or prosody (Rossano, 2012). As with rising intonation, gaze has been observed to have a response-mobilising function in moments when recipient response is due (see, e.g., Kendon, 1967; Stivers & Rossano, 2010), and in this case the precisely timed speaker gaze shift towards the recipient seems to do just that.

Extract 1 also illustrates speaker use of the ‘raised eyebrows’ facial display, which—together with speaker gaze—was found to be one of the most recurrent traits of the phenomenon in the data. Mobilised together with the codeswitched word and held until the recipient responds (line 10), the ‘raised eyebrows’ display embodies the trial format of the codeswitched word and prolongs the relevance of recipient response.

The next example further illustrates the combination of rising intonation, gaze, and ‘raised eyebrows’ facial display as well as gestural hold, which can be used as an additional resource in speaker try-marking. Due to a better camera position, in Extract 2 we can consider the embodied conduct of the recipient more thoroughly than in Extract 1. In Extract 2, Finnish participant Aku tells Czech participant Martin about the items that his boss ordered for his office (lines 1–9). Aku engages in a word search as he tries to refer to the last item on the list, a dock station. In line 9, Aku attempts to resolve his lexical difficulty by using a Finnish equivalent to the searched-for word, telakka. The Finnish word is produced here quite promptly, with Aku activating a combination of resources—recipient-focused gaze, gestural hold, rising intonation, and a ‘raised eyebrows’ facial display—in close proximity to the codeswitched word.

Extract 2. Telakka (‘dock station’)

The Finnish word is preceded by the placeholder ‘that’ (line 7) and by a one-second-long pause (line 8) during which the speaker mobilises several embodied cues that signal his lexical difficulty to the recipient. Aku, at first, makes an iconic hand gesture using his index fingers to illustrate the ‘box-like’ shape of the item (line 7, Figure 3). While performing the ‘box’ gesture, Aku’s gaze is averted from Martin and he employs a ‘frowning’ facial display resembling a ‘thinking face’, which indicates that at this point Aku’s search is a solitary one (Goodwin & Goodwin, 1986; Clark, 1996). Subsequently, Aku adjusts his iconic gesture (‘parallel palms’, line 8, Figure 4), shifts his gaze back to Martin, and produces the Finnish word for dock station, telakka (line 9). In Figure 5, we can see that Aku holds the hand gesture in a post-stroke position as he utters the codeswitched word with rising final intonation, thus marking it as a trial. A brief pause ensues, during which Aku employs a ‘raised eyebrows’ facial display and monitors the recipient for response (line 10, Figure 6).

As the recipient does not immediately respond, Aku makes his unavailability of the English equivalent public (‘I don’t know what is it’, line 11). He retracts the gestural hold and starts a new gesture (‘put on’ gesture, Figure 7), one that projects an upcoming verbal description of the item now referred to with the placeholder ‘that thingy’ (end of line 11). Aku’s continuation of the word search seems to be prompted by the recipient’s embodied conduct; at the end of line 11 Martin employs a ‘downward mouth’ facial display (Figures 7 and 8), thus visibly showing that he is not familiar with the Finnish word. Aku describes the item as a ‘thingy where you put the laptop’ while gesturally enacting the action of putting the laptop on the dock station by first placing the left and then the right palm on the table (lines 11–14, Figure 7). Following Aku’s multimodal description, Martin soon displays a change-of-state with ‘oh yeah yeah’ (Heritage, 1984) and offers a candidate solution ‘dock’ (line 13), which is subsequently confirmed and adjusted by Aku to ‘dock station’ (line 15).

As in Extract 1, in this example we can also see that the speaker uses both rising intonation and embodied resources to mark the codeswitched lexical item as a trial and to invite recipient response. Unlike in Extract 1, where the ‘raised eyebrows’ facial display was already mobilised with the production of the codeswitched word, in this case it immediately follows the prosodically marked Finnish word and, together with the recipient-focused speaker gaze, highlights the relevance of recipient response as a next action.

Extract 2 also illustrates the use of gestural hold in speaker try-marking. Previous research on gestural holds has shown that speakers tend to hold the gesture in order to visually display their orientation to interactional trouble that has not yet been resolved (Floyd et al., 2016; Matsumoto & Canagarajah, 2020). Here, Aku employs the ‘parallel palms’ gesture as he produces the prosodically marked Finnish word and holds the gesture while he monitors the recipient for responsive action. While the gestural hold is typically maintained until the problem is solved (Sikveland & Ogden, 2012), in this case Aku retracts the gestural hold (and starts a new gesture) once he realises that the Finnish word is not a suitable solution to his lexical trouble.

The use of a gestural hold in speaker try-marking can also be observed in the following example. In Extract 3, Jenni is telling Tereza and Martin about the cave paintings she recently went to see. Extract 3 starts with Jenni responding to Tereza’s question about whether the caves were part of the museum.

Extract 3. Jääkausi (‘ice age’)

In her response Jenni clarifies that the paintings were not actually in a cave but on a rock near a lake (lines 1–5). She also points out that the paintings were located very high up (line 7), because back then the water level was much higher (lines 8 and 10). At this point Jenni’s verbal conduct is accompanied by gestural movement; she lifts her right arm up (end of line 7) and subsequently lowers and raises the arm to depict the difference in the water level (lines 8–10). In line 12 Jenni starts to explain why the water level was higher before; however, her turn gets interrupted by a longer pause (line 13) and lexical perturbations (line 14). The fact that Jenni encounters lexical difficulty is also made recognisable to the co-participants through her embodied conduct. Jenni suspends the previously mobilised gestural movement and keeps her right arm up in a post-stroke hold, thus displaying orientation to the ongoing trouble that has not yet been resolved (Floyd et al., 2016). At the end of line 14, Jenni employs a ‘frowning’ facial display, averts her gaze from Martin, and subsequently produces a self-directed question ‘what’s the word’ (line 16), thus making her unavailability of the word public (Brouwer, 2003).

Both Martin and Tereza keep looking at Jenni as she attempts to resolve the word search with the Finnish equivalent to the searched-for word, jääkausi (‘ice age’, line 17). As Jenni produces the Finnish word with rising final intonation, she shifts her gaze back to Martin, thus selecting him as the next speaker (Lerner, 2003), and employs a facial display with raised eyebrows while still maintaining a gestural hold (Figure 9). A brief pause follows (line 18), during which Jenni shifts her gaze from Martin to Tereza. Unfortunately, at this point we cannot see whether the ‘raised eyebrows’ facial display is still on, as Jenni’s face is turned away from the camera. However, the fact that Jenni establishes mutual gaze with both Martin and Tereza shows that she finds the response from them both to be relevant. Indeed, both Tereza and Martin treat Jenni’s multimodal conduct as having a particular function—inviting confirmation of understanding—as they subsequently respond with confirmative tokens (lines 19 and 20).

The sequence, however, does not end with the recipients’ confirmation. In line 21 Tereza produces an English translation of the Finnish word (‘ice age’), one that is directed at her partner, Martin (see Figure 10 for Tereza’s embodied conduct). The fact that Tereza translates the target word for Martin suggests that she is not fully convinced that Martin understood the Finnish word even though he claimed so. Tereza’s translation is acknowledged by both co-participants; while Martin reacts with continuous nodding, Jenni immediately repeats the English translation, thus accepting it as a candidate solution (line 22). It is also worth noting that the translation of the Finnish word does not end Jenni’s explanation of why the paintings were located high up on the rock. Clearly, the paintings were not high up because of the ice age, but because of the water level decreasing slowly as the ice melted. The real reason why the paintings were high up thus becomes clear only once Jenni’s explanation is brought to an end (in lines 24–27). This supports the claim that Tereza has indeed understood the Finnish word instead of guessing at the concept.

As with Extract 2, Extract 3 also illustrates the way in which gestural hold is a relevant and meaningful resource for the participants in the course of a word search. As the word search is initiated, the freezing of the hand movement signals to the recipients that Jenni’s turn is put on hold until the trouble is resolved (Deppermann & Streeck, 2018). Furthermore, in coordination with rising intonation, speaker gaze, and a ‘raised eyebrows’ facial display (line 17, Figure 9), the continuous gestural hold becomes a part of a complex multimodal Gestalt designed to invite recipient confirmation of understanding of the Finnish word that is produced by the speaker as a provisional solution. Once the trouble is resolved, Jenni starts to release the gestural hold (line 23, Figure 11). The same gestural movement is then resumed as it continues to be relevant for the subsequent talk (lines 24–27).

To sum up, Extracts 1–3 highlight the multimodal features of try-marking; in addition to rising intonation, speakers rely on gaze, ‘raised eyebrows’ facial display, and gestural hold to prompt recipient response. Mobilised within a word search, try-marking enables speakers to propose the codeswitched word as a provisional solution to be further negotiated by the participants. Whether the codeswitched word is treated by the participants as an adequate solution or whether the search continues appears to depend on several factors such as the ‘larger’ action that the put-on-hold turn is designed to fulfil (e.g., ‘expressing an opinion’ in Extract 1), as well as on recipients’ responses (which can also be delayed or absent), and the way they are being treated by the speakers (or by other participants) as sufficient (or not) indicators of understanding.

4.2 Try-marking without rising intonation: Inviting recipient confirmation with end positioning and embodied resources

In this section I examine one more example to show that speakers can also invite recipient confirmation of understanding without rising intonation by relying mainly on the end positioning of the target word near the transition relevance place in combination with the embodied resources that we have observed in Section 4.1. This last example also shows that try-marking does not necessarily have to be mobilised within a word search, but speakers can mark the codeswitched item as a trial to simply prompt recipient confirmation when the shared understanding is in doubt. Extract 4 further illustrates that when the pursued response is not delivered in time, speakers may treat recipient lack of response as a display of non-understanding and resort to self-repair.

Prior to Extract 4, Czech participant Martin was telling Finnish participant Aku about a fishing trip he went on with his friend. In his telling of the story, Martin switched to his second language (Finnish) to name a fish that he caught for the first time, hauki ‘pike’, and he also used a Finnish name to refer to the fish his friend likes to eat, ahven ‘perch’. Martin’s switches to Finnish were smooth, produced without hesitation and without any signs of verbal or embodied marking, which suggests that these Finnish fish names are part of his multilingual repertoire. In Extract 4 the conversation continues. Here, it is Finnish participant Aku who finishes his own story about a pike that his brother caught and that Aku saw only in photos (lines 1–2). In lines 5–7, Aku continues by naming the fishes that one can find in his local river and lake.

Extract 4. Lahna (‘bream’)


Both syntactically and prosodically, Aku’s turn in lines 5–7 can be considered a complete unit ending on the last item of a three-part list construction (Jefferson, 1990). Aku’s listing starts with a reference to two fishes: hauki ‘pike’ and ahven ‘perch’. These Finnish fish names have previously been used by Martin in his storytelling, and Aku’s smooth codeswitching (i.e., without any signs of hesitancy or try-marking) displays his categorisation of the recipient as being acquainted with these Finnish fish names. The last fish on the list (lahna ‘bream’), however, has not yet been mentioned in the previous conversation, and it is flagged with both verbal and embodied means. As Aku utters a lengthened adverb ‘then::’ (line 7) that delays the production of the next item in the list, he also employs a specific embodied display: with his gaze averted from the recipient, he narrows his eyes and subsequently puts on hold the previously employed ‘listing’ gesture (Figure 12). With this multimodal conduct, Aku makes it recognisable for the recipient that he is ‘doing thinking’ and that the listing activity is still in progress.

Aku refers to the last fish on the list—lahna (‘bream’)—at the end of line 7. In comparison to Extracts 1–3, in this case the codeswitched word is not delivered with rising final intonation; however, it is prosodically marked with clear pronunciation and a distinctive emphasis on the first syllable. The target word is produced with a falling final pitch that signals a list completion (Selting, 2007) and a point of a possible unit completion, which makes some response from Martin a relevant next action (Jefferson, 1990). Aku’s embodied conduct invites a display of recipiency as well; in addition to prolonged gestural hold, Aku shifts his gaze towards Martin and raises his eyebrows (end of line 7, Figure 13). The multimodal display comprising recipient-focused gaze, gestural hold, and a ‘raised eyebrows’ facial display (Figure 13) is held by Aku throughout the transition space and indicates a continuous relevance of recipient response (see, e.g., Goodwin & Goodwin, 1986; Sikveland & Ogden, 2012; Streeck, 2009 on response-mobilising functions of gaze and gestures).

During the pause in line 8, Martin does not respond even though Aku seems to wait for him to do so; Martin does, however, withdraw his gaze after 0.8 seconds (Figure 13), which at this sequential position may be seen as an indication of a private search (Kendrick, 2015, p. 8). Following Martin’s lack of responsive action, Aku expands his TCU by specifying that ‘lahna is that cousin of kapor’ (line 9), meaning that the bream belongs to the same family as the carp (kapor in Slovak). Aku’s turn expansion in line 9 constitutes a self-repair, and it shows us that he treats Martin’s lack of response as a signal of non-understanding and categorises him as not being acquainted with the last Finnish fish name in the list. It is also worth noting that in his specification in line 9, Aku draws on his knowledge of Slovak—a language closely related to Martin’s native language, Czech—which displays Aku’s effort to accommodate to Martin’s linguistic knowledge by switching to another partially shared language (Cogo, 2009).

As Aku continues his turn in line 9, he resumes the previously suspended gestural movement (he widens his fingers and then retracts them) and, following Martin’s silent acknowledgement token produced in overlap (line 10), gradually revokes the ‘raised eyebrows’ facial display. Once the repair is produced, Aku keeps monitoring Martin’s multimodal conduct, which indicates that he may expect a more distinct confirmation of understanding from Martin. With Martin’s confirmative token (line 12) being slightly delayed, Aku continues the repair by specifying that lahna ‘looks like kapor but is smaller’ (lines 13 and 14), after which the repair sequence is finally brought to an end with Martin’s verbal (‘okay’) and embodied (nodding) claim of understanding.

In this example we can see that the speaker uses several resources to frame the last fish in the list as a marked constituent. The target word is preceded by a hesitation (prolongation) and a ‘thinking’ facial display resembling a word search; however, the participants do not seem to treat it as such. Even though we can observe some prosodic marking (emphasis, precise pronunciation), the target word, unlike in Extracts 1–3, is not delivered with rising final intonation. In this case, however, the positioning of the target word at the end of a possibly complete unit becomes a resource in itself. According to Auer (1984, p. 641), speakers may conveniently place potentially problematic referential expressions at the end of their TCU to prompt recipient response, as the following transition relevance place offers a higher chance that the referent will be acknowledged by the recipient. Moreover, in addition to end positioning, the speaker here also mobilises specific embodied cues—recipient-focused gaze, a ‘raised eyebrows’ facial display, and gestural hold—that highlight and prolong the relevance of recipient response as a next action.

This last example is also interesting when compared to Extract 2, where the target word is likewise produced as the last item in a list construction. In Extract 2, the Finnish word (telakka) is accompanied by both embodied cues and rising intonation; however, the ensuing response relevance place is very brief (Extract 2, line 10). Following the recipient’s lack of an immediate response, the speaker very promptly resumes the word search (Extract 2, line 11), thus displaying his orientation to the codeswitched word as being too difficult for the recipient. In Extract 4, the rising intonation is missing; however, the multimodal display (illustrated in Figure 13) is held for a rather long period of time as the speaker awaits the recipient’s response. The fact that the speaker does not repair the Finnish word immediately may indicate his orientation to the target word as being potentially part of the linguistic repertoire of the recipient, who has previously used Finnish fish names himself. As the recipient, however, does not produce a confirmative response, the speaker resorts to self-repair, thus displaying his categorisation of the recipient as not being familiar with the target word.

To recap, this last example shows that prosodic marking of the codeswitched item with rising intonation is not always necessary; speakers can also rely on more subtle prosodic and embodied cues as well as the end positioning of the target word to invite recipient confirmation. In the following, I summarise the findings of the present study.

5. Conclusion

In this study I explored how participants in everyday face-to-face English as lingua franca conversations use try-marking to secure recipient understanding when they codeswitch single lexical items from their native language. The analysis showed that try-marking can be used by the speakers to propose a codeswitched word as a provisional solution to a word search (Section 4.1) or to ‘simply’ invite recipient confirmation when the shared understanding is in doubt (Section 4.2).

In line with previous findings, rising final intonation has been observed in most cases to prosodically mark the codeswitched item as a trial; however, the examination of the video data also revealed that prosodic marking is systematically accompanied by embodied resources that are of no less importance. The presented analysis highlights three particular embodied features that were recurrently observed in speaker try-marking: recipient-focused speaker gaze, ‘raised eyebrows’ facial display, and gestural hold.

Speakers were found to shift their gaze towards the recipient(s) either right before or together with the production of the codeswitched item to prompt responsive action. Besides having a response-mobilising function, recipient-focused gaze also allows the speakers to continuously monitor recipient embodied conduct, which helps them to determine whether the codeswitched word is a suitable solution. In the data, speakers also frequently employed a ‘raised eyebrows’ facial display—typically mobilised either together with or right after the codeswitched item—that seems to work as a visible equivalent to rising prosody; it embodies the ‘try-out’ nature of the word and serves as a visual display of speaker uncertainty about the adequacy of their linguistic choice.

In addition to gaze and ‘raised eyebrows’ facial display, Finnish participants also utilised previously employed gestural movements. By maintaining a gestural hold, participants show that their turn at talk is momentarily on hold (Deppermann & Streeck, 2018) until the suitability of the codeswitched item is determined. Moreover, the suspension of gestural movement together with recipient-focused gaze and raised eyebrows comprises a multimodal display, one that can be held by the speaker through the response relevance place.

The analysis also showed that there might be variations in the ways verbal and embodied resources are mobilised and combined in try-marking, and these may vary even more in different settings and contexts. This raises the question of how different resources relate to each other, and whether different combinations of multimodal resources in try-marking may display various levels of speaker orientation to the recipient’s linguistic competence. To address this question, future studies should also take into consideration the variability in lexico-syntactic flagging surrounding the switch, which can likewise signal speaker assumptions about the appropriateness of the codeswitched item (see Hynninen et al., 2017).

The present findings contribute to pragmatic ELF research, which has noted that despite their linguistic and cultural diversity, ELF speakers are rather skilled at accommodating these differences by using proactive means (such as repetition, paraphrasing, and confirmation checks) to avoid problems of understanding (e.g., Cogo, 2009; Kaur, 2009, 2012; Mauranen, 2006). My findings are in concordance with this line of research; they show that participants proactively use try-marking to index specific codeswitched items (that are often specialised or culture-specific words) as likely to be problematic for selected co-participant(s) and thus to pre-empt interactional trouble that could be caused by the language choice in their word selection.

The microanalytic investigation presented in this study adds to our knowledge of how effective communication is locally achieved by participants in linguistically diverse settings; we can see that even though participants may not always be certain about which multilingual resources are shared and which are not, they do not need to refrain from using them as they can adopt universal techniques to deal with the challenges that limited common ground brings along.

This study also stresses the benefit of incorporating the analysis of participants’ embodied conduct into the exploration of ELF data. As we can see, embodied resources have a significant role in the mutual achievement of understanding between participants. While rising intonation clearly marks the codeswitched item as a trial and prompts recipient response, it is by nature a vocal phenomenon that dissolves right after it is produced. The concurrent embodied resources, however, can last longer in time and thus have the ability to prolong the relevance of recipient response beyond prosodic marking (see, e.g., Deppermann & Streeck, 2018, on the temporality of multimodal conduct). Furthermore, as we can see in Extract 4, the codeswitched item does not necessarily have to be marked with rising intonation, but speakers can rely primarily on the end positioning of the target word and on specific embodied cues held beyond their turn to invite recipient response.

Moreover, the visual conduct of the co-participants is of no less importance; while recipient confirmation of understanding typically comprises a confirmative token and/or nodding, recipient non-understanding is often indicated only visually—for example, with a ‘downward mouth’ display (Extract 2) or with gaze aversion (Extract 4). If the analysis drew merely upon audio data, these embodied displays that may prompt the speaker’s repair could easily be overlooked. Therefore, by taking into consideration not only the verbal features of talk, but also participants’ embodied conduct, we can attain a more holistic knowledge of how understanding is locally achieved by participants in linguistically diverse ELF contexts.

Acknowledgements


This study was conducted as part of the Academy of Finland project Linguistic and Bodily Involvement in Multicultural Interactions. I would like to thank my Ph.D. supervisors, Maria Frick, Florence Oloff, and Ad Backus, for their insightful comments on previous versions of this article. I also thank the Emil Aaltonen Foundation and the University of Oulu Scholarship Fund for providing funding for this research.

References


Auer, P. (1984). Referential Problems in Conversation. Journal of Pragmatics, 8(5–6), 627–648.

Brouwer, C. E. (2003). Word Searches in NNS–NS Interaction: Opportunities for Language Learning? Modern Language Journal, 87(4), 534–545.

Clark, H. (1996). Using Language. Cambridge University Press.

Cogo, A. (2009). Accommodating difference in ELF conversations: A study of pragmatic strategies. In A. Mauranen & E. Ranta (Eds.), English as a lingua franca: Studies and findings (pp. 254–273). Cambridge Scholars Publishing.

Deppermann, A. (2013). Multimodal interaction from a conversation analytic perspective. Journal of Pragmatics, 46(1), 1–7.

Deppermann, A. & Streeck, J. (2018). The body in interaction: Its multiple modalities and temporalities. In A. Deppermann & J. Streeck (Eds.), Time in Embodied Interaction: Synchronicity and Sequentiality of Multimodal Resources (pp. 1–29). John Benjamins.

Duran, D., Kurhila, S., & Sert, O. (2019). Word Search Sequences in Teacher-Student Interaction in an English as Medium of Instruction Context. International Journal of Bilingual Education and Bilingualism, 1–20.

Eskildsen, S. W. (2011). The L2 inventory in action: Conversation analysis and usage-based linguistics in SLA. In G. Pallotti & J. Wagner (Eds.), L2 learning as social practice: Conversation-analytic perspectives (pp. 327–364). National Foreign Language Resource Center.

Eskildsen, S. W. (2018). ‘We're Learning a Lot of New Words’: Encountering New L2 Vocabulary Outside of Class. Modern Language Journal, 102(Supplement), 46–63.

Firth, A. (1990). ‘Lingua franca’ negotiations: towards an interactional approach. World Englishes, 9, 269–280.

Firth, A. (1996). The discursive accomplishment of normality: On ‘lingua franca’ English and conversation analysis. Journal of Pragmatics, 26, 237–259.

Floyd, S., Manrique, E., Rossi, G., & Torreira, F. (2016). Timing of visual bodily behavior in repair sequences: evidence from three languages. Discourse Processes, 53(3), 175–204.

Goodwin, M. H. & Goodwin, C. (1986). Gesture and Co-participation in the Activity of Searching for a Word. Semiotica, 62, 51–75.

Greer, T. (2013). Word search sequences in bilingual interaction: Codeswitching and embodied orientation toward shifting participant constellations. Journal of Pragmatics, 57, 100–117.

Heritage, J. (1984). A change-of-state token and aspects of its deployment. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in Conversation Analysis (pp. 299–345). Cambridge University Press.

Heritage, J. (2007). Intersubjectivity and progressivity in person (and place) reference. In N. J. Enfield & T. Stivers (Eds.), Person reference in interaction (pp. 255–280). Cambridge University Press.

Hülmbauer, C. & Seidlhofer, B. (2013). English as a lingua franca in European multilingualism. In A.-C. Berthoud, F. Grin, & G. Lüdi (Eds.), DYLAN: Exploring the dynamics of multilingualism (pp. 387–406). John Benjamins.

Hynninen, N., Pietikäinen, K., & Vetchinnikova, S. (2017). Multilingualism in English as a lingua franca: Flagging as an indicator of perceived acceptability and intelligibility. In A. Nurmi, P. Pahta, & T. Rütten (Eds.), Challenging the Myth of Monolingual Corpora (pp. 193–212). Brill Rodopi.

Jefferson, G. (1990). List construction as a task and interactional resource. In G. Psathas (Ed.), Interactional Competence (pp. 63–92). University Press of America.

Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (Ed.), Conversation Analysis: Studies from the First Generation (pp. 3–31). John Benjamins Publishing.

Kalocsai, K. (2013). Communities of Practice and English as a Lingua Franca: A Study of Students in a Central European Context. De Gruyter Mouton.

Kaur, J. (2009). Pre-empting problems of understanding in English as a lingua franca. In A. Mauranen & E. Ranta (Eds.), English as a Lingua Franca: Studies and Findings (pp. 107–123). Cambridge Scholars Publishing.

Kaur, J. (2012). Saying it again: Enhancing clarity in English as a lingua franca (ELF) talk through self-repetition. Text & Talk, 32(5), 593–613.

Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22–63.

Kendrick, K. H. (2015). The intersection of turn-taking and repair: the timing of other-initiations of repair in conversation. Frontiers in Psychology, 6(98), 1–16.

Koshik, I. & Seo, M.-S. (2012). Word (and other) search sequences initiated by language learners. Text & Talk, 32(2), 167–189.

Kotani, M. (2017). Initiating side-sequenced vocabulary lessons: Asymmetry of linguistic knowledge and opportunities for learning in conversation. Pragmatics and Society, 8(2), 254–280.

Kurhila, S. (2006). Second Language Interaction. John Benjamins.

Matsumoto, Y. & Canagarajah, S. (2020). The use of gesture, gesture hold, and gaze in trouble-in-talk among multilingual interlocutors in an English as a lingua franca context. Journal of Pragmatics, 169, 245–267.

Mauranen, A. (2006). Signaling and preventing misunderstanding in English as lingua franca communication. International Journal of the Sociology of Language, 177, 123–150.

Mauranen, A. (2013). Lingua franca discourse in academic contexts: Shaped by complexity. In J. Flowerdew (Ed.), Discourse in context (pp. 225–245). Bloomsbury.

Mondada, L. (2013). Conversation analysis: Talk and bodily resources for the organization of social interaction. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill, & S. Tessendorf (Eds.), Body – Language – Communication, vol. 1 (pp. 218–226). De Gruyter Mouton.

Mondada, L. (2014a). The local constitution of multimodal resources for social interaction. Journal of Pragmatics, 65, 137–156.

Mondada, L. (2014b). Conventions for multimodal transcription.
Retrieved from: (accessed 17 January 2021).

Pekarek Doehler, S. & Berger, E. (2019). On the reflexive relation between developing L2 interactional competence and evolving social relationships: a longitudinal study of word-searches in the ‘wild’. In J. Hellerman, S. W. Eskildsen, S. Pekarek Doehler, & A. Piirainen-Marsh (Eds.), Conversation Analytic Research on Learning-in-Action: The Complex Ecology of Second Language Interaction ‘in the Wild’ (pp. 51–75). Springer.

Poplack, S. (1987) [1985]. Contrasting patterns of code-switching in two communities. In E. Wande, J. Anward, B. Nordberg, L. Steensland, & M. Thelander (Eds.), Aspects of multilingualism (pp. 51–77). Borgströms.

Poplack, S. (2004). Code-Switching. In U. Ammon, K. Mattheier, & P. Trudgill (Eds.), Sociolinguistics/Soziolinguistik: An international handbook of the science of language (pp. 589–596). Walter de Gruyter.

Rossano, F. (2012). Gaze in conversation. In J. Sidnell & T. Stivers (Eds.), The Handbook of Conversation Analysis (pp. 308–329). Wiley-Blackwell.

Sacks, H. & Schegloff, E. A. (1979). Two preferences in the organization of reference to persons and their interaction. In G. Psathas (Ed.), Everyday language: Studies in ethnomethodology (pp. 15–21). Irvington.

Selting, M. (2007). Lists as embedded structures and the prosody of list construction as an interactional resource. Journal of Pragmatics, 39(3), 483–526.

Schegloff, E. A. (1996). Some practices for referring to persons in talk-in-interaction: A partial sketch of a systematics. In B. Fox (Ed.), Studies in anaphora (pp. 437–485). John Benjamins.

Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation analysis, vol. 1. Cambridge University Press.

Schegloff, E. A., Jefferson, G., & Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language, 53(2), 361–382.

Sikveland, R. O. & Ogden, R. A. (2012). Holding gestures across turns: Moments to generate shared understanding. Gesture, 12(2), 166–199.

Stivers, T. & Rossano, F. (2010). Mobilizing response. Research on Language and Social Interaction, 43(1), 3–31.

Suomi, K., Toivanen, J., & Ylitalo, R. (2008). Finnish Sound Structure: Phonetics, phonology, phonotactics and prosody. University of Oulu.

Svennevig, J. (2010). Pre-empting reference problems in conversation. Language in Society, 39(2), 173–202.

Appendix - Transcription conventions


Talk has been transcribed according to the conventions developed by Gail Jefferson (2004).

Embodied actions

Embodied details have been transcribed according to the following transcription conventions developed by Lorenza Mondada (2014b):

* *       two identical symbols delimit embodied action of a specific participant
*-->      embodied action continues across subsequent lines
-->*      until the same symbol is reached, or
*---->>   continues beyond the end of the excerpt
>>*       action takes place already before the beginning of the excerpt
.......   preparation phase of delimited action
------    stroke/hold of delimited action
,,,,,,,   retraction phase of delimited action
//        transcription of participant’s embodied actions ends at this point
fig       screenshot
#         refers to the exact time at which the screenshot was taken