Gestural Matching and Contingent Teaching:
Highlighting Learnables in Table-talk at Language Cafés

Ali Reza Majlesi

Stockholm University


This study explores how iconic gestural matching in responsive turns is used in Swedish L2 talk not only to demonstrate understanding, but also as a referable by L1 speakers to highlight part of the previous turn as the focus of instruction. Gestural matching indicates intersubjective understanding (de Fornel, 1992), cohesion across turns (Koschmann & LeBaron, 2002), or conjoint actions (Lerner, 2002). In the context of L2 learning, it can be used for pedagogical purposes (Eskildsen & Wagner, 2015; Majlesi, 2015). This study reports on the use of gestural matching in “language cafés”, where gestures are recycled to make part of the prior turn salient and relevant for the emergence of learnables and for contingent teaching.

Keywords: matching gestures, contingent teaching, Swedish as a second language, CA-SLA, embodiment

1. Introduction

This study deals with gestures and their matchings in subsequent turns at talk for pedagogical purposes. The paper focuses specifically on how an L1-speaking recipient of an L2 speaker’s turn recycles the gestural part of the previous turn (also along with the verbal part) to accomplish two distinctive practices subserving the instructional activities of teaching and learning a language: to display (or check) the understanding of the action, and to build an “epistemic ecology” through “highlighting” (Goodwin, 2018) particular features of the action as “preservable features” of the current interaction (see Sacks, 1995: 773). Gestural matching, depending on its sequential positioning, has been shown to acknowledge and display attention to a visible action (e.g., in “return gestures” in de Fornel, 1992), to demonstrate cohesion across turns for mutual understanding (Koschmann & LeBaron, 2002), to build conjoint actions (Lerner, 2002), or to serve as a resource for tying back to the previous turn for instructional purposes (Arnold, 2012; Majlesi, 2015).

I have chosen to study the use of gestural matchings in table-talks in a couple of video recordings in “language cafés”.1 The setting is not a formal language learning environment compared to curriculum-based schools or institutions, but a hybrid between formally organized language learning activities and informal everyday interactions. In other words, conversations at the café may be considered in a cline of out-of-classroom environments for language learning “in the wild” (cf. “the cline of wildness”, Eskildsen et al., 2019). Language cafés in Sweden are organized by voluntary organizations such as the Red Cross, as well as churches and public libraries, and are arenas in which Swedish-speaking volunteers help L2-speaking visitors to practice (and learn) Swedish. The data consists of word-search or word-explanation sequences initiated by L2-speaking visitors directed to L1-speaking volunteers in which gestures are used not only to accomplish mutual understanding, but also to display joint attention on a sought-for linguistic item as a pedagogical focus, thus turning the interaction into an instructional project (cf. Communicative Project in Linell, 2009; and Interactional Project in Schegloff, 2007; Levinson, 2013). Gestures used in these environments are of different types. In this study, I focus on iconic gestures, i.e., gestures that resemble the shape and/or content of the referents they depict (e.g., Ekman & Friesen, 1969).

In building the dataset for this study, I collected only sequences in which the use of gestural matchings leads to explicit talk about a linguistic form and meaning. I excluded the sequences which remain at the level of understanding display or check without explicit topicalization of linguistic elements in a new or side sequence (Jefferson, 1972), even though they may be associated with linguistic elements embedded in the responses (e.g., “embedded corrections”, see Jefferson, 1987). Two types of activities, word search and word explanation, have been found among the frequent activities in which gestures and their matchings are used in talk-in-interaction. In both types of activities, L2 speakers’ initiative for learning contingently evolves into a teaching moment in L1 speakers’ response (and also through the expansion of the sequences; for previous studies on word search, see Brouwer, 2003; Eskildsen, 2019; Koshik & Seo, 2012; Kurhila, 2006; Pekarek Doehler & Berger, 2019; Skogmyr Marian & Pekarek Doehler, 2022/this issue). By drawing on ethnomethodology and conversation analysis (EMCA) and through the detailed sequential analysis of the design of actions (see also Piirainen-Marsh, Lilja & Eskildsen, 2022/this issue), this study aims to show how an L1 speaker’s gestural matching contributes to foregrounding some salient features of an L2 speaker’s turn (mirrored also in the L1 speaker’s response) as noticeable situated practices, which are thereby highlighted as interactionally shared and locally achieved pedagogical foci, something that may be called learnables (Majlesi & Broth, 2012; cf. Zemel & Koschmann, 2014).

2. Gesture and gestural matchings in instructional and teaching activities

Studies on gestures in interaction have shown how gestures and other bodily actions are used as pivotal resources in instructions in a range of contexts, including archeology, where pointing is used to reorganize the domain of scrutiny in the archeological field for instructional purposes (Goodwin, 2003); music instruction, where gesture is used, e.g., in the instruction of handling a bow in a violin lesson (Nishizaka, 2006) or in vocal master classes to issue directives or coordinate with the participants (Szczepek Reed et al., 2013; cf. Weeks, 1996); dance lectures (Keevallik, 2010); repair instructions for bicycles (Arnold, 2012); crocheting lessons (Lindwall & Ekström, 2012); dental specialty lessons (Lindwall & Lymer, 2014); and basketball training sessions (Evans & Reynolds, 2016).

A specific focus on gestural matching can be found in de Fornel’s (1992) study of recycling iconic gestures in talk-in-interaction (“return gesture” in his study). The reuse of iconic gestures by the second speaker (which were previously used by the first speaker), according to de Fornel, has “a retro-active communicative value”: Not only does it display “the exigencies of the production of the current turn”, it is also “linked to the exigencies of reacting in an adequate way to the talk of the preceding speaker” (p. 169). Although the study concentrates only on “iconic gestures”, the findings of the study, in terms of how gestural matching is used to acknowledge and display attention to the speaker’s action, as well as to build the contextual premise for inference regarding the produced action, are not restricted to iconic gestures and their recycling (cf. Heath, 1992).

Lerner (2002) focuses on turn-sharing as a practice in which the recipient may use matching utterances or matching gestures as an entry device into someone else’s turn to co-produce the turn, either for the sole reason of taking the floor or to contribute to an ongoing action (cf. M. Goodwin, 1980 for the impact of recipients’ embodied action on an ongoing talk). Lerner (2002) describes a sequential environment for “simultaneously voiced actions” (p. 225), at least as part of a turn, not only through matching verbal production of a turn-constructional unit (TCU), including features such as the tempo of the TCU, but also through the reuse of “closely synchronized gestures as a feature of recognizable actions” (p. 253).

In the context of teaching and learning, Koschmann and LeBaron (2002), Arnold (2012), Eskildsen and Wagner (2013; 2015), Majlesi (2015), and Majlesi and Markee (2018) study gestural matchings in instructional sequences. Koschmann and LeBaron (2002) study the use of gestures by learners in problem-based learning meetings and just how gestures and their recycling could be part of jointing in learners’ articulation (p. 271), thus treating gestures as cohesive devices used across turns for mutual understanding. In another study, of instructional interactions in a bicycle-repair shop, Arnold (2012) similarly shows how gestures and their recycling in what she calls “dialogical embodied actions” are used both for tying turns and for intersubjective engagement.

Eskildsen and Wagner (2013; 2015), Majlesi (2015), and Majlesi and Markee (2018) discuss the use of gestural matching in language classrooms. Eskildsen and Wagner (2013) show that gestural matchings are used by participants in language learning situations for maintaining understanding (as also shown by de Fornel’s return gesture, 1992), particularly to help with language troubles in talk, for example in word searches or in correcting lexical errors (e.g., by a recalling practice). They also demonstrate in a later study (Eskildsen & Wagner, 2015) how return gestures are used in language learners’ developmental trajectories, integrating new lexical items into their second language repertoire (see also learning through embodied actions in a shorter span of interaction in Kotilainen & Kurhila, 2020; aus der Wieschen & Eskildsen, 2019; and Majlesi & Markee, 2018). A focus on the use of gestural matchings can also be found in Majlesi (2015), who demonstrates how gestural matchings are used by teachers following their students both “for maintaining mutual understanding” and “for creating teaching and learning opportunities” (p. 42).

In line with the previous studies, the current study, with the focus on sequences that are learner-initiated and vocabulary-oriented, highlights a similar practice, albeit in a different setting: in language cafés, i.e., a setting for voluntary activities for migrants in which L1-speaking volunteers are not professional language teachers. In fact, in the dataset used for this study, all of them but one (see Ex.#03) have their backgrounds in other professions.

3. Theory and method

By invoking the concept of learning in a socio-interactional perspective, I am settling on the definition of learning activity as an activity in which selective parts of the environment (see also “selective interests”, James, 1950[1890]) are oriented to as “salient” and “relevant” (see also Gurwitsch, 2010[1957]) and as “preservable features” (Sacks, 1995: 773). That is, the features oriented to are not only relevant for the practical purpose of accomplishing the current activity, but they are also singled out for their reusability in the future. The preservable features are those that are referable/mentionable in turns at talk as they tie back to previous turns or conversations. From an ethnomethodological perspective, practices of “singling out” some features in interaction, like any other members’ practices, are account-able (Garfinkel, 1967); that is, they are accounted for by the occasion of their occurrence as observable-and-reportable situated practices.

In order to analytically work on practices of “singling out”, I draw on the notion of “highlighting” (see Goodwin, 2018), in which the participants re-shape “a domain of scrutiny so that some phenomena are made salient, where others fade into the background” (pp. 388–389). Highlighting may include practices such as the use of “gestural formulations” (e.g., Koschmann et al., 2007), “referential practices” (e.g., Zemel & Koschmann, 2013; see also Hanks, 1991), or “extracting information” from what was previously said, e.g., in (re)formulation practices (see Kunitz & Majlesi, 2022/this issue), or other types of actions such as “pointing”, “inscribing”, “glossing”, etc. as part of “professional practices” (Goodwin, 2018). Gestural matching, in this study, is also considered as a highlighting practice whose details, i.e., the circumstances of their occurrences, are investigated through the multimodal analysis (Goodwin, 2018; Mondada, 2014; 2016) of the sequences of actions in the studied excerpts.

3.1 Data, participants and some ethical considerations

The data presented here are from two different language cafés, organized by two different churches in Sweden. Participants include L1-speaking volunteers and L2-speaking visitors (henceforth L2 speakers), often asylum seekers and also working or student migrants. Volunteers are L1 speakers of Swedish who are often, but not always, elderly, retired citizens; only one (Babette, in Ex.#03) in the presented data is a former teacher in elementary school. L2 speakers have different linguistic backgrounds with various proficiency levels in Swedish. Language cafés are arenas open to the public, without formal educational plans, organized only for informal encounters and conversations between volunteers and the visitors (even if the organizers may have pre-figured agendas for sessions in terms of conversational topics or information sharing purposes). Conversations are usually formed freely around a topic of the participants’ choice. Vocabulary-related talk and language-related episodes (Swain & Lapkin, 1998) are frequent in language cafés (Kunitz & Jansson, 2020). At times, L2 speakers, who may, in parallel, be learning Swedish in adult schools, bring their tasks to the language cafés to receive help. Whether with this explicit agenda for language learning or within the frame of other types of activities, concentrating on and topicalizing language and linguistic elements are part and parcel of L1-L2 encounters in language cafés.

This study is part of a bigger project (see Kunitz & Jansson, 2020) with a total dataset consisting of video recordings from 82 sessions in 12 language cafés (each 1.5 to 2 hours of recordings). The project has been ethically vetted and received permission for data gathering with video recordings. For this particular study, I carefully watched and studied 4 hours of recordings in two language cafés. The data presented in this study comprise four sequences that have been transcribed verbatim using Jefferson’s (2004) transcription conventions and also multimodally annotated following Mondada (e.g., 2016) and Goodwin (2018).

4. Data analysis

The examples presented in this section consist of four sequences. All examples exhibit how the L2 speaker’s embodied action, in and through which a linguistic element is made relevant as the object of an inquiry, is first picked up and recycled by an L1 speaker in response; the response is then expanded to focus specifically on the linguistic element in question, highlighting the linguistic construct as an instructional focus.

4.1 Example 1

In the following example (Ex.#01), two L2 speakers have brought a cloze test from their language class to receive help from an L1 speaker. The excerpt begins when Nina raises a question about a verb from the cloze test, “knacka” (En: knock). She pronounces the word without articulating the initial consonant /k/, resembling the pronunciation of the English word “knock” with a silent initial letter. However, in Swedish the word “knacka” is pronounced with the initial sound /k/. This leads to a correction sequence in which not only the pronunciation of the word is corrected, but also the phrase in which the word is used with its preposition “på” (“knacka på”). The first line begins after Nina has already raised her head from the task and looks at the L1-speaking volunteer, Ted. In the transcripts, the original turns are in grey and the English translations are in black. The translations and the details of the relevant embodied movements are provided in separate lines. (For the transcription conventions, please see the Appendix.)

Example (1)

Participants: Nina (NIN, visitor), Roya (ROY, visitor), Ted (TED, volunteer)

Nina produces nacka, with try-marked intonation requiring confirmation (Sacks & Schegloff, 1979). After receiving no uptake (line 02), she makes “knocking movements” by bending the knuckle of her index finger and beating in the air (Fig.#01) to depict the meaning-content of the word she seeks while also repeating nacka (line 03). The knocking movements indicate that Nina is referring to the meaning of “knacka” in Swedish (“to knock” in English) and not “nacka” (which means “to behead”). Following a 0.2-second pause, Ted responds by seeking confirmation that “knacka” is the targeted word (line 05). Nina, still making knocking movements in the air, confirms while at the same time displaying uptake by repeating “knacka” (line 06). Her pronunciation is different, however, as she stresses the first syllable, as opposed to Ted, who stresses the last syllable. Roya continues in overlap (line 07) as she reads knacka på=dörren out loud from the cloze test and knocks on the desk three times. Midway through Roya’s turn and coinciding with her first knock on the desk, Ted turns his gaze to her and repeats “knacka” (line 08). Nina then repeats “knacka”, again stressing the first syllable (line 09). Nina’s misplaced stress and Roya’s imprecise articulation of the phrase are both noticed by Ted as he says “knacka” with stress on the final syllable (line 10) and produces a matching gesture for knocking (Fig.#02) as he knocks once on the desk and repeats “knacka på dörren” with hearable stress on “på” (line 11). Roya confirms and repeats the phrase (line 12), following which both visitors write in their notebooks.

The gestural matching, although ROY and TED do not perform the knocking in identical ways, has four functions: (a) It visibly displays the connection between the turns (Koschmann & LeBaron, 2002; Arnold, 2012). (b) It provides a noticeable visual and audible referent. (c) It secures the co-participants’ attention: Ted produces the gesture right after he pronounces the verb in a clearly articulated way, “knacka”, while bringing his hand down on the table and to the level of Roya’s eyesight as she is still looking down at her own notebook. Hitting loudly on the table draws Roya’s attention to Ted. She raises her head and looks at him (line 10). Now that Ted has both Nina’s and Roya’s attention, he repeats the whole phrase and pronounces each word distinctly, “knacka på dörren” (En: knock on the door; line 11). (d) It also contributes to the multimodal gestalt (Mondada, 2014; De Stefani, 2022/this issue) of the referential practice in and through which the preservable features of the phrase “knacka på dörren” are instructionally delivered: While producing the phrase, Ted leans forward toward Roya, opens his hand with the back of the hand toward Roya, and exactly at the moment of uttering “på” with a stress, he strikes his hand slightly forward (line 11; Fig.#03). Ted’s repetition of Roya’s phrase with highlighted features (Goodwin, 2018) in his articulation resembles teacher talk (Cazden, 1986), with a clear and distinct articulation of the talk for pedagogical purposes (see also Kunitz & Majlesi, 2022/this issue). His contribution is immediately receipted through an acknowledgement token, “yeah:”, by Roya (line 12), who also repeats the whole phrase once more. Both Nina and Roya, in the end, start filling the blank in their cloze tests using the word “knacka” (line 13).

In sum, the example (Ex.#01) shows how an L2 speaker’s inquiry about a linguistic item becomes recognizable through her embodied action and is also treated as referable in the responsive turns in combination with the matching gestures. The L1 speaker recycles the gesture previously made by the L2 speaker and displays a candidate understanding of the inquiry (the understanding is contingent on the confirmation). The gestural matching is also a referential practice that makes the response relevant as an answer to the inquiry, an answer which also offers a corrected form/pronunciation to the L2 speakers. As the interaction unfolds, matching gestures are used as resources to point to and highlight the preservable features of the linguistic item in question. The interaction thus contingently evolves into an instructional project (cf. Communicative Project in Linell, 2009; and Interactional Project in Schegloff, 2007; Levinson, 2013) in which the gestural matching (here enacting the gesture of knocking) is used to focus particularly on the use of the lexical item (here “knock”). The matching gestures, thus, contribute to building both the mutual understanding of the topic at hand and the joint attention to the particularities of the linguistic item as a visually and hearably tangible pedagogical focus (the way the word is pronounced and how it is used in its common collocation, e.g., here “knock on the door”).

4.2 Example 2

In the following sequence (Ex.#02), Ted, the same volunteer as in the previous example (Ex.#01), is talking with a group of five L2 speakers, two of whom, Ramin and Jawahir, seem to have a better knowledge of Swedish than the rest, who have recently moved to Sweden. They are all sitting at a table close to each other. The presented sequence is initiated by Ramin (RAM), who asks Ted (TED) about an object whose name he would like to know in Swedish (the object is a “cone”). I have chosen to present Example 2 in two parts (Ex.#02a, b): In the first part, the focus of the sequence is on the recognition of the referential practice used by the L2 speaker; the second part is a clear example of a post expansion in which the previous sequence, beginning with a word search, turns into instructional work, highlighting the linguistic item for the participants.

Example (2a)

Participants: Ramin (RAM, visitor), Jawahir (JAW, visitor), Laleh (LAL, visitor), Zahara (ZAH, visitor), Susan (SUS, visitor), Ted (TED, volunteer)

The sequence begins with Ramin making a reference to a non-present object that is used in his football training during the warm-up activities (line 01). He begins to refer to the objects in the plural and describes them as “things which are this tall” (line 02), holding his hands up in the air right after the word “things” to show the shape and height of the object. He stretches his arm over the surface of the table, opens his left hand with the palm downward and the fingers apart to demonstrate that the bottom of the object is flat. He then raises his right hand above the left hand to show the height of the object and holds it vertically, with bent fingers that are kept close together, showing a circle. The gesture depicts the other end of the object as round and narrow (Fig.#04a, b). He then changes the position of his right hand and turns his wrist horizontally with all fingers stretched toward the thumb (Fig.#05). With this gesture, which resembles “an inverted pinecone”, he depicts a funnel-like shape. Immediately after the hand gestures are produced by Ramin, Ted withdraws his gaze, looks down, and frowns, a facial expression that is typically associated with the display of thinking (see Goodwin & Goodwin, 1986 for “thinking face” in word searches). Ramin obviously knows what the object looks like and what it is used for, and probably knows its linguistic gloss in his own language (which we can observe later in the transcript). However, his actions are not recognizable enough for Ted to give any suggestions for the gloss of the lexical item in Swedish. Ramin continues with a further description of how the object is used in his football training to mark boundaries on the ground (line 03). With the verbal description and gestural depiction as a demonstration practice (cf. Goodwin, 2018: 416; cf. “embodied demonstration” in Lindwall & Ekström, 2012), Ramin now asks directly whether Ted knows what he means (line 04).
While Ted is still maintaining his thinking face, he also brings his hand up to his chin, showing that he is deep in thought (line 04), and after a relatively long pause of 0.6 seconds, he shakes his head and replies, “no” (line 06; Fig.#06).

With Ted’s display of non-understanding (line 06), Ramin continues describing the object as made of plastic, and, once more, he stretches out his arm, opens his hand downward and puts his fingers on the table, then raises his hand and gradually brings his fingers together so that the fingertips get closer to each other (lines 7–8; Fig.#07–09). This second attempt at description and depiction provides detailed information about the object’s substance and shape, displayed with an iconic gesture, and leads to Ted’s recognition of the object: “en ko:n?” (En: a cone) (line 09). The gesture is evidently a central part of the preservable features in Ramin’s turn, contributing to the recognition of the referent and securing mutual understanding (cf. Kendrick, 2015; Lerner, 2004; Mondada, 2014; Lilja & Piirainen-Marsh, 2019). Showing understanding of the referent, however, is subject to receiving confirmation from the previous speaker (see also Ex.#01). Ramin confirms Ted’s understanding in the subsequent turn (line 10) but mispronounces the word “cone” as he says, “cole exactly.”

Ramin’s uptake, which displays a mishearing of the word’s last consonant (“kol” instead of “kon”; line 10), is immediately corrected by Ted, uttering “no- no co:ne.” (line 11). Ted’s correction is made with a clear pronunciation and an elongated vowel, “ko:n”, which is followed by Jawahir’s repetition of the word with the stretched vowel “o” and even the consonant “n” in “co:n:e.” (line 12). However, Ramin still mispronounces the word and repeats “co:l:e.” (line 13). An extended embodied correction sequence ensues from this. First, Jawahir, who has understood the Swedish word for “cone”, explains the word in Arabic to Zahara and Susan (the exchange is not shown in the transcript). Then, Ted takes up the term again with Ramin (Ex.#02b). It is worth noting that Ramin and Laleh do not speak Arabic (their mother tongue is Farsi), and that Ted speaks neither Farsi nor Arabic.

Example (2b)

Participants: Ramin (RAM, visitor), Jawahir (JAW, visitor), Laleh (LAL, visitor), Zahara (ZAH, visitor), Susan (SUS, visitor), Ted (TED, volunteer)

In the continuation of the sequence, Ted now turns to Ramin and starts using a matching gesture similar to Ramin’s. This contingently leads to an instructional activity constituted by two steps: first securing mutual understanding about the object (i.e., coming to an agreement about the referent), and then providing the correct gloss of the object in Swedish (i.e., correcting Ramin’s pronunciation of the gloss). In other words, an instructional project (cf. communicative project in Linell, 2009; and Interactional Project in Schegloff, 2007; Levinson, 2013) begins in line 25, which seems to be designed to pursue from Ramin an uptake of the correct term, that is, “cone” (cf. Theodórsdóttir, 2018 for L1 speakers’ pursuit of uptake of repairables from L2 speakers). Ted, thus, begins the first step by recycling the gesture of a cone previously made by Ramin (line 02). He shows, however, a bigger cone, as tall as his own upper body, and says, “so it goes like this,” (line 25; Fig.#10–11), and at the same time, he horizontally pulls apart his hands, slightly angled, palms down and fingers apart. Then, as he produces his talk (line 25), he raises both hands and moves his right hand above and close to the left until his index fingers touch each other. His hand gesture depicts a funnel-shaped figure in the air similar to what Ramin showed earlier (Ex.#02a), although with the apex of the gesture above his own head. He coordinates his talk with his gesture, and right when he comes to the apex of the gesture, he ends his turn-constructional unit with “this,” (cf. Clark, 2016 for “indexed depiction”). The last word, “this,” is also produced with a slight rise in intonation, signaling that Ted’s action is a proposal for the recognition of the object (cf. try-marker, Sacks & Schegloff, 1979).
The candidate understanding gets an immediate confirmation by Ramin (line 26), who continues with an example of the use of cones, where they are put on the ground in a line so that footballers can zigzag through the course outlined by the cones. He says, “if one puts several of them” (simultaneously showing with his hands how “they”, meaning cones, are placed on the ground) and continues, “you must go with to- with the ball e: e- through” (lines 26–27 and 29). When he starts the second part of his utterance with “måste” (En: must), he begins to move his right hand in a zigzag across the table (line 26). The move is immediately responded to by Ted, who makes a matching gesture, putting his right hand close to Ramin’s and moving it in parallel to Ramin’s hand (line 27; Fig.#12). He then confirms their shared understanding by saying “yeah yeah” while still moving his hand parallel to Ramin’s (line 28; Fig.#13).

The moves are observed by other co-participants, who contribute to an ongoing interaction by minimal acknowledgement tokens such as “mm” (line 31) or “aha:” (line 32). After showing mutual understanding of what Ramin is talking about, Ted returns to the item of inquiry and begins to say what the word is called (line 34), but his turn is completed by Ramin; not only verbally by saying, “its name is co:ne”, but also through repeating and matching the gesture Ted had begun in his turn (line 35). Ramin, thus, enters into Ted’s turn, not only through the verbal means, but also through a gesture matching Ted’s, i.e., using his hands to draw a funnel-like shape in the air (Fig.#14; see turn-sharing in Lerner, 2002). Ted nods and confirms the word “co:ne” (line 36), with a similar prolongation of the vowel found in Ramin’s preceding turn (line 35).

The gestures and their matching counterparts in this example (and in all of the examples in this study), in the display of the referent and its use in practice, preserve some visible features of the phenomenon as resources for mutual understanding. The example shows that not only the first gestures but also the subsequent matching gestures are used to make the actions more intelligible and recognizable for the recipient. The gestures and matching gestures are thus recipient-designed to exhibit the highlights of the description of the referents. Gestural matching is, thus, a resource for displaying agreement and affiliation with regard to the object both parties refer to (see return gesture in de Fornel, 1992). The subsequent production of gestures in different turns (e.g., lines 7–8 followed by a matching gesture in line 25), or the production of gestures in the same turn (lines 26–29 and also lines 34–36), are also resources for jointing the contributions of both parties (for using gestures for jointing and tying the turns, see also Koschmann & LeBaron, 2002; Arnold, 2012; Majlesi, 2015). More importantly, the matching gestures not only connect the turns but also interrelate the content of the inquiry (the description and depiction of the cone as an object) with the sought form (here the linguistic term “cone”), thus tying the visible object to a hearable lexical item, which in itself is an emergent learnable in action.

4.3 Example 3

The next example (Ex.#03) is from another language café, where an L2 speaker, Yua (YUA), is engaged in a conversation with a volunteer, Babette (BAB), talking about shopping. Babette is an experienced retired teacher. The conversation about shopping began when Babette showed Yua some picture cards that she usually carries with her to help find topics to discuss with the visitors. Selecting a picture of two wrapped packages (Fig.#15), Yua starts her inquiry about how she can ask a sales clerk to wrap a gift. The sequence is presented in two parts, Ex.#03a and Ex.#03b:

Example (3a)

Participants: Yua (YUA, visitor), Babette (BAB, volunteer)

The sequence begins with Yua’s attempt to make a full utterance, starting with an incomplete dependent clause, “when I buy”. At this point, she launches a word search while clearly displaying that she is thinking (see Goodwin & Goodwin, 1986). She averts her gaze and goes silent for 1.5 seconds, while holding out her hands in a half-circle as if holding a bowl (line 01; Fig.#16). After this long pause with no continuation, Babette enters into Yua’s turn and completes her utterance with the word “a a gift” (line 02). The suggestion is immediately confirmed by the repetition of the noun “a gift” when Yua looks back at Babette and nods to show agreement, holding her hands apart as if holding a space in front of her chest (line 03; Fig.#17). Babette reciprocates with a couple of continuers, “aha mm”, and nods, showing recipiency and listenership, which also encourages Yua to finish her telling (line 04). Yua continues with a connective “and” but, once again, goes into another word search, looking the other way and remaining silent for another 1.2 seconds (line 05). When resuming her turn, she asks Babette, “can I say”, and completes her utterance by pointing to the picture card laid in front of her and doing a hand gesture: She turns her open palms, which had so far been apart vertically (Fig.#17), and holds her hands upward horizontally (Fig.#18), pulling them one above the other, palms downward (Fig.#19). The gesture enacts wrapping. The hybrid utterance of talk and gesture (Goodwin, 2018; see also “composite utterance” in Enfield, 2013) helps Babette to interpret Yua’s action as a display of “wrapping”.

With a slower articulation in her response to Yua’s hybrid utterance, Babette offers an interpretation of Yua’s action, “<you wa- you want the gift wra:pped>”. In her response, Babette articulates every word clearly, and also emphasizes the word “wrapped” (Sw: inslagen) by putting more stress on the first syllable and also stretching the vowel in the word “insla:gen” (line 06). Yua maintains a mutual gaze with Babette but produces no response (line 07). It is evident that the word “inslagen” is not part of Yua’s repertoire, as she does not show any sign of following Babette’s response to her inquiry. With no sign of receipt, Babette describes a scenario for Yua and recycles the gesture used by Yua depicting wrapping. Babette parses her own turn into different installments and builds a mutual understanding step by step: “you’re in the shop”, Babette says (line 08), and receives a confirmation both verbally (“yeah yeah”) and also by nodding (line 09); she then continues (line 10): “’n you want them to do li:ke” and while saying “’n you”, she pulls her hands apart, holding them in front of her chest, angled upward. When she says “them to do like this”, she moves her hands in a half-circle, palms down, placing one hand over the other. With her gesture matching the one made by Yua in the earlier turn (line 05), Babette enacts the action of wrapping (line 10). Babette thus visually depicts her understanding of the gesture made in the previous turn by Yua and also seeks confirmation from Yua. The confirmation seeking is apparent both in Babette’s turn production, in the prosodic feature of the utterance she makes with a rise-to-mid phrase-final intonation, and also in how it is understood as seeking confirmation by Yua in the following turn (in line 11, shown in Ex.#03b).

What is deemed noteworthy is that Babette’s first response to Yua’s inquiry in line 06 with only a linguistic formulation was not evidently intelligible for Yua (it was passed unanswered by a long pause and mutual gaze, shown in line 07). This is unsurprising, as Yua’s lexical repertoire in Swedish is manifestly limited (at least regarding the terminology related to the topic of the inquiry; see line 05). The mutual understanding is only achieved when Babette tries (lines 08–10) to respond to Yua’s question again, but this time, she recycles some constituent parts of Yua’s own action, here the gesture of “wrapping” to build a common reference in talk. In other words, the relevance of Babette’s actions as a response to Yua’s turn is only understood through the recognizable and highlighted features of Yua’s own action. So, the gestural matching used in Babette’s action (line 10) is a referential practice, not only making Babette’s action intelligible, but tying Babette’s action back to the original gesture in the previous turn (see tying technique in Sacks, 1995: 716). Moreover, the matching gesture contributes to Yua’s recognition of Babette’s action as a candidate understanding of the inquiry she had just made.

Example (3b)

Participants: Yua (YUA, visitor), Babette (BAB, volunteer)

In the continuation, Yua confirms Babette’s candidate understanding of her inquiry by saying “yeah” and producing a few nods (line 11). At the same time, Babette enacts the gesture of wrapping once more by pulling her hands apart and moving them above each other. The repetition of the gesture is also responded to by simultaneous nods. At this time, after establishing their mutual understanding, Babette repeats the formulation that she offered first in line 06. The formulation is an alternative way of expressing the hybrid utterance that Yua was communicating through the combination of talk and the gesture of “wrapping”. So, Babette’s response corresponds to the meaning-content of Yua’s inquiry in the preceding turn. This time, however, Babette turns the event into a pedagogical one, responding to the inquiry not only through talk but also through writing. Using her pen, she first points to, and taps on, the picture of the wrapped packages on the picture card on the table (Fig.#23). She then continues by saying, “you want (something like this), so you say then” (line 12), and her incomplete syntax gets completed through writing and reading aloud what she writes: “I want to have the gift wrapped” (line 14). She reads every word she writes (line 14), and when she gets to the word “inslagen” (En: wrapped), she reads it once more (line 15). Babette then repeats the whole utterance “I want to have the gift <wrapped>”, articulating clearly and slowly the main item of inquiry, i.e., wrapped (line 15). This is confirmed by Yua’s acknowledgement token of “yeah” and nodding (line 16). Then, Babette explains further that the utterance will be understood as a request for wrapping, as she says “then (0.5) they take a paper (.) so they make a package”. Her explanation of what the formulation entails in practice is also concomitant with the repetition of the matching gesture of wrapping (line 18; Figs.#24 and 25).
The further contribution (line 18) with the repetition of the gesture warrants the intelligibility of her explanation and understanding of the new term “inslagen” (En: wrapped) for Yua. This is evident in Yua’s uptake when she repeats the instructed utterance in her own turn (lines 19–22), which is completed by Babette (line 23): “then (0.3) I (2.0) I just say I want to have the gift wrapped so then (it gets) — wrapped”. This mutual understanding gets confirmed and acknowledged by Babette and Yua, in consecutive turns at the end of the sequence (lines 24–26).

The example shows how an L1 speaker can recognize and understand an action made by an L2 speaker by virtue of the salience of an iconic gesture as a highlighted and preservable feature of a produced action. The relevance of the gesture is also made clear in the response to the first action through recycling the gesture in the subsequent confirmation-seeking turn (lines 08–10). The recycled gesture consolidates the same referent into the new action, interrelating not only the subsequent turns as paired ones (on the second being conditionally relevant to the first, see “adjacency pairs” in Schegloff, 2007), but also the meaning-content of the first turn (here Yua’s hybrid utterance, in line 05) and its response, which here provides a particular way of saying and writing in Swedish as an alternative way of formulating Yua’s hybrid utterance in plain words: “jag vill ha presenten inslagen” (line 14). The response with the help of the matching gesture makes recognizable the associated lexical item as a highlighted element of the turn: “inslagen”. Through depicting wrapping by gesture, pronouncing the word, and writing it (i.e., providing the term in writing as a recordable for Yua; see Komter, 2006 for “recordable”), Babette makes the term “inslagen”, and the whole utterance containing that term, particularly highlighted as a pedagogical focus or “learnable”.

4.4 Example 4

The following example is from video recordings in the same language café as Ex.#03 where Rana (RAN), an L2 speaker, is talking to Anna (ANN) and Beatrice (BEA), who are Swedish-speaking volunteers. They have been talking about life in Sweden. In this particular excerpt, Rana compares the climate in Sweden with that of her own home country, Sudan. Unlike in the previous examples, the orientation toward the lexical item is not the main business at hand in communication: Orientation evolves contingently in conversation as a post-expansion of the sequence. The sequence is presented in two parts, Ex.#04a and Ex.#04b:

Example (4a)

Participants: Rana (RAN, visitor), Beatrice (BEA, volunteer), Anna (ANN, volunteer)

The sequence (Ex.#04a) starts at a point in the conversation where participants are talking about the cold weather in Sweden. Rana contributes to the conversation by saying that Sudan is not cold (line 01). She also waves her pen back and forth, perhaps showing negation in gesture (Fig.#26). Despite this, her statement is first misheard by Anna, who initiates a repair, “is it cold now?” (line 03), which is corrected by Rana, who says “not cold”, while still waving her pen in the air (line 04). Rana’s response is produced in an overlap with Anna’s question (line 03). In fact, she means the contrary: she stops moving her pen back and forth and continues, saying “very very ve:ry : (0.9) e: hot”, with emphatic articulation of the quantifier “very” three times in a row (line 05). This is receipted positively by Anna, who also nods in response (line 06). Seemingly not actively listening until now, Beatrice asks Anna what they are talking about (this part is not shown in the transcript). After a brief exchange between Anna and Beatrice, Rana continues talking and compares Sudan’s climate with that of Sweden, and says, “’n here in sweden is very cold”, again, with the emphasis on the word “very” (line 14). At the same time, when Anna replies with a couple of receipt tokens, “yeah” (line 15), Rana holds her arms together in a V-shape and keeps them in the same position while receiving a reply from Beatrice who says “yeah’yeah” (line 16). While keeping her hands still in a V-position, Rana says “two::”, and immediately moves her hands apart in two opposite directions (line 17; Fig.#27), and holds them in that position across a few turns (ending in line 19).

The lexical item “two::” with elongated vowel together with the gesture of pulling the hands into two different directions are treated as exhibiting “opposites” as evident in Anna’s response (line 18). The form, and, more importantly, the sequential positioning of the gesture right after the talk about “hot Sudan” and “cold Sweden”, as well as Rana’s assessment of Sweden’s weather being “very cold”, right before the use of the gesture (line 14), all contribute to the understanding of the gesture showing the opposite climates (it is also worth noticing the prosodic features in the production of quantifiers, which are uttered with more accentuation and add to the contextualization cues, making it clear that the gesture represents two opposites). Anna’s alignment (line 18) first comes with an acknowledgement token of “yeah” but is further elaborated on through repeating the gesture initially made by Rana. Anna opens her arms, holding her hands apart over her shoulders (Fig.#28), and offers the linguistic form, “opposites”, for Rana’s gesture: “yeah;. yeah right they’re opp-oppos-opposites” (line 18). The linguistic form “opposites” is a candidate word for the meaning that lies in Rana’s hand gesture, i.e., moving hands in opposite directions. Even Beatrice joins in and says, “there’s a big difference”, and even if half of her body is off-camera, she, too, is clearly using a matching gesture, holding her hands apart to show the opposites (line 19). At this moment, Rana repeats the gesture, although, now, crossing her hands in front of her chest (Fig.#29) as she says “here” (line 20). The gestural matchings, which are exhibited in moving the hands in opposite directions, are produced in consecutive turns (lines 18–20) by all parties, seemingly to show alignment with each other and to affirm mutual understanding.
The repeated gestural matchings can also be considered as a practice of singling out part of the hybrid utterance in Rana’s turn that is manifestly highlighted as the preservable feature in the conversation (see Sacks, 1995: 773). This preservable feature is evidently used for further interactional work to provide a linguistic gloss as the meaning-content of the gesture depicting “opposite”. In the continuation of the talk (Ex.#04b), the sequence gets expanded and the gesture’s affiliate verbal expression is picked up by Rana and made the topic of the rest of the talk (see conversation for learning, Kasper & Kim, 2015):

Example (4b)

Participants: Rana (RAN, visitor), Beatrice (BEA, volunteer), Anna (ANN, volunteer)

After Anna offers the alternative linguistic form, “opposites”, to express Rana’s initial gesture (line 18), Rana orients to the vocabulary suggested by Anna and repeats the word “opposites?”, try-marking the word (Sacks & Schegloff, 1979) with a rise of intonation signalling she is asking for further explanation on that item (line 21). Beatrice provides an explanation through an example of juxtaposing “summer” and “winter” (line 22). However, not only the meaning of the word but also the lexical form is of interest to Rana. She then repeats the word “opposites?” (line 25) a second time. The articulation of the linguistic form is confirmed by Anna saying “yeah” while nodding (line 26). Rana’s third repetition of the word (line 27), with a slight rise of intonation, gets a more elaborated response by Anna, who builds on Beatrice’s summer-winter oppositions. Now, she topicalizes the word “opposites” in her utterance, holds up her hands close together, facing each other, moves them first to her own left above her chest (Fig.#30) and says “winter (.) snow cold”, and moves her hands to her own right, above her shoulder (Fig.#31), and says, “summer hot (.) one can swim in the water” (line 28). Such a pedagogical lexical explanation is not only done verbally but also gesturally to explain the word “opposites”. Anna’s contribution joins the structure of her contingent teaching to the preceding turns, in terms of both content, regarding the theme of the conversation (weather), and of form, regarding the gesture initially used by Rana. Anna’s gesture this time around, though apparently different from Rana’s, does the same thing in terms of splitting the space in front of the speaker into the two positions of right and left so as to show the polar positions.

At the end, the use of matching gestures and verbal explanations of the term “opposites” is now explicitly demonstrated as understood by Rana in her response to Anna’s explanation, as she says, “aha” and nods (line 30). Then, with a further account by Anna about what the word opposite could refer to, e.g., cold/warm (lines 31 and 33), and reiterating that cold and warm are opposites (line 35), Rana once again recycles the gesture of crossing her arms in front of her chest. She then repeats what contingently evolved to be the practice of saying and showing opposites (line 36, Fig.#32).

5. Concluding discussion

This study has shown how iconic gestures in hybrid utterances and their matchings are crucial parts of the grounding work of singling out epistemic gaps in L2 speakers’ linguistic repertoire in L1-L2 interactions. Gestural matchings help work out the sought-for lexical item in L2 talk not just by representing the form or the meaning-content of an action or the linguistic object, but rather by making that form and content a preservable and reportable feature of the conversation, invoked in subsequent turns for an instructional activity.

The trajectory of the sequences (word search/explanation sequences) in this study appears to include the following: (1) the initiation of the inquiry by L2 speakers using hybrid utterances of talk and gesture (Goodwin, 2018; see also “composite utterance” in Enfield, 2013) is followed by (2) the responsive turn made by the L1 speaker recycling the gesture in the L2 speaker’s turn, and also offering the affiliate verbal expression to the gesture. Working out the sought lexical item pertains to the procedure of confirmation seeking, including the repetition of the gesture as demonstrably a tying technique for making the relevant next action (see tying technique in Sacks, 1995: 716). (3) The third turn is often the confirmation of the L1 speaker’s understanding, which may lead to (4) the post-expansion of the sequence with more repetitions of the gesture along with the affiliate verbal expression (with the reiteration of the details of the production of the expression often in verbal form, but also in written form). This makes a contextual premise for highlighting the lexical item as the focus of both communication and instruction.

In general, gestures and their repetitions in consecutive turns (i.e., matching gestures or return gestures), as shown in previous studies (e.g., de Fornel, 1992; Koschmann & LeBaron, 2002; Lerner, 2002; Majlesi, 2015, among others), are used to contribute to the specific display of understanding, to building cohesion across turns, or to the construction of conjoint actions (see also Piirainen-Marsh, Lilja & Eskildsen, 2022/this issue). Similarly, we have seen in the presented data that a gesture is recycled as a practice of displaying understanding through the visible and recognizable embodied reference to the previous turn of talk. This corroborates the findings of previous studies (e.g., Lilja & Piirainen-Marsh, 2019; cf. Greer, 2022/this issue) that iconic gestures contribute to forming and maintaining intersubjectivity, particularly in L1-L2 interaction. Moreover, matching gestures highlight the preservable features indexed in the original turn, evoking not only the interrelatedness of the subsequent turns, but also drawing attention to the part of the previous turn as a joint focus of attention and instruction. Therefore, with the help of matching gestures, L1 speakers (a) secure the attention of the participants to the joint area of action, and (b) make visible the meaning-content of the referent in the preceding L2 speaker’s turn to tie to L2 speaker’s focal object of inquiry. Moreover, they (c) make a connection between the gesture and the affiliate verbal expression which they offer as a candidate form in L2, and thus (d) recontextualize the gesture as a resource for further instruction. Gestural matching, in the context of this study, is thus part of the procedure in which the focus of the inquiry contingently evolves into the focus of teaching and learning a particular linguistic object. 
Gestural matching is, in other words, part of the account that frames the understanding of a new practice, a locally achieved order in the current interactional business that emerges and is oriented to as a learnable.


I am grateful to the volunteers, the visitors and the organizers of language cafés who gave us permission to record and study their activities as part of the project, “The language café as a social venue and a space for language training” (financed by The Swedish Research Council, grant nr. 2017–033628). I dearly appreciate the comments I received on the earlier version of this paper by the special issue editors, and also the two anonymous reviewers that helped me to improve the paper. I am also indebted to Alan Zemel and Timothy Koschmann for our continuous discussions on the relevance of the concepts I used in this study for the analysis of instructional activities.


