This article focuses on the coordination of speaking and drinking. Because physiological constraints largely preclude speaking and drinking concurrently, participants must balance their engagement in one with the other. I focus on environments in which a currently drinking participant is selected to speak next, since this requires the participant to manage the conflict between drinking now and speaking next. Participants are shown either upholding the progressive development of drinking and talk-in-interaction in parallel, or adjusting the trajectory of drinking to engage in talk-in-interaction. These orientations to the practical incompatibility between drinking and speaking reveal participants’ sensitivities to action modality.


Keywords: multimodality, multiactivity, drinking, conversation analysis

1      Introduction

Doing two things at once is a pervasive feature of social interaction. The availability of more than one activity demands of participants some method for organizing one with respect to another. Ongoing work in conversation analysis on multiactivity has addressed the ways in which participants recognizably engage in multiple concurrently relevant activities through the skillful usage of verbal, vocal, bodily, and material resources (Goodwin, 1984; Mondada, 2011, 2014a; Haddington, Keisanen, Mondada, & Nevile, 2014). This article contributes to research on multiactivity by taking up the matter of how participants accountably engage in two activities that are largely mutually exclusive: speaking and drinking. It offers an empirical analysis of moments when both activities are relevant, specifically focusing on environments in which a currently drinking participant is allocated a speaking turn. The analysis shows some of the technical work involved in coordinating these activities. It furthermore suggests that participants treat the modality of their interactional contributions—language, prosody, gesticulation, gaze, posture, head movement, instrumental manipulation, etc.—as an important feature of action formation in multiactivity settings.


2      Multiactivity and drinking

Multiactivity refers to “the social, interactional and temporal features of situations and conduct in which people organise multiple activities together, concurrently or serially” (Haddington et al., 2014, p. 5). Multiactivity settings are characterized by the contemporaneous relevance of multiple involvements (Goffman, 1963; Toerien & Kitzinger, 2007; Raymond & Lerner, 2014), participation frameworks (Goodwin, 2000; Mondada, 2013), or courses of action, which variably intersect, conflict, or run in parallel from one moment to the next (Mondada, 2014a). Participants organize multiactivity settings by determining how, when, and where to mobilize a given set of resources so as to recognizably advance with a given set of activities. The methods with which participants selectively advance one activity over another give rise to various hierarchical, temporal, and sequential relationships between activities such as suspension, resumption, abandonment, postponement, and synchronization (Mondada, 2014a; Raymond & Lerner, 2014). The allocation of multimodal resources to particular activities can be interpersonal, where multiple participants collaboratively orchestrate their involvement in multiple joint projects. This allocation may also be intrapersonal, as is done by single participants distributing resources among multiple activities (Deppermann, 2014). This article articulates the intrapersonal coordination involved in selectively advancing the joint activity of talk-in-interaction along with the individual activity of drinking.

Coordinating multiple activities requires a practical grasp of the organization of each individual activity and how they may be integrated. The organization of speaking in conversation has been documented by conversation analysts in over half a century of work on turn-taking (Sacks, Schegloff, & Jefferson, 1974), turn construction (Schegloff, 1996), and turn design (Drew, 2013). The organization of drinking follows the organization of many other manual actions in exhibiting a grossly tripartite structure: a preparation phase, focal action phase, and return phase (Lerner & Raymond, 2008; see also Kendon, 1980; Sacks & Schegloff, 2002). The preparation phase for a typical drinking action is characterized by taking the vessel, conveying it to the mouth, and situating or ‘docking’ it there; the focal action phase is marked by tilting the vessel upward and pouring liquid into the mouth; and the return phase is constituted by reversing the tilt of the vessel, decoupling it from the mouth, and bringing it and the body to some resting position. A formal organization of this sort furnishes for participants a visible index of the trajectory of bodily action against which other activities (such as speaking) may be coordinated and furnishes for analysts a positionally grounded method for describing manual action (Lerner & Raymond, 2008).

Several aspects of drinking recommend it for an analysis of multiactivity. Most significantly, drinking inhibits most forms of speaking. When liquid occupies the mouth, the oral articulators are practically inoperable, and when swallowing, exhalation is impossible. Participants must therefore out of physiological necessity coordinate moments of speaking with moments of drinking. Despite this seeming mismatch, however, the two commonly go together. Indeed, even when ‘going for drinks’ (coffee, beer, tea, etc.), the very drinks themselves are not ordinarily the focus of the encounter. Rather, drinking tends to operate as an alibi for interaction (Laurier, 2008) and may even scaffold particular forms of sociability (Frake, 1964; Manning, 2012). Additionally, drinking is relatively unrestricted in its placement. With the exception of things like ritualized drinking (Frake, 1964) and toasts (Manning, 2012), participants appear to drink whenever they please. Its placement is not motivated by the actions of others in any obvious way, so we may surmise that participants choose when and how to drink.[i]

Apart from studies focusing on food-related talk (e.g., Wiggins, 2002; Mondada, 2009; Pomerantz & Mandelbaum, 2016), relatively little attention has been paid to the organization of talking, eating, and drinking. Goffman (1963) considered eating and drinking to be forms of auto-involvements—types of momentary withdrawal from interaction that, if over-indulged, carry the risk of expressing disloyalty to the social occasion within which they occur. Though not generally regarded as types of gesture, eating and drinking nevertheless have the capacity to regulate social interaction (Kendon, 2004, p. 9). For example, because they occur at the face, they are potentially of the class of actions (along with face-touching) that are systematically disattended by coparticipants (Goodwin, 1986). Previous work on the coordination of talk with food and drink has shown how unaddressed participants modulate the act of distributing food according to the structural organization of a storytelling (Goodwin, 1984), how the completion of a drink can be coordinated with the pre-closing section of a coffee break (Laurier, 2008), and how a drink can be used in progressive disengagement from speakership (Walker, 2012). I build upon these studies by restricting myself to an analysis of drinking in a particular sequential environment in conversation.

The analysis focuses on environments in which drinking and speakership are simultaneously relevant. All cases involve a participant who is (a) in the preparation or focal action phase of drinking, and (b) provided the opportunity to speak next (see Sacks et al., 1974; Lerner, 2003; Hayashi, 2013). I further restrict myself to cases in which the participant’s response occurs on time and without delay.[ii] For such participants, there is a conflict between drinking now and speaking next. They are obligated to produce a next action and so must choose between continuing and discontinuing drinking. In what follows, I focus on both of these choices. First, I show cases where drinking continues, and a non-verbal response is produced. In these cases, both activities occur in parallel, each proceeding without perturbation. Then, I show cases where drinking is discontinued such that a response can be produced. In these cases, the trajectory of drinking is adjusted for the production of a next action. In the final section of the analysis, I argue that the decision to either continue or discontinue drinking in this environment displays participants’ sensitivities to the modalities or resources used to build actions.

The data for this article are ten video-recordings (4.8 hours) of naturally occurring interactions in US and UK English between friends, intimates, classmates, and coworkers, all of whom provided informed consent. These recordings capture scenes of hanging out, mealtimes, board games, and preparing documents. All observable drinking actions were identified for an overall collection of 333 cases. From this collection, I identified instances in which a drinking participant was allocated a next turn. The resulting 41 cases serve as the basis for this paper, which proceeds according to conversation analytic methods (e.g. Sidnell & Stivers, 2013; Hoey & Kendrick, in press). Transcription conventions follow Jefferson (2004) for verbal/vocal conduct and Mondada (2014b) for visible conduct.


3      Analysis

3.1       Parallel progression of drinking and speaking 

Though drinking and speaking present somewhat conflicting demands, participants may nevertheless proceed with them in parallel. One way that this is done is through vocalization (see also Wiggins, 2002; Gonzales-Temer & Ogden, 2015). In Extracts 1-2, a currently-drinking participant is provided the opportunity to speak next, and in each case, that participant produces a vocal response while drinking. In Extract 1, Marie positively receipts an answer in third position with mm (line 4), and in Extract 2, Jennifer responds to a question with the negative answer token mm-mm (line 4).


Extract 1.

Extract 2.

While Marie and Jennifer could conceivably have produced verbal responses only after completing the focal action phase of drinking, that is not what happens. Instead, they produce sequentially appropriate vocal responses on time with minimal gap (Extract 1) or overlap (Extract 2). So, despite the constraint that drinking places on verbalization, participants who are allocated the next turn may still participate in a sequentially relevant way and in accordance with the normative temporal demands of talk-in-interaction.

Another way in which participants may accountably (that is, observably and recognizably; Garfinkel, 1967) participate in drinking and conversation simultaneously is through bodily resources such as gaze, gesticulation, or, as shown next, an eyebrow flash. At the start of this exchange, Jamie asks Max about the number of lines on a painting they have been discussing. Over the course of his question, Jamie gazes from Max, briefly toward Will, then back to Max (figure 1a-c). The import of Jamie’s slight head movement is that he is asking on behalf of both himself and Will as a party. Jamie’s head movement works to ‘include’ Will as invested in the question and Max’s eventual answer.


Extract 3.

Figure 1. Eyebrow flash during drink-in-progress

Max’s answer comes in two turn-constructional units (TCU; Sacks et al., 1974). During the first (ish), Max turns to face Will, thereby treating Will as a recipient to his answer (figure 1c-d; Goodwin, 1981). As Max continues with a second TCU (it’s got…), Jamie also turns to gaze at Will (figure 1d-e). With both Max and Jamie now turned to him, Will is in a position to produce some reaction to Max’s answer (Stivers & Rossano, 2010). However, throughout this entire exchange, Will is drinking and thus cannot speak. Given the unavailability of speaking, Will gazes back and produces an eyebrow flash (figure 1e; cf. Goodwin, 1981; Levinson, 2015). With this, he treats his coparticipants’ gaze toward him as requiring some reaction and produces a non-verbal action that would be visible to both of them.

What should be clear is that drinking does not totally inhibit participation in interaction, nor does the management of two mutually exclusive activities necessitate the suspension of one for the advancement of the other. Rather, the orderly progression of both may be preserved. Participants upheld the orderly progression of drinking by simply maintaining engagement in the focal action phase of drinking, and they upheld the advancement of talk-in-interaction through the allocation of non-verbal resources. This contrasts with the extracts shown next, in which the trajectory of drinking is adjusted such that speaking can take place.


3.2       Progression with adjustment 

Rather than proceeding with both drinking and conversation in parallel, participants may make elementary adjustments (Lerner & Raymond, 2008; Raymond & Lerner, 2014) to the trajectory of drinking so as to support the progression of the interaction. Such adjustments may include acceleration, retardation, or, as shown below, retraction of a drinking action. Laura, Michelle, and Mom are talking about darts, which Michelle and Laura recently played. Their exchange starts with Laura explaining to her Mom how points are calculated in the game. As her explanation proceeds, Michelle launches her drinking action (figure 2a).


Extract 4.

Figure 2.


During her explanation, Laura projects a numeric value with if ya hit it in the outside rim, ya get (lines 1-2). Before bringing her turn to completion, she cuts it off and turns to Michelle (figure 2a-b). With this complex of behaviors, she allows for conditional entry onto the turn space—a place where Michelle may provide some talk that advances Laura’s turn to completion (Lerner, 1996). Without missing a beat, Michelle reverses the tilt of her glass (figure 2b-c). That is, at the transition between the preparation and focal action phase of drinking, she decouples the glass from her mouth and suspends it in ‘on deck’ position. This frees her oral articulators and allows her to supply the number two (lines 3-4), which completes the incidental word search sequence (Goodwin & Goodwin, 1986; Schegloff, 2007). Having only partially suspended the drinking trajectory, Michelle then resumes that action as Laura resumes her explanation (lines 5-6).

Participants may also accelerate the trajectory of drinking. In Extract 5, Matt mentions that people on Twitter have been going mental in response to the event they have been discussing. Rowan moves to stabilize this introduction of new material by prompting Matt to expand (line 2). As Rowan issues his prompt, however, Matt moves to exit the turn space. Matt recompletes his turn (Sacks et al., 1974) with a temporal adverbial clause and turn-final so (see Raymond, 2004) while lifting his mug. It is here that we observe both Rowan and Matt orient to Matt’s drinking action.


Extract 5.

Rowan pursues expansion with an assertion about Matt (I bet youv-, line 4) but does so just as Matt lifts his mug. In reaction to Matt’s incipient drinking action, Rowan cuts off his turn, pauses, and gazes to Matt, which attracts Matt’s gaze (Goodwin, 1979). He then revises his turn as a polar question (Drew, Walker, & Ogden, 2013). Rowan’s turn cut-off, pause, and turn revision allow time to elapse, and may be seen as ‘providing Matt time to finish drinking’. This is supported by Rowan’s gaze behavior: In the silence after his question (line 5), Rowan gazes at something in the distance, treating Matt’s response as not urgently needed or not needed now. Matt, for his part as selected next speaker, is seen working to end drinking and begin responding. Matt accelerates the focal action phase of drinking by lowering his mug, forecasting the beginning of a response. Both participants then treat a response as imminent: Rowan returns his gaze to Matt, who is nodding while swallowing his drink, which premonitors his positive verbal response in line 6.

Another instance of acceleration appears in Extract 6. Here, however, the participant adjusts her drinking action not to produce a turn-at-talk but to remove her glasses from her face. Prior to this exchange, Betty had put on Teresa’s glasses to try them on. The transcript starts with Jennifer prompting Betty to expand on something she had said, and Betty resisting that expansion (lines 1-5). Betty does not take the opportunity to say more but instead visibly moves to end the sequence by going for her drink (figure 4a). As she proceeds to drink, their talk lapses into silence (see Hoey, 2015).


Extract 6.

Figure 4. Drinking accelerated to support removing glasses


Jennifer ends the lapse by requesting the glasses that Betty is wearing (line 8) and extending her hand out to receive them (figure 4b). In response, Betty accelerates the focal action phase of drinking. She lunges forward slightly and raises her left arm (figure 4c). With this conspicuous movement, Betty displays that she is on her way to complying with Jennifer’s request. And indeed, Jennifer treats her movement as such by retracting her hand from a receptive formation (figure 4c-d). Drinking participants may thus adjust their drinking actions to free up bodily resources for the production of a next action.

In these extracts, the progression of talk-in-interaction was preserved via the visible retraction or acceleration of the participants’ drinking actions. These elementary adjustments allowed participants to produce on-time next actions and thereby accountably participate in interaction. Compared to Extracts 1-3, in which participants treated drinking as something that could be done in simultaneity with the interaction, in Extracts 4-6 they treated drinking as an impediment to recognizable participation. In the next section, I expand upon these two orientations to the incompatibilities between drinking and speaking.


3.3       Modality sensitivity 

Actions may be constructed through (a combination of) different resources or modalities. To give a simple example, greetings can be done by waving (gestural/manual modality) or by saying hello (verbal/vocal modality). Though both of these are greetings, they are not equivalent actions in terms of their affordances. One difference is that waving can be seen at a distance, whereas speaking typically has a more limited perceptual range. Participants might thus wave when far away from one another and say hello when in closer proximity. In choosing one or the other method of building up an action, participants display sensitivity to what each modality affords.

As the previous extracts show, the choice to either continue drinking or adjust drinking after being selected to speak next affects the resources with which a response can be built. Continuing with drinking means that, if a next action is to be produced on time, participants must rely on nonverbal means for designing a response (vocalization, eyebrow flash, etc.). And conversely, if drinking is adjusted, then a greater range of actions may be produced using the resources that were inaccessible during drinking (speaking, head nods, etc.). Both of these choices appear in Extract 7, which involves the same participants as Extract 3. The three friends have been discussing Max’s role in a theater production. Prior to this transcript, Max motioned to end that topic and is seen at the beginning of this extract lifting his drink—a visible indication of no more forthcoming talk (figure 5a). Will, however, does not go along with Max’s motion to end the topic but asks him a question about the play (line 1).


Extract 7.

Figure 5.

Will’s question features a couple of restarts, which work to elicit Max’s gaze (Goodwin, 1979; figure 5b). Near the end of Will’s question, Max suspends the lift of his glass, keeping it nearly perched on his lower lip. By holding his glass in ‘on deck’ position, Max is able to both display commitment to finishing his drinking action and also produce a nodding movement. As Max concludes his nodding, he tilts his glass toward his lips, indicating incipient resumption of his drink (line 3). However, Will asks Max another question (line 3). In response to this second question, Max simultaneously lowers his glass to the tabletop and produces a spoken response (lines 4-5, figure 5c-d). With this extract, we observe two questions posed to a drinking participant and two distinct analyses of what resources are required to adequately respond. Simply put, Max responds to Will’s first question with a nod, for which he simply suspended his glass. But for Will’s second question, Max responds with a spoken turn, for which he lowered his glass to the table. And so in quick succession, Max’s bodily behaviors reveal his understandings of what actions should come next and how those actions may be built up.

We can re-assess the actions we have seen so far with respect to the modality used to construct these actions. In Extracts 1-3, a restricted set of resources was treated as adequate under the circumstances. The mm (Extract 1) suitably corresponds to a verbal change-of-state token like oh (Heritage, 1984); the mm-mm (Extract 2) to a negative particle like no; and the eyebrow flash (Extract 3) to a surprise token like really? (Wilkinson & Kitzinger, 2006). In contrast, in Extracts 4-6, the participants evidently required a fuller set of resources to compose a response, making drinking adjustments necessary to access those resources. The numeral two (Extract 4), the nodding and a counter all the time (Extract 5), and the removal of glasses (Extract 6) are not as easily done when drinking—that is, the actions they implement are best composed using a fuller set of multimodal resources.[iii]

Participants thus exhibit sensitivity to the modalities used to construct their actions. In selecting from among multimodal resources for constructing a response, participants show a practical understanding of what different modalities afford. This means that the calculus for next speakers is somewhat more complex if they are currently drinking. In addition to the tasks of actively monitoring the unfolding talk for places of possible completion (Sacks et al., 1974) and simultaneously formulating an adequate response (e.g. Levinson, 2016), participants who are engaged in drinking must take into account the optimal response modality.


4      Discussion

Drinking was shown to be an orderly affair, responsive to interactional contingencies such as speaker selection. In balancing current engagement in drinking with the social obligation to speak next, participants were shown either proceeding with both in parallel or adjusting the progression of drinking to engage in speaking. These two orientations to resolving the practical incompatibility between speaking and drinking reveal participants’ sensitivities to modality in action construction. If a relevant next action can be built using a restricted set of resources, then drinking may proceed as it ordinarily would. But if the next action is more adequately expressed through words or otherwise requires a fuller set of resources, then elementary adjustments of the drinking action are in order.

This account contributes to our understanding of multiactivity by (1) focusing specifically on drinking, which has not been the focus of previous work despite being a widespread accompaniment to conversation; (2) explicating part of the interactional work that goes into rendering drinking a seen-but-unnoticed feature of everyday multiactivity settings; (3) analyzing a sequential environment in which two activities are specifically incompatible; (4) specifying the intrapersonal coordination of one joint activity (conversation) with one individual activity (drinking); and (5) implicating modality sensitivity as a part of action formation in multiactivity settings.

The notion that participants are sensitive to the affordances of different resources and modalities can enrich our understanding of action formation and sequential implicature. Modality sensitivity captures the basic idea that the different ways of forming the ‘same’ action are interactionally consequential. Because actions can be assembled in various ways, participants must take stock of the availability and utility of multimodal resources for building just this action at just this time. For example, a greeting constructed through the verbal/vocal modality shows the speaker’s orientation to matters such as hearability. The resources used to build a given action might also bear consequences for the range of possible next actions. For instance, we can speculate that a spoken greeting implicates a slightly different response as compared to a manual/gestural greeting. This suggestion parallels the argument that language-specific resources for implementing the ‘same’ action bear collateral effects on how an action is understood and what it implicates (Sidnell & Enfield, 2012). Similarly, it is conceivable that particular configurations of multimodal resources used for an action subsequently afford particular next actions.

This article shows the practical work undertaken to interweave the individual manual activity of drinking into the collaborative joint activity of conversation. By contextualizing the act of drinking in its socio-interactional particulars, it offers an empirical specification of a widespread “technique of the body” (Mauss, 1979). And by isolating an environment in which participants must work to make the speaking modality available, it illustrates participants’ situated understandings of the practical utility of language for social action (see Rossi, 2014; Stevanovic & Monzoni, 2016).



[i] While these features also arguably apply to eating, drinking remains distinct in its affordances for talk. Most foods can be held securely in the mouth when speaking and would not spill out as liquid would. Relatedly, eating usually takes much longer than drinking. Drinking can be ‘slipped in’ to interaction in a way that chewing cannot. And in terms of the amount that can be consumed, having another bite becomes physically unbearable at some point, whereas having another sip is almost always feasible (intoxication perhaps being the limiting case). This means that drinking is not in principle limited by the amount that can be consumed.

[ii] Because I restrict myself to on-time responses, I do not address cases like the following: (1) the drinking participant produces something like mm through which they indicate that a verbal response will come after drinking is finished and (2) the drinking participant finishes drinking and then gives a delayed response.

[iii] It would have been possible for Michelle to have held up two fingers instead of verbalizing two in Extract 4. This would have allowed her to proceed with drinking and responding simultaneously. However, I reason that she chose to verbalize two since one hand was holding her glass and the other hand was tucked away.