Social Interaction. Video-Based Studies of Human Sociality.

2022 Vol. 5, Issue 2

ISSN: 2446-3620

DOI: 10.7146/si.v5i2.119332

Gaze and the Organization of Participation in Collective Visual Conduct

Mardi Kidwell & Edward Reynolds

University of New Hampshire


In this article, we demonstrate how participants make use of others’ gazing actions to monitor and engage events in their surroundings. Specifically, we focus on a tension between one sort of gazing action that participants freely join in, “noticing”, and another, “watching”, that is subject to constraints related to participant identities toward a collective and their corresponding rights of membership that include toward what and with whom they may gaze. Employing the method of conversation analysis, we provide a fine-grained examination of differences between the two gazing actions that include movements of the head, body, eyes, and feet, and highlight how these differences provide a resource for differentially orienting to the environment and joining in visually based activities with others.

Keywords: gaze, action formation, participation frameworks, group membership

1. Introduction

In our communal social lives, we manage a delicate arrangement of with whom, at what, and for how long we may gaze. Consider that within North American culture, it is not only impolite to stare at people through the windows of their homes but is in fact a punishable offense under “peeping tom” and “right to privacy” laws. Further, workplace protocols in many Western countries define certain sorts of gazing actions as sexual harassment, which can be grounds for firing. Even looking over a stranger’s shoulder as they read or look at a cell phone is a breach of gazing norms in public, yet stopping to look at the scene of a car accident that others are looking at is an acceptable (if disturbing) practice.

In this study, we seek to understand some of the complex norms surrounding gazing behavior in terms of how participants recognize what others are doing with their gazing actions, and how they discern whether or not they can join in looking at what someone else is looking at. We do this by investigating a tension between one sort of gazing action that participants freely join in, “noticing”, and another sort, “watching”, that is subject to restrictions involving participant identities toward a collective, rights of membership, and the sort of event that is the gaze target. These forms of gazing, one an abrupt shift of visual attention to check something that has newly caught someone’s attention (noticing), and the other a sustained focus of visual attention to something that has already drawn it (watching), are ones that entail distinct arrangements of the body and eyes that make recognizable to co-present others that they are undertaken for different purposes.

2. Research on gaze

The study of gaze in interaction has nearly exclusively been directed to gaze in conversational interaction, in particular to its regulatory functions. For example, Kendon (1967) found that speakers gaze at recipients at the end of a turn as a turn-yielding cue, while Rossano (2013) has demonstrated that speakers and recipients orient their gaze not by reference to turns, but by reference to sequences of action. Goodwin’s work has investigated the role of gaze in the construction of turns and speaker/recipient alignment, reporting that speakers shift their gaze over the course of producing a turn, scanning potential recipients (in multi-party interaction) for one that is returning their gaze and/or producing speech dysfluencies to elicit the gaze of a non-gazing recipient (Goodwin, 1979; 1980). Gaze, too, is a resource for pursuing a response (Rossano & Stivers, 2010), and the withdrawal of gaze is a primary means of withdrawing from interaction or avoiding interaction altogether (Goffman, 1963), as well as transitioning from one activity to another in talk-based activities (Robinson, 1998). While gaze is undeniably integral to face-to-face conversational interaction in these ways and others (for reviews, see Bavelas, Coates & Johnson, 2002: 570-572; Goodwin, 1980: 29-33; Rossano, 2013), others’ gazes are also an important resource for how we monitor, attend to, and engage our surroundings in situations ranging from focused interaction (including conversation) to situations of mere co-presence.

The phenomenon of gaze following is well-studied in the psychological sciences under the concept of “joint attention”; that is, when two or more individuals gaze toward a third entity. In evolutionary and developmental psychology, joint attention is considered to undergird humans’ capacity for a “theory of mind”; that is, an understanding of others’ goal-directed behavior (e.g., Baron-Cohen, 1997; Tomasello, 1999). Gaze following has been shown to entail a certain degree of automaticity in adults (e.g., Friesen and Kingstone, 1998; Driver et al., 1999; Langston and Bruce, 1999; Ricciardelli et al., 2002). Early research suggested that in infants, gaze following begins to emerge at 2 months (Scaife & Bruner, 1975), while other research has suggested it emerges at 6 months (Butterworth & Cochran, 1980). It is generally considered a well-established behavior by 10-12 months (Corkum & Moore, 1998). Tomasello and others have identified a developmental trajectory for gaze following and other joint attention behaviors (e.g., pointing and showing objects) that begins at about 9 months (Carpenter, Nagell, Tomasello, Butterworth, & Moore, 1998).

As social interaction researchers, our interest in joint attention behaviors concerns their import for what people do in concert with others (Kidwell & Zimmerman, 2007). Interaction researchers have observed in several studies that a gaze shift that is directed to a point in the environment, particularly in focused interaction, regularly enjoins others to participate. Goodwin (1981) demonstrated that gazes that are “noticings” may be joined in by a co-participant in a way that “staring into space” may not be (98-101). Others have shown that, in a variety of contexts, participants can be motivated to move their gaze to an object or point of focus by a co-interactant’s gaze shift that serves (wittingly or not) to highlight its relevance for the other and/or transform the current participation framework to a new one: for example, a doctor’s gaze shift to a notepad (Psathas, 1990: 216-219); a child’s gaze shift to a caregiver (Kidwell, 2009) or a playground object (Goodwin, 2000: 1500-1503; see also Goodwin, 2003: 7-9); a speaker’s gaze shift to her gesturing hands (Gullberg & Holmqvist, 2006; Streeck, 1993: 290); a passer-by’s gaze shift to a street performer (Smith, 2017; Carlin, 2014); or a would-be assistance recipient’s gaze shift in search of something (Drew & Kendrick, 2018). In the present study, we take this phenomenon as our focus. We make use of a collection of cases to show that participants orient to the gaze shifts of others as being undertaken for different purposes that make joining in relevant, and/or appropriate, in different ways.

3. Data and method

For our study, we identified 71 cases of joint visual conduct in a wide variety of videotaped, naturally-occurring settings: a preschool, a gym, a retirement home, as well as such public settings as malls, supermarkets, sidewalks, and cafes. During our initial collection of cases, it became clear that the gazing actions that participants were joining in, and the manner in which they joined in and sustained their participation, were quite different. These observations motivated us to focus our analysis on 1) characterizing the compositional differences between two initial gazing actions that regularly emerged in our data (i.e., “noticing” and “watching”) and 2) investigating the circumstances that seemed to motivate the different character of how participants joined in these gazing actions by others.

Our collection includes 28 cases of participants’ gaze following another’s noticing-occasioned gaze shift and 43 cases of watching together. We made use of approximately 200 hours of videotaped data that we (the researchers) had collected for prior research purposes to identify our cases (and for which we received University Human Subjects approval), in addition to cases that we located on YouTube representing various forms of public social life (e.g., street performances, public pranks). In this article, we present six cases from our larger data set to illustrate our findings, including three cases from YouTube. For the YouTube cases, we provide copies of the videos via embedded links. We obtained the owner’s permission to use one of the YouTube videos for our research; in the other two cases, we understand our usage to fall within the parameters of the copyright doctrine of “fair use”.1 Where we do not have permission to print video stills of participants, we have used drawings to render their likenesses (Example 2) or, in the case of a YouTube video of a sensitive nature (Example 6), we have over-exposed the video stills.

The number of cases in our collection, particularly of noticing-occasioned gaze following, likely underestimates the prolific nature of the practices we examine here. In the case of noticing-occasioned gaze following, it can be difficult to discern if two people who shift their gaze to the same direction do so because something has drawn their attention at roughly the same moment (i.e., situations of independent simultaneous gaze shifting), or because one person is in fact following the other’s gaze. Thus, for cases of noticing-occasioned gaze following, we developed criteria to reduce the possibility that the cases we were collecting were really instances of simultaneous looking and not gaze following. In our collection of noticing-occasioned gaze following, we have limited our cases to ones in which 1) Participant A’s gaze shift has already begun before Participant B’s (the “follower”) (0.3 seconds is the minimum gap between A and B’s respective gaze shifts in our collection); 2) B has visual access to A’s gaze shift because A and B are facing each other, or B is to the side or behind A and looking in A’s direction and can see A’s head turn in the manner of a gaze shift (discussed below); 3) B’s gaze shift is to the same direction as A’s. In cases that follow these criteria, it is reasonably clear that B’s gaze shift is motivated by A’s gaze shift because 1) unlike A, who gaze shifts soon after the “occasioning event” (a sound, a movement), B’s gaze shift is delayed relative to this event, and 2) the most proximate event prior to B’s gaze shift is A’s gaze shift, and B is looking in A’s direction.
Other indications that someone is following someone else’s gaze shift (rather than independently gaze shifting) can include a slightly slower pace to the gaze follow action and/or not stabilizing on a target because the follower—although following in the general direction of the noticer—is uncertain of what the attention-worthy event is as they scan candidate targets in the environment (i.e., their gaze shift is motivated by the other’s gaze shift and not by a specific event in the environment). In contrast to the complexities involved in identifying cases of noticing-occasioned gaze following, which happen relatively quickly, identifying cases in which a participant joined with others in watching was much easier. In these cases, the “joiners” have visual access to the embodied watching formations of other participants, which may be sustained over a period of time, before they become watchers themselves.
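
The three inclusion criteria for noticing-occasioned gaze following listed above can be restated as a simple check over coded gaze-shift annotations. The following Python sketch is illustrative only and is not the procedure the authors report using: the `GazeShift` record, the millisecond onset field, the direction-in-degrees coding, and the 30-degree tolerance are all hypothetical, and treating the 0.3-second minimum gap observed in the collection as a cutoff is an assumption of the sketch.

```python
from dataclasses import dataclass

# Smallest A-to-B onset gap reported for the collection (0.3 s = 300 ms).
# Using it as a cutoff is an assumption of this sketch, not a stated rule.
MIN_ONSET_GAP_MS = 300


@dataclass
class GazeShift:
    """Hypothetical annotation record for one participant's gaze shift."""
    participant: str
    onset_ms: int          # time at which the head turn begins
    direction_deg: float   # coded direction of the gaze target, 0-360


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_candidate_gaze_follow(
    a: GazeShift,
    b: GazeShift,
    b_can_see_a: bool,
    tolerance_deg: float = 30.0,      # hypothetical "same direction" margin
    min_gap_ms: int = MIN_ONSET_GAP_MS,
) -> bool:
    """Apply the three criteria to a noticer (a) and a possible follower (b)."""
    # Criterion 1: A's gaze shift has already begun before B's.
    a_started_first = (b.onset_ms - a.onset_ms) >= min_gap_ms
    # Criterion 2: B has visual access to A's gaze shift (analyst-coded).
    # Criterion 3: B's gaze shift is to the same direction as A's.
    same_direction = angular_difference(a.direction_deg, b.direction_deg) <= tolerance_deg
    return a_started_first and b_can_see_a and same_direction


noticer = GazeShift("A", onset_ms=12400, direction_deg=180.0)
follower = GazeShift("B", onset_ms=13100, direction_deg=170.0)
print(is_candidate_gaze_follow(noticer, follower, b_can_see_a=True))
```

A check like this only screens out likely cases of independent simultaneous looking; whether a screened case counts as gaze following still rests on the analyst's inspection of the video.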

The approach we use in this study is that of conversation analysis (CA). Even though our focus is on embodied practices that, in many instances, do not involve talk, and many of the situations we examine fall within the realm of unfocused interaction, CA provides tools for understanding the systematicity of these practices. This is particularly so with respect to understanding how these practices are “composed” or “designed” in ways that, in conjunction with their positioning in a sequence of action, recognizably constitute them as different sorts of actions undertaken by actors for different purposes (e.g., Pomerantz, 1984; Schegloff, 1984) with different opportunities for co-participation. We used ELAN software to aid in capturing the timing of participants’ gazing actions and other relevant conduct as accurately as possible. We support our analyses with transcripts influenced by the multimodal conventions developed by Mondada (see e.g., Mondada, 2019) in conjunction with video stills. The reader will notice different levels of granularity in the transcripts. Readers are also able to view the actual videotapes. In cases where participants are talking, we use Jeffersonian transcription conventions (Jefferson, 2004).

4. Noticing gaze shifts as an agnostic and emergent participation framework

In this section, we demonstrate how the act of someone shifting their gaze can motivate someone else to follow with their own gaze.

As reported in prior research (Kidwell, 2009), a noticing gaze shift is recognizable as such via the gazer’s quick turn of the head and stabilization on a target, while the torso and lower body (hips and legs) remain in their original facing position. The speed of the head turn is discernible from its pace relative to other embodied activities the noticer is engaged in (e.g., eating, walking, being a recipient, etc.). These movements—the quick head turn and stabilization on a target—create a torqued body posture that conveys instability and projects that the noticing is a temporary arrangement (Schegloff, 1998). This temporary arrangement may be underscored by the mid-course suspension of other activities (e.g., eating, etc.).

As the following cases demonstrate, the noticing gaze shift alerts co-present others to the possibility of something “attention worthy” happening in the environment; as an epistemic matter, the action locates the happening as a newly discovered aspect of the scene for the “noticer”, one that can motivate others to also look.

4.1 Example 1: “What’s he looking at?”

In Example 1, two children are sitting side by side at a table eating lunch in a preschool classroom (i.e., they are not in a ratified state of interaction). There are also several other children and some teachers sitting at tables and eating (Figure 1; note: the children and teachers are dressed in Halloween costumes). The segment of interest begins when the teacher (T) sitting opposite proposes that they sing a song (line 1; Figure 1), and then (independently of this) a child across the room (C3), off-camera, yells out (line 3). When C3 yells out, Child 1 (the child in back) turns and looks across the room in the direction of the yelling child (line 3; Figure 2). A moment later, Child 2 (the child in front) sitting next to him turns and looks, too, at the same time that the teacher begins singing (line 3; Figure 3).

Example 1. “What’s he looking at?” (Gunma)

The relative pacing of the children’s gazing actions, and the duration of their gaze shifts and other actions, can be seen in the transcript. The gaze shift made by Child 1 is a noticing. His action is composed with an abrupt (.7 s) head turn (line 3; Figure 2), and he alights and holds on a target 180 degrees from his starting point (Figure 3; the view of his head is partially obscured by the child in front). This puts his torso and head in a torqued position relative to his lower body. As a sequential matter with respect to the activity that he is already involved in, eating and being a recipient to the adult, the movement is one that interrupts this activity: he shifts his gaze from the teacher, with fork still in hand. This mid-course suspension of an in-progress activity accentuates the interruptive nature of the gaze shift, making it, as an epistemic matter, evident (to anyone who sees the child’s movement) that something has suddenly, and newly, drawn his attention. As he fixes on the target and begins to align his lower body to it at 180 degrees, Child 2 turns and also looks (line 3; Figures 3 and 4).

As Example 1 demonstrates, the first child’s gaze shift, composed as it is as a noticing, locates the noise (the yelling child) as something “attention worthy” taking place in the environment and draws the other child into a gaze following action. This following action also has an epistemic aspect: Child 2 has followed the gaze of Child 1 to find out what he is looking at, and to see if it will be of interest or import for him, too. Their moment of co-looking, however, rather quickly dissolves (line 3; after 1.8 s).2 Child 2 returns his gaze to the table, seeming to lose interest in what he was looking at, and directs it to the teacher sitting across from him who has begun singing again (line 5; Figure 5). Child 1 continues to gaze in the direction of the yelling child, his body still torqued; it is not quite a committed watching position that he takes up, such as we discuss below, but is certainly an act of sustained looking.

The sort of gaze-shift/gaze-follow action that we see by the participants in Example 1 is prolific in environments of co-present activity that contain any kind of auditory and/or visual stimuli: people coming and going, voices, objects being moved about, and so on. These are events that may attract notice by those on the scene, and once they do, the noticing itself can attract others to join in, even momentarily, as followers seek to discern the relevance of what someone else has noticed for their own lines of action. In this sense, the “rights” to look, at least initially, apply to everyone, but whether or not participants sustain their attention toward the target depends on what has been located and, as we discuss next, the right to sustained looking. In the next case, we see that what one participant first notices is something that is relevant to and requires the sustained attention of others who are present on the scene.

4.2 Example 2: “For us”

In the next example, four adults are sitting around a table drinking coffee and chatting. At the moment of interest, Tom (back left) is talking and the others at the table are acting as recipients (line 1; Figure 6). Hank (back right) and Barbara (front right) have been gazing at Tom while Dan (front left) has been gazing in both Tom and Hank’s direction intermittently (in other words, he doesn’t take up a committed recipient posture). As Tom talks, Dan shifts his gaze about 45 degrees from Hank to a point just above and behind Barbara (off camera; line 1; Figure 7), and the others at the table begin to shift their gaze: first Hank, then Barbara, and lastly Tom (who is still talking) until they are all looking in the same direction (lines 2-5; Figure 8). What they find is that someone, Pat, has approached their group in order to speak to them (line 5). As a group they torque their heads and maintain postural instability (especially Barbara; Figure 8) as they prepare to interact (Schegloff, 1998). Pat has brought them a drawing to look at and they subsequently take turns admiring it (not shown in video). In other words, what one participant, Dan, has located with his gaze shift turns out to also be of import for the others: a newcomer to the table that calls for their collective participation as co-interactants in a new interaction.

Example 2. “For us” (Coffee Chat)

In sum, we find that a noticing gaze shift is an action that is recognizable to co-present others via such features as the noticer’s quick head turn, stabilization on a target, and torqued body position. This sort of gazing action can motivate others to follow with their own gaze to see what has prompted it. The result is a social configuration of co-looking that, depending on what the participants have located, may be quickly dissolved (as in Example 1) or result in a new course of activity for one or all of the participants (as in Example 2). The participation boundaries of the first gaze shift as an occasion for others’ actions (whether discerned by the first noticer or not) are permeable in the sense that people may readily follow the noticing gaze shifts of others, but stop short of sustained looking or other action unless they locate an event that is relevant for them. This is not only a phenomenon that exists among acquainted others as in Examples 1 and 2, but as a host of YouTube videos show, it is also a phenomenon of public life where we find many examples of strangers using others’ gaze shifts as opportunities to monitor, attend to, and engage their surroundings. We present one example from this sort of setting that is especially clear in the way it demonstrates how we can use others’ gazing actions and associated conduct to alert us to potential threats in our environments.

4.3 Example 3: “What’s going on here?”

Example 3 is from a YouTube video of an individual engaging in a “hand touching” prank on a busy escalator, a genre of prank for which there are a number of videos from different countries on YouTube. With this prank, someone riding on one side of the escalator (up or down) brushes the hand of someone who is riding on the opposite side (up or down) and holding onto the moving handrail; an associate records the interactions from some distance and the results are posted on YouTube. Example 3 comes from a Las Vegas, USA episode.

In Example 3, P, the prankster, is riding up the escalator (on right) and brushes the hand of someone coming down, Man 1, who, startled, jerks his hand away and quickly turns to look at P (lines 1 and 2; Figures 9 and 10). P also brushes the hand of Man 2 (or at least moves to do so; it is a little difficult to tell from the video whether or not he actually makes physical contact; Figure 10), the person standing behind Man 1, who also quickly withdraws his hand and turns to look at P (line 2; Figure 11). There is also a woman (W1; not a target of the prank) standing behind M1 and M2, who sees the sudden gaze shift of M2 in front of her (it is noteworthy that she is looking at the face of M2 and not his hand, evidence that her action is first of all motivated by his gaze shift and not the hand prank itself; Figure 11). She follows M2’s gaze to P (line 3; Figure 12) and then slightly adjusts her purse as if to make it more secure (line 4; Figure 13). M1 can be heard in the video saying, “What the fuck is wrong with you ma:n,” but W1 begins to shift her gaze just before or just as he begins to say this, providing evidence that she is, at least initially, reacting to M2’s gaze shift and not the utterance (line 3; Figure 12). However, the utterance, and the way it verbally locates P as a threat, arguably plays a role in her adjustment of her purse. M1, M2, and W1 appear to be strangers to one another (based on such clues as who is talking to whom and the side-by-side touching of elbows, it can be inferred that M1 is with the woman in front of him; M2 is alone; and W1 is with the man next to her). The case demonstrates the alertness that strangers can have to one another’s noticing gaze shifts, and how they can use them to monitor for and respond to potential threats in the public realm.

Example 3. “What’s going on here?” (Handtouch Prank)

With Examples 1-3, we have demonstrated how participants orient to their shared environment in private and public realms as one of shared relevance via others’ visual noticings and use such actions by others to guide their own. Others’ noticings, defined by a set of recurrent features (a quick head turn, stabilization on a target, torqued body position, and often the mid-course suspension of an in-progress activity) that make recognizable to co-present others their purpose (i.e., to attend to a newly discovered happening in the environment), alert participants to something potentially attention-worthy happening in the environment and are joined in with few, if any, constraints. The next gazing action we turn to, watching, is by contrast one in which the entitlements, obligations, and constraints to joining in are pronounced.

5. Watching as a committed participation framework

Watching, like noticing, is an action engaged in by one person or many that alerts co-present others to something interesting, “watch-worthy”, happening in the environment. As an epistemic matter, it conveys not that something has been “newly discovered” as with a noticing, but “already discovered” and worthy of, or otherwise requiring, sustained attention. Watching, which begins when something visually or audibly attracts someone’s attention, may begin as a noticing or be entered into more gradually, as when someone comes onto the scene of an interesting event. Watching can also be organized as a pre-planned event in which built spaces are pre-ordained for watching occasions: theaters, stadiums, concert halls, and the like. The in situ watching that we examine here is recognizable via the embodied configuration of a sustained and committed arrangement of the feet, torso and head, as well as the sustained direction of gaze toward the watched situation (Schegloff, 1998; Smith, 2017). This is in contrast with the quick movement and torqued body of the noticing gaze shift. Further, in watching formations, participants, as audiences, aim to arrange their bodies side-to-side in semi-circular or L-shaped formations (Kendon, 2010). Watching as an action is thereby configured with a recurrently recognizable formation.3 In our analysis, we do not focus on whether it is the watching formation itself or the event being watched that draws others to join in (although the former can be the case); rather, our interest is in the joint production of watching and the differential conduct of the watchers in relation to one another and to the watched event.

In the following cases, we see that watching arrangements orient to a spectrum of designs of the watch-worthy event, from situations of watching together which are treated as being “designed for anyone”, to “only for some”, to “breaches of the local scene”. In entering and leaving watching formations, in addition to the embodied manner in which they compose their watching, participants orient to differing entitlements to watch. In so doing, they constitute, in part, what the event is, as well as their relationships with others who are also watching.

5.1 Example 4: “Designed for anyone”

First, we discuss the “designed for anyone”4 situations in which audiences are joined in by passers-by. Example 4 comes from a public performance posted on YouTube by a clown, “Karcocha”, on a boardwalk in Barcelona. The clown, facing an audience on a stepped amphitheater, is trying to get the seated audience to clap along with a beat. In the background, to the right of the clown, are three men walking together and gazing toward the clown (M1 leads followed by M2 and M3; Figure 14, line 1). After they have moved past him, M1 pivots and turns his back to the fence to stop to watch, and M2 (now to the right in the lead in Figure 15) shifts his gaze to observe the new movement of his consociate (line 2). Shortly after, M3, recognizing that M1 and M2 are moving into a watching position, turns to face the clown and begins stepping backward toward the fence (line 2):

Example 4. “Designed for anyone” (Karcocha)

In other words, M2 and M3, having understood the change in activity of their consociate (M1), also pivot to face the clown. Over the next several seconds, all three men take steps back toward the fence, with M2 and M3 coming to fully rest against it (Figure 17):

Figure 17. M1, M3, M2 (L to R) have reconfigured their stances into a watching formation. A couple on the right also watches.

Next, the men reposition themselves from passersby to watchers by halting walking, stepping out of the path of pedestrian (and bike) traffic, and settling themselves near other watchers (a couple similarly positioned against the fence). They then commit their feet, eyes, and shoulders in a stable watching orientation towards the spectacle. In this way, they treat the performance as one they are entitled to give sustained attention to.

The event the men are watching is, of course, one that is designed as a public performance; it can be watched by a collective of “anyone”. Such events stand out from the usual activities of people in public places (e.g., walking, entering and exiting buildings, sitting on a bench to eat or smoke): They are located where they can be maximally viewed by passersby (in a market square, at the entrance to a subway station), and typically involve artistic, humorous, or musical elements that attract and entertain. Performers actively work to recruit audiences via the use of these elements, and also, sometimes, by calling out to passersby and/or seeking to involve them in their performances (Carlin, 2014; Smith, 2017). Smith (2017) argues that such shifts in participation constitute a shift in membership category, that is, from passer-by to audience member, which acquires a set of transactional obligations: Performers in public places treat audiences as obligated to properly appreciate the performance, whether with applause or money (Carlin, 2014; Smith, 2017). Thus, an activity that starts out with passersby observing that others are watching something comes to involve a shift in membership category and a subsequent altering of their participation framework.

Watching is itself an activity that constitutes events in such spaces as “happenings”. The size of a crowd alone can draw passersby to stop and join the formation: The bigger the collection of watchers, the more interesting and worthy of attention the target is treated as being. However, there may be competing norms as to the watchability of some event or situation. That is, parties may act as if some event that is visually available to everyone may not in fact be watched by everyone. In the next example, we investigate how a person new to a strength training class in a private gym enacts his status as a class newcomer in the manner in which he watches the deadlifts being performed by others.

5.2 Example 5: “Only for some”

The next case comes from a collection of instances drawn from private, strength training classes in a gym, a context that is quite different from the very public, “for-anyone” cases of public performances just examined, or the “breaches of the local scene” we examine later. In these “only-for-some” cases, there are degrees of belonging to the collective associated with rights to watch someone else’s training. Thus, while these “only-for-some” events may be watched by class members on the scene, including members of a powerlifting team that trains within the class (who are expected to watch), class newcomers may exhibit uncertainty about how to behave, in particular at what and toward whom they should direct their gaze.

When training as a class in a closed-gym environment (that is, members of the public are not present), it is routine for members of the class to watch the squat, bench or deadlift of a classmate; for members of the team-within-the-class, watching teammates (but not necessarily classmates) perform these exercises is an expectation, one undertaken as a form of team solidarity that is part of what constitutes them as a team. This is in distinct contrast to watching someone in a commercial, open gym, where it is rare for one party to watch another, the norm being that one should not watch the lift of someone they do not know. This difference between the two settings’ (closed gym versus open gym) organization of rights-to-watch essentially pivots on the roles and relationships between the watcher and the watched: To watch another’s lift is to display a claim of having the proper relationship to do so, and not having the proper relationship while being an audience to a lift can result in sanctioning (Reynolds, 2017).

Clark (2017) describes the way in which women in a gym report finding “appreciative” stares at “good lifting” as acceptable while feeling uncomfortable at “sexual” stares. The camaraderie enacted in watching a lift makes a different set of categories relevant than that of “sexual” stares. Related to this, Reynolds (2017) describes a case in which a male watching a female lifter ceases watching without making an appreciative acknowledgement to the lifter and in doing so is oriented to as “creepy” by the lifter. In other words, because he did not take a moment to acknowledge the lifter’s lift, he did not act as if he had, and was not treated as having, the proper relationship with the lifter in order to watch the lift.

The next example highlights the way in which watching is clearly an affiliative exercise for some (a display of team membership in this case), while for others who are new to the setting, it is something more equivocal. Example 5 involves an interaction in a private gym in which someone new to the class, Matt, visibly conducts himself as a newcomer. Matt has been standing across the room from team members (Coach, Edward, and Kathleen) who have assembled themselves in close proximity to one another, their bodies oriented toward one another for interaction, making them visible as a “with” (lines 1-3; Figures 18-21; Goffman, 1971). Between Matt and the team are three people in the middle of the room in various stages of performing lifts: A class member, Bob, to the right; a team member, Gene, in the middle; and another class member, Chris, to the left. Over the course of several seconds, Matt, who has been gazing to his right at Bob, shifts his gaze left to the back corner of the room soon after Bob drops the weight (lines 1 and 2; Figures 18 and 19). Then he shifts his gaze to the right again, alighting briefly on team member Gene, who has started readying himself for a lift, but then continues moving his gaze back to Bob (lines 2 and 3; Figure 20). Additionally, as Matt shifts his gaze, he also shifts his feet, thus not taking up a stable watching position. Across the room, the team members have shifted their gaze to watch Gene begin his lift: First Coach and then the other team members follow (lines 3 and 4; Figure 21). Matt, however, whose gaze has once more alighted on Bob, again shifts his gaze left, alighting briefly on the team as they watch Gene, but continuing to the back corner of the room (line 4; Figure 21), where he cranes his neck as if trying to get a better look at something (line 5). Plainly put, Matt looks like someone who does not know where to look or what to do with himself. His leftward gaze shift at this point to the far back of the room does considerable work to distance him from the collective event and from the claims of membership that watching Gene, as Gene’s team members watch him, might presumptuously invoke.

Example 5. “Only for some” (Gene)

Example 5 highlights that new members of a group need to be acculturated to the group’s routines, watching and otherwise, that separate “regulars” from “novices” along a continuum of group belonging. Thus, Matt, in these ambiguous circumstances of membership, comports himself so as not to be seen looking overlong and claiming for himself a level of belonging that he has not yet achieved. This example highlights the “observability of observation” (Sharrock & Turner, 1980): When participants are locally collecting themselves as a socially organized display of watching, they are sensitive to whether the rights of membership in a particular group permit them to do so. As with several other cases involving newcomers to the gym classes in our data, Matt acts as if he is not entitled to observably watch or to collect himself in the watching formation with the others to do so. Committing to a formation of watching has in situ implications. In cases such as this, the implications are that one has a sufficient relationship with Gene and the others watching him to be seen also watching him. Parties to watching in such “only-for-some” settings act as if membership (e.g., friend, coach, teammate) organizes their participation in the activity of watching.

The forms of group membership enacted in watching in “only-for-some” settings, such as those in powerlifting teams, and in “for-anyone” settings on the street are clearly different. In “for-anyone” settings, parties rarely seem troubled, as displayed in their embodied comportment, by whether they can or should watch. However, in the “breaches of the local scene” cases we present next, the watchability of some public events can be treated as a sensitive issue by passersby. These breaches of the local scene are not treated as designed to be watched, as we see from the conduct of those who only provisionally watch, or those who only glance but do not engage in full-fledged watching.

5.3 Example 6: “Anyone can watch, but should they?”

This case is drawn from a YouTube video of a street fight in Los Angeles between two young men identified as gang members in the video description. There is a prolific genre of street fight videos on YouTube. Some of these fights are staged in spite of their online depiction as resulting from an actual conflict that has turned physical. That is, the fighters’ and videographers’ main purpose is to show off the fighters’ skills and create entertainment for street audiences (not so unlike Karcocha the clown in Example 5) as well as for social media audiences.

In Example 6, it is not clear if the fight is real or staged. There are several stylized elements to the fight that show a level of cooperation between the fighters, including how they begin in a squaring off formation and the fact that several people have gathered around the fighters in a committed watching formation with their cell phones out to record the event (Figure 22). Video stills are presented rather than a “transcript” because our analysis is not based on the granular movements of individuals relative to one another, but rather on embodied displays that endure over several seconds to over a minute.

Figure 22. Two fighters square off. A crowd forms behind them. Some record the event.

The act by passersby of recording the event treats it as one that is particularly worthy of preserving for repeated subsequent watchings by oneself and others and, especially when posted on social media, as worthy of being watched by a mass audience. Moreover, many of the people gathered around to watch are smiling (rather than showing fear), which further suggests that the event is of a primarily performative nature.

Other people, however, treat the fight as an ambiguous social event. Across the street, some people appear uncertain about what is going on and what they should do; for example, they look, but continue walking. Others have taken up a committed watching position from a distance, while still others have taken up an ambiguous watching position with eyes directed toward the fight but feet facing in the direction in which they were headed, conveying that they have only stopped for a moment to look (Figure 23). In these ways, those present on the scene differentially orient to and constitute the street fight as, on the one hand, an exhibition-of-sorts designed to be watched, and on the other hand, an unseemly spectacle that should be avoided.

Figure 23. Man takes up ambiguous watching position. Couple across the street watch fight from afar.

We are particularly interested in the man in Figure 23 who is standing across the street from the fight holding a bag. Similar to Matt’s conduct in Example 5, the man in Figure 23 shows an ambiguous watching orientation, albeit in a reversed direction: He gazes at the fight to his left, but keeps his body directed forward (whereas Matt kept his body directed to the lifter in the middle of the room and moved his gaze to points around the room) (Figure 23a). That is, the man shows that he will return to the forward direction as his dominant order of involvement (from the video it appears that he first approaches his truck to get into it), and that what he is gazing at is a transitory, subordinate involvement (Kendon, 1990).

Figure 23a. Close up, man with bag.

Nearly a minute later, however, as the fight evolves and more people have gathered as an audience, the man with the bag changes course. Perhaps attracted by the growing crowd, he takes up a position behind a woman recording the event with her phone and leans around her to get a better look (Figure 24).

Figure 24. Man with bag peers around woman recording fight.

In Example 6, the man does not commit to watching the fight in the same way the others do. Although he eventually joins the watching formation with a fully aligned body posture, he does not move to a position that would enable clear visual access toward the event; rather, he remains in a leaning, unstable position behind the others; after about 6 seconds, he turns and walks in the other direction. In this way, by not moving forward, by not fully committing to a watching formation with the others, he observes the fight without completely ratifying it. His actions highlight a contrast between those committed to being an audience to the fight, and someone who is momentarily observing it.5 We argue that there is a graduated division between the audience committed to watching, and observers maintaining postural instability and/or some distance from the event, thereby avoiding being seen as committed watchers. Put another way, a committed watcher not only aligns with the rest of the audience, but with the watched activity, building a shared participation framework between the watchers and the watched in ratifying the event as “watch-worthy”.

6. Discussion

In this article, we have argued for a distinction between two sorts of gazing actions that are recognizable to participants as being undertaken by the gazer(s) for different purposes, and that, in conjunction with the target of the gaze and participants’ relationships with one another, constrain instances of shared visual activity. Table 1 highlights the different aspects of these gazing actions that we have examined:

Table 1

Aspect | Noticing | Watching
Visual Movement | Sudden shift of visual attention | Sustained visual attention
Body Posture | Torqued | Aligned
Epistemic Orientation | “Newly discovered” | “Already discovered”
Participation Framework | Unrestricted (anyone can follow) | Contingent on design of watched event and rights to watch associated with group membership

In line with Goffman’s observations that gazing at others involves constraints, we have demonstrated that gazing with others at happenings in the environment involves constraints that are complicated by a number of issues. First, certain sorts of gazes (noticings) motivate co-looking across a broad range of settings and types of participants. We suggest that the epistemic motivation of this gazing action as a visual inquiry into something that has newly drawn the attention of the gazer is the basis for its permeable boundaries of co-participation. That is, co-present persons are attracted by the curiosity of the gazer, realized in the embodied production of the gaze act, and they undertake their own visual inquiry. Concern about the happenings in one’s environment, and mechanisms for monitoring changes in it, such as a sensitivity to others’ gaze shifts, may be primordial (at least for humans, other primates, and many mammals; Baron-Cohen, 1997), but what happens next is subject to a variety of social and relational factors: Contingent upon what has been discovered with the initial noticing and the subsequent gaze follow, should the gazers continue to gaze, and/or what courses of action might they appropriately engage in next?

Second, our analysis of watching allows for a detailed consideration of the variety of factors that motivate and constrain co-looking. As we discussed, performers or others may design activities to be watchable “for anyone”, and passersby may act as if this is so and stop to form an audience. Likewise, certain activities may be treated as “only for some”, that is, a certain specialized membership is treated as a prerequisite for joining in an audience. Lastly, we demonstrated that committing to being an audience, and thereby ratifying the event as watchable, may be treated as a sensitive matter by participants, in particular, that in watching certain sorts of events, individuals become a certain sort of person: The kind who watches fights in public, stops to look at car crashes, and so on. Each of these distinctions in watching highlights the public and constitutive order of perception (Merleau-Ponty, 1962; Goodwin, 1994): In watching an event, participants collaborate in bringing into being what that event is.

In sum, gaze serves not only as a means to obtain visual access to something, but also as a differentiable embodied display to others that there is something worth looking at in their shared environment. Human visual conduct thus acts on the social world, displaying a member’s analysis of what sort of event is being looked at and who, as a member tied to others with particular sorts of membership rights and obligations, may join in.


References

Baron-Cohen, S. (1997). Mindblindness: An Essay on Autism and Theory of Mind. MIT Press.

Bavelas, J. B., Coates, L., & Johnson, T. (2002). Listener responses as a collaborative process: The role of gaze. Journal of Communication, 52(3), 566-580.

Butterworth, G., & Cochran, E. (1980). Towards a mechanism of joint visual attention in human infancy. International Journal of Behavioral Development, 3(3), 253-272.

Carlin, A. (2014). Working the crowds: Street performances in public spaces. In T. Brabazon (Ed.), City Imaging: Regeneration, Renewal and Decay (pp. 157-169). Springer Netherlands.

Carpenter, M., Nagell, K., Tomasello, M., Butterworth, G., & Moore, C. (1998). Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development, i-174.

Corkum, V., & Moore, C. (1998). The origins of joint visual attention in infants. Developmental Psychology, 34(1), 28.

Driver, J., Davis, G., Ricciardelli, P., Kidd, P., Maxwell, E., & Baron-Cohen, S. (1999). Gaze perception triggers reflexive visuospatial orienting. Visual Cognition, 6, 509–540.

Friesen, C., & Kingstone, A. (1998). The eyes have it! Reflexive orienting is triggered by nonpredictive gaze. Psychonomic Bulletin & Review, 5, 490–495.

Goffman, E. (1963). Behavior in Public Places. New York: The Free Press.

Goffman, E. (1971). Relations in Public: Microstudies of the Public Order. New York: Basic Books.

Goodwin, C. (1979). The interactive construction of a sentence in natural conversation. In G. Psathas (Ed.) Everyday Language: Studies in Ethnomethodology. New York: Irvington, 97-121.

Goodwin, C. (1980). Restarts, pauses, and the achievement of a state of mutual gaze at turn‐beginning. Sociological Inquiry, 50(3‐4), 272-302.

Goodwin, C. (1981). Conversational organization: Interaction between Speakers and Hearers. New York: Academic Press.

Goodwin, C. (1994). Professional vision. American Anthropologist, 96(3), 606-633.

Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal of Pragmatics, 32(10), 1489-1522.

Goodwin, C. (2003). The semiotic body in its environment. In J. Coupland & R. Gwyn (Eds.), Discourses of the Body (pp. 19-42). New York: Palgrave/Macmillan.

Gullberg, M., & Holmqvist, K. (1999). Keeping an eye on gestures: Visual perception of gestures in face-to-face communication. Pragmatics & Cognition, 7(1), 35-63.

Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (Ed.), Conversation Analysis: Studies from the First Generation (pp. 13-31). Amsterdam: John Benjamins.

Kendon, A. (1967). Some functions of gaze direction in social interaction. Acta Psychologica, 26, 22–63.

Kendon, A. (1990). Conducting interaction: Patterns of behavior in focused encounters (Vol. 7). Cambridge University Press.

Kendon, A. (2010). Spacing and orientation in co-present interaction. In A. Hussain, C. Vogel, A. Nijholt, A. Esposito, & N. Campbell (Eds.), Development of Multimodal Interfaces: Active Listening and Synchrony (pp. 1-15). Berlin Heidelberg: Springer.

Kidwell, M. (2009). Gaze shift as an interactional resource for very young children. Discourse Processes, 46(2-3), 145-160.

Kidwell, M., & Zimmerman, D. H. (2007). Joint attention as action. Journal of Pragmatics, 39(3), 592-611.

Langton, S., & Bruce, V. (1999). Reflexive social orienting. Visual Cognition, 6, 541–567.

Merleau-Ponty, M. (1962). Phenomenology of Perception. Psychology Press.

Mondada, L. (2019). Conventions for Multimodal Transcription, version 5.0.1.

Pomerantz, A. (1984). Agreeing and disagreeing with assessments: Some features of preferred/dispreferred turn shape. In J. M. Atkinson & J. Heritage (Eds.), Structures of Social Action: Studies in Conversation Analysis (pp. 57–101). New York: Cambridge University Press.

Psathas, G. (1990). The organisation of talk, gaze and activity in a medical interview. In G. Psathas (Ed.), Interaction Competence. Washington: University Press of America.

Ricciardelli, P., Bricolo, E., Salvatore, M., & Chelazzi, L. (2002). My eyes want to look where your eyes are looking: Exploring the tendency to imitate another individual’s gaze. NeuroReport, 13, 2259–2264.

Robinson, J. D. (1998). Getting down to business: Talk, gaze, and body orientation during openings of doctor-patient consultations. Human Communication Research, 25(1), 97-123.

Rossano, F. (2013). Gaze in conversation. In J. Sidnell & T. Stivers (Eds.), The Handbook of Conversation Analysis (pp. 308-329). Thousand Oaks, CA: Blackwell.

Scaife, M., & Bruner, J. S. (1975). The capacity for joint visual attention in the infant. Nature, 253(5489), 265-266.

Schegloff, E. A. (1984). On some questions and ambiguities in conversation. In J. M. Atkinson & J. Heritage (Eds.), Structures of Social Action: Studies in Conversation Analysis (pp. 28-52). New York: Cambridge University Press.

Schegloff, E. A. (1998). Body torque. Social Research, 65(3), 535-596.

Sharrock, W. W., & Turner, R. (1980). Observation, esoteric knowledge, and automobiles. Human Studies, 3(1), 19-31.

Smith, T. E. (2017). Solving the payment problem: an interactional analysis of street performance. Unpublished Doctoral Thesis. University of Edinburgh.

Stivers, T., & Rossano, F. (2010). Mobilizing response. Research on Language and Social Interaction, 43(1), 3-31.

Streeck, J. (1993). Gesture as communication I: Its coordination with gaze and speech. Communication Monographs, 60, 275–299.

Tomasello, M. (1999). The Cultural Origins of Human Cognition. Cambridge, London: Harvard University Press.

Appendix - Transcription

[ overlapping talk
= continuous talk
(0.2) silence in tenths of a second
(.) micropause
. final intonation
? rising intonation
, continuing intonation
! animated tone
: prolongation of the preceding sound
we- cut-off of the preceding sound
you (underlined) emphasis
THAT loud talk
°word° very quiet talk
↑ rise in pitch
↓ fall in pitch
> < talk is compressed
< > talk is slowed
hh aspiration
.hh inhalation
((laughs)) transcriber's comments
( ) problematic, uncertain, or alternative hearings


2 The timing includes the timing in parentheses plus the timing of the utterance in line 4, which is 0.5 seconds.

3 Following a noticing gaze shift happens in a variety of configurations. Participants may be side to side as in Examples 1 and 4, or in circular formations as in Examples 2 and 3.

4 We recognize that this is a very rough distinction to draw between “for-anyone” and “only-for-some” situations. There are of course very subtle matters of membership being enacted in “for-anyone” situations, but the nature of our collection and the focus of analysis force us to simply recognize that for participants there is some sort of delicateness involved in watching some situations (Examples 6 and 7) while not in others (Example 5). The nature and nuances of that delicacy are a matter for further investigation. See Reynolds (2017) for more on issues of membership in watching.

5 In this video there are several background participants who glance at the fight but move on, taking a pace at which moving away from the scene is a priority (a fast walk).