Social Interaction. Video-Based Studies of Human Sociality.

2024 Vol. 7, Issue 3

ISSN: 2446-3620

DOI: 10.7146/si.v7i3.150089



Doing Virtual Companionship with Alexa


Lauren Hall, Saul Albert & Elizabeth Peel

Loughborough University

Abstract

Technologists often claim that virtual assistants, e.g., smart speakers, can offer 'smart companionship for independent older people'. However, the concept of companionship manifested by such technologies is rarely explained further. Studies of virtual assistants as assistive technologies have tended to conceptualise companionship as a 'special form of friendship' or as a way of strengthening 'psychological wellbeing' and 'emotional resilience'. While these abstractions can be measured using psychological indices or self-report, they are not necessarily informative about how 'virtual companionship' may be performed in everyday interaction. This case study focuses on how a virtual assistant is used by a person living with dementia and asks to what extent it takes on a role recognizable, from interactional studies, as 'doing companionship'. We draw on naturalistic video data featuring a person living with dementia in her own home using a smart speaker. Our results show how actions such as complaints about and blamings directed towards the device are achieved through shifts of 'footing' between turns that are ostensibly 'talk to oneself' and turns designed to occasion a response. Our findings have implications for the design, feasibility, and ethics of virtual assistants as companions, and for our understanding of the embedded ontological assumptions, interactive participation frameworks, and conversational roles involved in doing companionship with machines.

Keywords: Companions, Virtual Assistants, Conversation Analysis, Discursive Psychology, Dementia

1. Introduction

This paper explores practices of companionship in the everyday use of a virtual assistant (VA) by an older adult living with dementia. The aim is to understand how older adults living with dementia can work with VAs and 'smart home' technologies, and to explore how, as promotional materials often claim, these devices take on the interactional role of a companion (Ring et al., 2013). Companionship is often cited as a 'feature' of VA devices in human-computer interaction (HCI) research in ways that combine the role of companions in medical and social care visits with a more general sense of companionship as providing emotional engagement and entertainment (e.g., Cooper et al., 2020; McTear et al., 2016). However, these conceptualisations of companionship are usually conflated without specifying what kinds of actions and practices might constitute 'doing companionship' on the one hand, and 'doing being companioned', on the other. While conversation analytic (CA) studies of human interaction have begun to catalogue a range of 'companionship practices', usually involving triadic interactions e.g., where a companion accompanies someone to an official medical appointment (Pino & Land, 2022; Pino et al., 2021), less is known about how a more generic form of 'virtual companionship' might work in everyday settings. Here we draw on interactional studies of companionship in institutional and medical settings, but we use these insights to discuss companionship in the more generic, lay meaning of co-presence with a mutually acquainted other, where the companion is available for interaction and emotional engagement. In this way, we aim to outline what 'virtual companionship' might involve as an interactional achievement, and how it might be better understood and evaluated as a technical goal for designers.

Our data consist of naturalistic video recordings of the daily routines of an older adult living with dementia in her own home, where a smart speaker is already well integrated alongside smart home devices such as voice responsive lights and electrical plugs. We use conversation analysis (CA) to explore this 'smart homecare' participation framework as a perspicuous setting (Garfinkel & Wieder, 1992) for respecifying 'virtual companionship' in terms of interactional practices (Clayman, 1995). Since companionship as a phenomenon is not directly observable, we borrow a methodological approach from discursive psychology (DP) by examining moments in which "the topic of interest [i.e., companionship] is demonstrably accomplished, while not being explicitly referred to" (Huma et al., 2020, p. 323). Here we focus on interactions with a VA through the lens of Goodwin's (2007) interactive revision of Goffman's (1981, p. 128) concept of 'footing', i.e., the "changes in alignment we take up to ourselves and others" in everyday talk. We observe how shifts of footing within this framework can implicitly index the user's interpretation and use of the device as a co-participant, if not, explicitly, as a 'companion'. Through this methodological lens, we can ask, for example, what an utterance in the 'perceptual range' (Heath & vom Lehn, 2013), or perhaps 'sensor range', of a VA device may be accomplishing. We can examine the design of turns: whether or not they have an explicitly nominated recipient, and their sequentiality, e.g., whether they occasion a clearly relevant response. This approach enables us to answer the question: how, and to what extent, is a VA treated as a companion in everyday interaction?

1.1 The concept of companionship in communication research

Health communication studies of the participation of companions in doctor-patient interaction have tended to characterize a set of companion 'roles' and related health outcomes rather than focusing on interactional practices. Pino & Land (2022) highlight this tendency to categorize the companion's 'observer role' as either detrimental, unsupportive, or hindering, or as helpful and supportive of patient actions. They argue that these designations cannot capture the situated and context-dependent nature of companionship in practice. Studies that rely on interviews or other post-hoc self-report measures to examine attitudes towards companionship also tend to ignore participants' in-situ companioned practices. For example, Graham and Tuffin (2004) interviewed older adults living in a retirement home and developed a discursive concept of companionship as a broadly positive, friendly, 'congenial environment', set in opposition to negative associations with loneliness and isolation, and in tension with the need for privacy. While it is important to understand the discursive construction of companionship in health research in this way, these studies offer little practical insight into how companionship is performed and interpreted interactionally. CA research, which has also tended to focus on healthcare encounters, has been able to use the institutional setting and ostensibly self-evident roles of e.g., 'patient' and 'healthcare worker' to circumscribe companions more straightforwardly as "people who know the patient and attend healthcare encounters with them" (Pino et al., 2021, p. 2010). However, starting from this straightforward framing of a perspicuous setting for examining the phenomenon of companionship, CA studies have then explored how companionship is constructed in and through interaction, exploring participant actions as embedded in their sequential and situated contexts (Antaki & Chinn, 2019; Bosisio & Jones, 2023; Doehring, 2018; MacMartin et al., 2014; Pino & Land 2022; Pino et al., 2021; Robson et al., 2016).

The present study focuses on interaction in a domestic setting, where a participant is interacting with a virtual assistant. While we draw on CA studies of companionship-related practices in (mostly) institutional and clinical settings, we recognize that our study setting is neither typical of everyday interaction in a domestic environment, nor equivalent to the 'interactional fingerprint' of most institutional and clinical encounters (Heritage & Drew, 1992, p. 26). Our study, however, draws on CA studies of companionship to interrogate this concept in relation to technologists' claims to be enabling forms of 'virtual companionship'. A further complication here is that we are transposing some interactional practices from human-human to human-computer interaction (HCI), and while our literature review identified CA studies of technology-mediated human companionship (e.g., Stommel & Stommel, 2021), none, as far as we know, focus specifically on concepts of companionship with machines. In the following section we therefore review the wide range of conceptualisations of companionship within the field of human-computer interaction (HCI) including some that focus specifically on dementia. We then discuss how CA studies of VAs in general (though not necessarily as companions in particular), inform our methodological choices for the present study.

1.2 Conceptualizations of companionship in Human-Computer Interaction (HCI)

Many HCI studies in the field of 'affective computing' aim to design and evaluate virtual agents that can emulate a companion relationship with older user groups (Ring et al., 2013). Ramadan et al. (2021) suggest that consumer technologies such as the Amazon Echo smart speaker were designed for this purpose with a 'friendly personality', making it easier for users to relate to the device (Roettgers, 2019). User studies (e.g., Lopatovska & Williams, 2018; Woods, 2018) have highlighted 'human-like' characteristics and design features of Amazon's VA Alexa, such as the ability to interact using spoken natural language and the use of gendered names and voices, both female, by default, in the UK (Blair & Abdullah, 2019). Jung et al. (2023) note the special relevance of gender in the design of interactional VAs, suggesting that users perceive female agents as friendly and warm in comparison to male and non-binary agents (Schwär & Moynihan, 2018; Shead, 2017). Sutton (2020) argues that attributing gender to a VA influences how the device is used and interpreted, particularly in cultivating the impression of companionship. In assistive and healthcare technology research in general, as well as in the publicity material of tech companies, VAs are increasingly presented not only as helpful assistants for the 'smart home' but also as providing users with "an experience, a friend, and a companion" (Amazon Echo, 2019; Ramadan et al., 2021, p. 603; Weinstein, 2019) and, furthermore, as healthcare devices that can provide a "substitute caregiver or companion" able to handle moderate levels of everyday support (Wright, 2021, p. 817). In this techno-centric conceptualization of companionship, the ability to 'manage loneliness', and even 'save lives' (Ring et al., 2013), depends on the device exhibiting anthropomorphic qualities.

1.2.1 Do users anthropomorphize virtual assistants?

Anthropomorphism, i.e., ascribing "human like properties, characteristics or mental states to nonhuman agents and objects" (Epley et al., 2007, p. 865), is often understood in HCI as an internal, psychological process. The notion that users anthropomorphize computers often underpins approaches that claim (and evaluate) the degree to which humans and computers can form emotional sympathy and intimacy (Chen et al., 2020; Hill et al., 2015; Ring et al., 2013). For example, Epley et al. (2007) propose the 'SEEK' framework of three psychological scales of anthropomorphism: 'Sociality' (the degree to which an agent interacts with humans), 'Effectance' (interacting with objects and the environment), and 'Elicited agent Knowledge' (evidence of learning about self, world, and others). More broadly, this mediagenic framing of 'human-like machines' also offers compelling explanations for how, e.g., Alexa comes to be referred to as a 'friend' or as an 'invisible woman' (Turk, 2016). Other approaches within HCI, more critical of this kind of anthropomorphism, adopt the 'Computers Are Social Actors' (CASA) paradigm (Lee & Nass, 2010), which makes the more limited claim that users apply social rules when interacting with computers. Nass and Moon (2000), for example, argue that when users assign 'personality' to a computer by, e.g., using politeness markers such as 'please' and 'thank you', or in ascribing gender to a device, they are not truly anthropomorphizing it as such (i.e., seeing it as equivalent to a human). Instead, they argue that users 'mindlessly' exhibit a "direct response to an entity as human while knowing…[it] does not warrant human treatment or attribution" (p. 94). From this perspective, it hardly matters whether or not users explicitly self-report via interviews, focus groups, or other introspective methods (Potter, 2012) that they conceptualise their devices as occupying contrasting categories, i.e., living/human or non-living/non-human (see e.g., Pradhan et al., 2019; Tong et al., 2022). Instead, we want to understand how users' social behaviours and interactions with VAs can ascribe participant roles, member categories, and associated interactional expectations to these devices (Fischer, 2021; Pelikan et al., 2022).

1.2.2 Virtual companionship and dementia

Such questions are especially important when evaluating the efficacy and ethics of developing VAs for older adults living with dementia (Amrhein et al., 2016), where these technologies often promise to prolong independent living and reduce the need for human care-assistance (Baldwin, 2005; Begum et al., 2013; Meiland et al., 2017; Pradhan et al., 2020). A review of this literature by Mordoch et al. (2013) warns that while devices that they term 'social commitment robots' may enhance social engagement, provide companionship, and promote acceptance of these technologies amongst vulnerable populations, technologists should be careful not to infantilize or deceive users into thinking they are interacting with a human (Riddoch, 2021). Presenting technological devices as friendly, approachable companions risks overcoming users' prominent (and, often, well-founded) ethical concerns about data privacy (Ienca et al., 2017; Zhu et al., 2022). Many 'smart homecare' technologies increasingly adopt a surveillance paradigm that is more focused on sharing users' data with family members, care providers, and 'surveillance platforms' (Percy-Campbell, 2023), than on enhancing the user's autonomy. Such surveillance-based systems fit neatly with a utilitarian rationale in health and social care provision whereby technology replaces and reduces the costs of human social care (see e.g., Meiland et al., 2017; O'Brien et al., 2020; Pradhan et al., 2018; Turner-Lee, 2019). Older adults, especially those living with dementia, still need human contact, and these devices still have very limited diagnostic and care capabilities (Broekens et al., 2009; Evans et al., 2015; Mordoch et al., 2013). Policymakers increasingly see such devices as an 'efficient' way to supplement understaffed and underfunded care provision services (Wright, 2021). These high stakes mean that, despite indications (from interviews, surveys, and focus group-based studies) that people living with dementia may accept virtual companionship technologies (e.g., Demiris et al., 2004; Demiris et al., 2016; Lazar et al., 2016), important questions remain about ethics, efficacy, and about precisely how these devices seek to achieve companionship and alleviate loneliness in practice.

1.2.3 Observational studies of virtual assistants as companions

Early feasibility studies of virtual assistants for elderly and disabled people triangulated between methods such as focus groups, participatory design, and 'Wizard of Oz' experiments where a human confederate poses as an autonomous system (Kramer et al., 2013; Yaghoubzadeh et al., 2013). While such studies can reveal interactional details, hypothetical and speculative study designs are not always informative about how people will interact with technologies once they are integrated into everyday settings (Höhn, 2019; Porcheron et al., 2021; Stokoe et al., 2020). Similarly, longitudinal studies of virtual companions for people living with dementia have tended to use more complex behavioural coding (of e.g., time of use and patterns of activation), and some have shown positive mental and physical health outcomes (e.g., Park & Kim, 2022; Šabanović et al., 2013), but without describing, in detail, the interactional practices that may contribute to these results. There is now a growing literature of interactional research examining how functional VAs participate in the sequential and embodied organization of talk in everyday social settings (e.g., Fischer et al., 2019; Porcheron et al., 2018; Reeves et al., 2019), and how users adapt to accommodate the constrained affordances and dynamics of interaction with voice interfaces (Pelikan & Broth 2016; Pelikan & Hofstetter, 2023; Wu et al., 2022). Some have also engaged with the HCI discourse on virtual companionship by asking, for example, how users ascribe agency or attribute social competencies to these devices in interaction (Albert & Hamann, 2021; Alač et al., 2020; Pelikan et al., 2022; Tuncer et al., 2023), and how emotional and empathic responses are occasioned in these interactions (Pelikan et al., 2020). Others have explored the role of human companions in triadic interactions involving VAs (Albert et al., 2023; Krummheuer et al., 2020). While, as far as we know, none of these studies have asked how, if at all, a VA itself may act-and be treated-as a companion, this emerging CA literature shows how interactions with VAs are (with a few key methodological caveats, outlined below) clearly tractable for conversation analytic methods.

2. Methodology
2.1 Data

Here we report on findings drawn from data recorded in the home of one of the participants in our wider study of how people living with dementia use virtual agents in their everyday lives. Through this case study, we focus on the one participant in our project (Annabelle, in our transcripts) who lives alone, in an apartment situated in a complex that has specialised dementia care facilities, with only her VA (Alexa) for constant company. This provides an especially perspicuous naturalistic setting for exploring the concept of 'virtual companionship' through her interactions with her VA Alexa (Höhn, 2019). Annabelle has been living with dementia for approximately 12 years; however, she does not receive any daily care visits and only uses professional cleaners when necessary. She had capacity to give full informed consent for this study on her own behalf. She has one Amazon Echo device situated in her living room and two smart home light bulbs that she controls with her voice. We set up a camera in the living room, where the Amazon Echo device is situated, with full view of the room and the device. The camera was set to record when movement was detected in the room between 6am and 11pm each day, yielding a series of video recordings of approximately 5-10 minutes in length. We selected and transcribed around 30 minutes of video clips (in total) after several weeks of continuous recording. Annabelle was able to turn the camera on and off for privacy reasons whenever she wanted to. She was informed that the camera would capture video recordings of her day-to-day activities and routines within its range as well as when Alexa was used, and that both these types of video data would be used for the research. Annabelle was given the opportunity to review the data and to delete anything she did not want us to see or to use. She was also aware that these data would be pseudonymized, but that she may still be visibly recognisable, and she gave informed consent for the data to be used without further anonymization for research and publication. Ethical approval was first granted by the Loughborough University Ethics Approvals (Human Participants) Sub-Committee in 2019. While these recordings were ostensibly similar to the kinds of data often used in CA studies (i.e., video featuring various forms of embodied interaction), our focus on companionship and the situation of a person 'alone' with a VA led to some specific requirements and constraints on how we transcribed and analysed these data.

2.2 Methodological approach

We used a mixture of standard CA transcription conventions (Jefferson, 2004) and Mondada's (2018) multimodal transcription methods to annotate simultaneous and embodied action. Since much of Alexa's 'talk' was standardized, and since it was not topicalized by Annabelle as interactionally relevant in any of the extracts, we did not usually annotate the prosodic or intonational emphasis of Alexa's talk. Where visible in the video, and where Annabelle was turned towards the Amazon Echo, we did annotate the activation of Alexa's 'wake light'. This light provides the user with visual feedback as to whether the device has 'heard' the 'wake word', in this case 'Alexa'. The wake light can provide an interactional resource for, e.g., recognizing that there has been a problem of 'hearing', and for tracking other indicators of the device's availability for interaction. Here we adopt the conventions developed by Albert and Hamann (2021) to show the activation pattern of the wake light with a blue bar running vertically down the page next to the line numbers, indicating when Alexa is visibly 'listening', or not.

Our analysis consisted of an applied CA case study (Antaki, 2011) of Annabelle's interactions with her VA in a domestic setting. CA's ethnomethodological, constructivist stance (Bauman, 1973) allowed us to focus on the social organisation of talk and how the participant makes sense of activities in their own home (Wooffitt, 2005). Naturalistic video of everyday domestic activities in which a VA is well-integrated (rather than, for example, in a laboratory, prototyping, or Wizard of Oz study) allowed us to explore and discuss specific moments and mundane methods in the talk, visible behaviours, and spatial arrangements of the user's interactions with the VA (Heath et al., 2010). Where both parties engage in turn-by-turn talk, we can observe the grammatical, pragmatic, and sequential/temporal relatedness of one turn to the next using CA's 'next turn proof procedure' (Heritage, 1984, p.256-257). This shows us how Alexa may or may not be treated, by the human participant, as an interlocutor or even, as we argue, as a companion. However, in some cases, the human participant is alone with the device and, by e.g., not using the 'wake word', we infer that her talk is designedly not directed towards Alexa (cf. Reeves et al., 2019). In such moments, i.e., where the participant is 'talking to herself', and is ostensibly 'alone' with her virtual companion, we encounter some methodological constraints about how we can use CA to understand what she may be accomplishing in and through interaction.

2.2.1 The state of talk with virtual companions

The typical state of talk when people are co-present with a VA in a domestic setting is not ongoing, turn-by-turn interaction. Schegloff and Sacks (1973) offer 'continuing states of incipient talk' as a loose conceptualisation of co-present interactional situations that fall outside a state of assumed conditional relevance between one turn and the next, where, e.g., talk does not require distinct openings or closings (Berger et al., 2016). Within this state of talk, co-presence, or the opportunity to talk, is incidental, rather than situationally warranted by an ongoing joint activity as it would be in, for example, a phone call, and some structural features of continuous talk may be relaxed (Berger et al., 2016; Mondada, 2008). When describing interaction in this state of talk, CA concepts such as 'adjacency pairs' and 'attributable silences' (Jefferson, 1988) do not necessarily apply in the same way as in focused, turn-by-turn interaction, leaving open many methodological and empirical questions about entry into and exit from 'focused' talk (Goffman, 1967; Hoey, 2017). There may even be a kind of relaxed 'laconicity' (Goffman, 1981) about how turn-taking is done that forms part of how 'companionship' is achieved in this setting. Such issues place some methodological constraints on our analysis of conversation where a user is ostensibly alone, in a state of continuing incipient talk with a VA. We can still identify patterns of behaviour and regularities when, for example, a user pursues a response (Pomerantz, 1984), or responds in ways that otherwise indicate their interpretations and normative understandings of the technology (Wooffitt, 2005). However, these analytic constraints do leave us with some ambiguities. For example, the successful use of the 'wake word' to activate the device may, in some cases, function as a form of 'opening', but not in all cases. When a wake word is not used to activate the device, we might consider that talk as designedly not directed towards the device (Reeves et al., 2019), i.e., that the participant may be doing 'talk to oneself'.

2.2.2 Self-talk with a VA

Despite not having access to all of CA's analytical apparatus, we borrow terms from Goffman's (1978) descriptions of 'response cries', and from subsequent conversation analytic elaborations of these utterances' inherent indexical ambiguities (Aarsand & Aronsson, 2009; Hofstetter, 2020; Pehkonen, 2020). For example, whereas 'cries for help' produced in the presence of others may suggest that a response of some sort may be expected, Annabelle sometimes produces cries for help when alone (with Alexa), but without using the wake word. In such cases, we might assume that no response is expected. We therefore refer to this as 'self-talk': when no other parties are present, where there is no evidence that a response is expected, and no other party is ostensibly being addressed. Goffman's (1978) more analytically undifferentiated category of 'response cries' includes utterances made when members are solitary, or are assumed to be solitary, but where they continue to make passing comments aloud: informal remarks, judgements, or 'verbal commentaries of our own doings' (p. 787). Such utterances may address the speaker themselves, or may grammatically orient to an absent other in what Goffman (1978) describes as a 'mock-up' of conversation. To explore how these kinds of 'mock-ups' might inform our understanding of people's uses and interpretations of a VA, here we adopt another Goffmanian concept: 'footing'.

2.2.3 Footing shifts with virtual companions

Goffman's (1981) discussion of 'footing' proposed a range of interactional roles through which talk is produced: the animator, "an individual active in the role of utterance production"; the author, "someone who has selected the sentiments that are being expressed and the words in which they are encoded"; and the principal, "someone whose position is established by the words that are spoken" (Goffman, 1981, p. 144). We draw on Goodwin's (2007) extension of this concept as 'Interactive Footing', which also identifies the contribution of hearers, and highlights turn-by-turn shifts between both speaker and hearer. This analytic framing allows us to identify how utterances in the 'sensor range' of a VA might shift, indexically, between talking with, talking to, or talking about the device, even when no response is ostensibly due. Despite the analytic constraints of studying talk while a participant is 'alone' with a VA, we can still explore the relationship between a speaker's actions and shifts in the turn design and 'production formats' of their footing between and within turns at talk (Okada, 2010). Throughout the analysis below, we note these shifts in Annabelle's interactive footing (Goodwin, 2007), as she moves between interactional roles as speaker/initiator of an interaction, where Alexa may be a wake-word-selected recipient, or a non-selected (non-'hearing') recipient, or where Annabelle's utterances may be simply characterised as 'self-talk'.

3. Analysis
3.1 Terminology

In our analysis below, we coin the acronym 'SR3' to refer to a recurrent sequence of actions we have identified in users' interactions with VAs. This routine pattern consists of a user's 'wake word' summons (S), the VA's (visual/verbal) response (R), the user's subsequent request (R), and, finally, the VA's (visual/verbal) response to the request (R). Henceforth, we will refer to this as an 'SR3' sequence (SRRR). This is the predominant normative structure of sequences that occur in the interactions between Annabelle and her smart speaker where the Amazon Echo device is activated using the wake word 'Alexa'. The visual response in this case is the 'wake light' on the Amazon Echo device, which the summons (using the wake word) activates, indicating that the device is 'listening'. The user then formulates a request, which the device responds to verbally.

3.2 Typical use of Alexa for assistance

In this section, we show how talking to Alexa can be straightforwardly useful for Annabelle in getting things done. In the following two extracts, Annabelle uses SR3 sequences to 'wake' Alexa, then issues a request that Alexa can (and does) fulfil.

Extract 1.



Extract 2.



In Extract 1 and Extract 2, we see Annabelle use an SR3 sequence with Alexa to control the lights. In both extracts she initiates the sequence with the wake word "Alexa", which activates the wake light on the device. Annabelle treats this visible response as relevant by waiting (for 0.7 seconds in Extract 1 and 0.9 seconds in Extract 2) before continuing to her request: "turn off the lights please" (Extract 1, lines 01-02), "please turn on the lights" (Extract 2, lines 03-04). In both extracts Alexa responds to the request with "okay" (Extract 1, line 04; Extract 2, line 06) before the light turns on/off. These SR3 sequences demonstrate Annabelle's understanding of how the 'wake word' function operates, and her knowledge of Alexa's capability to control the lights.

In Extract 3 Annabelle is sitting on the sofa with a full view of Alexa. We see an expansion of an SR3 sequence to set a reminder.

Extract 3.



The extract starts with Annabelle sitting on the sofa with a full view of Alexa. On lines 01-02, Annabelle works through the 'typical' format of an SR3 sequence by summoning Alexa, pausing, and continuing to a request. Alexa responds with a request for clarification, "what's the reminder for" (line 03), initiating an insert expansion of the SR3 sequence. This highlights some of Alexa's more advanced interactional capabilities when more detail is needed to successfully complete a task. Annabelle duly demonstrates her understanding and ability to use this interactional method by responding to Alexa's clarification request with "to go to the talk" (line 05). Alexa then acknowledges Annabelle's response with "okay", then, in line 07, reformulates the task described in Annabelle's initial request from "remind me at ten minutes to four" to "I'll remind you at three fifty pm", explicitly demonstrating understanding of the time in question. Annabelle thanks Alexa (line 09) but this time, she shifts her footing by designing her "thank you" quietly, without using the wake word, i.e., in a way that is not meant to be 'heard' or responded to by Alexa. Her shift from SR3 sequences to something more typical of human-human interaction may be an example of 'spill-over' (Albert & Hamann, 2021) from everyday talk.

Similarly, in Extract 4, Annabelle designs a summons and request using a modal verb, asking "can you" when requesting that Alexa play music (lines 05-06).

Extract 4.



This use of the modal form bears resemblance to what Curl & Drew (2008) refer to as a form of 'contingency' in the design of requests, as opposed to forms of 'entitlement'. 'High-entitlement' requests, for example, are designed to treat the granting of the request as self-evident and unproblematic for the recipient. Conversely, turns designed in ways that highlight difficulties or contingencies for the recipient in granting the request display 'high contingency'. By using this terminology, we are not suggesting that the user of a VA is somehow concerned with how a device may feel about or respond to such displays. We are simply pointing out that Annabelle uses different turn designs in her requests that display varied degrees of entitlement and contingency, thus treating these differences as relevant. Differences between the designs of Annabelle's requests to Alexa within the SR3 sequences suggest that she uses 'high entitlement' formats, e.g., "turn off the light", where she understands the task to be straightforwardly achievable and within Alexa's repertoire. Her use of modal verbs that deal with ability in other requests, e.g., "can you play…", suggests lower confidence/higher contingency regarding Alexa's ability to fulfil the request. Again, these shifts of footing may indicate some 'spill-overs' from everyday interaction.

3.3 Self-talk, distress displays, and invoking absent others

Throughout this section we will show Annabelle using self-talk in different ways to display helplessness. In the following four examples, Annabelle topicalises her distress/confusion, verbally tracks what she is currently doing, and invokes others in her distress displays through reported talk.

In Extract 5, we see Annabelle walking away from the Echo device, leaving the room while doing a common form of 'self-talk'.

Extract 5.



In our data set, Annabelle often uses utterances such as "no idea" (line 12) and "not a clue" (lines 14 and 16) without directing the talk towards an interlocutor. Even without context or evidence of recipiency for these utterances, we can recognize them as explicit displays of confusion, and sometimes distress. We see similar uses of self-talk and explicit displays of helplessness in Extracts 6 and 7.

Extract 6.



In Extract 6, Annabelle walks into the living room, past the Echo device and sits down on the sofa while repeating "oh dear dear dear". In line 04 she adds "I give up", suggesting a strong negative valence to the ostensibly self-directed 'cry for help', to borrow Goffman's (1978) term. In Extract 7, Annabelle has been doing a similar form of negative self-talk while looking for her glasses. The extract starts with Annabelle walking from her bedroom back into the living room, continuing to say, "where are they" (line 11).

Extract 7.



In all these examples of ostensible 'self-talk', we see no pursuit of a response, nor use of a wake word to prompt Alexa. Line 13, where Annabelle announces, "oh there they are", suggests that this form of self-talk functions as a form of online commentary (Heritage, 2017), anchoring her intentions and progress in an ongoing verbal account of her 'visual search' (Coulter & Parsons, 1990).

In Extract 8, we see how absent others can be invoked during this kind of activity. Annabelle is in the living room next to the sofa. Again, we see her using self-talk as a form of 'online commentary' of her ongoing activities including displays of distress.

Extract 8.



The use of "oh dear", similar to Extract 6 functions as a preface for a 'cry for help', where Annabelle explicitly articulates her confusion through a self-directed question and answer pair: "what am I doing, I don't know" (line 11). In the next lines (lines 13-15), Annabelle shifts footing using self-talk to report others' speech/intensions, referring to what an absent "he" would tell her to do. On line 17, Annabelle shifts footing again, back to self-talk, "Ah deary me, wha mi gonna do". Annabelle uses these footing shifts from self-talk about her own perceptions, to mobilising absent others' and what 'he' might tell her to do as a method for managing confusion and distress.

In the following section, we show how, when Annabelle speaks to, about, or around Alexa without seeming to expect a response, she is also engaging in a form of 'self-talk'. Her turns, in this environment, are not designed with a 'wake word', nor otherwise designed to occasion a response from Alexa, nor does she pursue a response or make an issue of Alexa's failure to respond.

3.4 Blame, complaints, and insults directed at Alexa

In this section we explore moments where Annabelle shifts her footing to complain about or to Alexa. We focus, first, on a clear-cut instance of 'self-talk' with Alexa, where no 'wake word' is used to activate the device. We then focus on episodes where SR3 sequences fail, where Alexa produces an inapposite response, or when there is some other kind of problem with fulfilling the request component of the SR3 sequence. In all cases, Annabelle then treats Alexa as 'blameable' or 'complainable-about'.

3.4.1 Complaints and insults 'behind Alexa's back'

Here we will explore episodes in which Annabelle complains about Alexa without using the wake word, i.e., 'behind Alexa's back'.

Extract 9.



In Extract 9, Alexa is sounding an alarm (line 04) that Annabelle complains about in lines 06-07. In this complaint, she topicalizes not understanding why Alexa is sounding the alarm, then shakes her head while voicing her disapproval "I don't appreciate it". This treats Alexa as a 'blameable other' without activating the device or projecting a response.

In Extract 10 we see a similar blaming of Alexa when an SR3 sequence goes awry. Annabelle is sitting on the sofa with full view of the device.

Extract 10.



Extract 10. Continued (Lines 17-35 omitted)



In Extract 10 Annabelle first complains about, then 'upgrades' to insulting, Alexa during a series of SR3 sequences where Alexa fails to play the requested song. Here we see a series of quick shifts of footing that demonstrate Annabelle's understanding of what Alexa can 'hear': she articulates her complaints to herself 'off the record', without using a wake word that might occasion a response. On line 16, Annabelle's cry of frustration "Oh for goodness' sake" overlaps with Alexa's apology-prefaced account that she can't find the song in lines 14-15. Annabelle then begins an insult directed to Alexa: "don't be so path-", presumably cutting off at the word 'pathetic'. As her frustration mounts, following Alexa playing several apparently incorrect songs, Annabelle 'upgrades' her insult to: "you're useless (3.2) utterly useless" while staring towards Alexa, though without having used the 'wake word', so again adopting a footing that positions this as a form of frustrated 'self-talk'. Annabelle then performs several more shifts of footing, first using the wake word to command Alexa to "STOP" (in line 38), then shifting back to self-talk with a further upgraded insult. While delivering this insult, she stares towards the Echo device, thrusting her head forward with each word, calling Alexa a "St↑upid st↑upid woman.hh" in line 40. We see a similar kind of anthropomorphized insult in Extract 11, where Annabelle's self-talk about and to Alexa explicitly topicalizes the issue of blaming or insulting a machine.

Throughout Extract 11 Annabelle is leaning over the arm of the sofa, turned away from the Echo device, looking for her sewing box.

Extract 11.



Annabelle begins by musing on her prior actions of blaming Alexa. On line 04 she says, "I shouldn't be so rude to her should I,", followed on lines 07-08 with "it's only what she's programmed as 'nd she's not programmed very well is she?". Both turns start with an initial position "I", followed by a tag question. Here, Annabelle does self-talk while speaking about and anthropomorphizing Alexa. Note that she uses the third person 'her' rather than the second person 'you', as might be the case with a co-present other, and she does not use the wake word in a way that might solicit a response.

Then on line 10, we see a footing shift in Annabelle's talk. She shifts from talking about Alexa to talking to Alexa as a co-present other, though ostensibly not in pursuit of a response. Her insult is addressed to Alexa: "Alexa you silly old woman", but instead of doing 'Alexa' as a wake word, she says it quietly, in a croaky voice, turned away from the wake light, not pausing, nor issuing a request, nor pursuing a response.

On line 12, we see several more footing shifts – first to self-talk about Alexa "'snt sound too old >does she

Throughout these blames and complaints, Annabelle uses gendered terms such as "woman", "her" and "she", treating Alexa as a blameable other. Alongside this recognizably 'companioned practice', Annabelle also does various forms of moralizing self-talk. By not invoking Alexa with the wake word, Annabelle can do insults to, or talk about, Alexa 'off the record', as 'just' self-talk. Annabelle topicalizes how one 'should' or 'shouldn't' talk to or about Alexa, so clearly treats this as a moral/interactional issue.

3.4.2 Complaints and insults to 'Alexa's face'

In Extract 12, we see Annabelle using the wake word to complain 'on the record', i.e., to 'Alexa's face'.

Extract 12.



On line 04, Annabelle initiates an SR3 sequence by summoning Alexa using the wake word, pausing for a second, then doing the request, "where are my glasses". However, she does not wait for a response. Instead, Annabelle increments her turn with "Alexa you don't know do you?" (line 05). This second use of the wake word is through-produced, without a pause after the 'Alexa', and is done while walking out of the room and without looking towards the wake light.

While Annabelle's use of the SR3 sequence here demonstrates her understanding of Alexa's technical methods of use, she also shows knowledge of the device's limitations. Her shifts in footing also switch between 'talk-to-Alexa' and 'talk-about-Alexa' within the same turn. In lines 04-05 she first says 'Alexa' as a wake word (with a pause between wake word and request, as usual within a 'standard' SR3 sequence in our data), then again in a dismissive complaint: "Alexa you don't know do you?". Finally, after Alexa responds on lines 07-08, matching Annabelle's expectations with an 'unhelpful' response, Annabelle 'upgrades' her complaint on line 09, overlapping Alexa's response with "oh don't be so shtupid?", shifting footing again to do an 'off the record' insult. Here, Annabelle explicitly acknowledges her understanding that Alexa will not be able to carry out her request while still asking anyway, then insults Alexa for giving an ill-fitted "stupid" response. Though Annabelle has shown her expectation that Alexa will not be able to help, she still treats the device as a blameable other. This extract provides us with a deviant case of the SR3 sequence. Annabelle, technically 'correctly', initiates the sequence with the wake word 'Alexa' followed by a request about her glasses. However, she then explicitly 'retracts' the request, demonstrating her understanding of Alexa's inability to fulfil it, and then responds to Alexa's anticipated, inadequate response with an insult.

4. Discussion
4.1 Summary and analytic implications of our findings

Our analyses have demonstrated Annabelle's practical understanding of Alexa's capabilities as a device, which may, as we discuss below, extend to its use as a 'companion'. We first showed how Annabelle typically uses the VA, as intended, to help with day-to-day activities. We focused on the structure, timing, and embodiments of Annabelle's 'SR3 sequences', and on the turn design of Annabelle's requests to Alexa. Annabelle's routine pauses between the wake word and the request, often with a glance towards the 'wake light' after saying 'Alexa' as a summons, demonstrate her practical understanding of the SR3 sequence as a technical method of using the device. Her use and interpretation of the SR3 sequence as such is further demonstrated by our 'deviant case' (Extract 12), where she through-produces the wake word and request, but without then pursuing a response to the request (see also Albert & Hamann, 2021).

We also examined variations in the design of Annabelle's requests in relation to conversation analytic concepts of contingency and entitlement (Curl & Drew, 2008). Whereas straightforward commands for Alexa to turn lights on and off, or to set a reminder (in Extracts 1, 2, and 3), were produced as 'high-entitlement/low-contingency' directives (Craven & Potter, 2010), Annabelle's more complex requests, such as for Alexa to play a specific song, involved some conventional markers of contingency such as modal verbs ("can you play…" in Extract 4). We also noted other regular features of Annabelle's request designs to Alexa, such as her use of 'please' and 'thank you', that are often associated with marking 'politeness', and that we can at least claim mark the initiation/completion of each action (Craven & Potter, 2010). While these behaviours may be technically 'unnecessary' when designing a request to a VA, their use is evidently relevant to Annabelle, and their broader relevance in request design suggests new empirical questions for future research. For example, folk language ideologies about the use of 'please' and 'thank you' have led to moral panics about how their omission in talk to VAs is degrading normative standards of politeness (see BBC News, 2018; Stokoe, 2021). Without implying that Annabelle treats Alexa as a recipient in ways that involve human-like practical/relational contingencies, differences in the entitlement or contingency of these turn designs at least indicate varied degrees of confidence that her requests to Alexa will be fulfilled. These observations contribute to discourses about anthropomorphism within HCI. They provide a naturalistic, empirical perspective on whether and how talking to machines may reflect users' beliefs about a device's 'humanness', or may constitute 'mindless' spill-overs from human-human to human-machine interaction (Nass & Moon, 2000). As Alexa is controlled through a natural language interface, Annabelle may be using familiar interactional practices without believing the device to be human-like (or, indeed, companion-like, see Lopatovska & Williams, 2018). Whatever Annabelle believes about Alexa's ability to process or respond to certain actions and turn designs (e.g., the use of politeness markers that are not necessarily recognised by the VA), we still see systematic variations in her use of such markers across different interactional footings (e.g., self-talk and talk-to-Alexa).

Our second analytic observation focused on how talk in the sensor range of a VA, but with no nominated recipient and without subsequent pursuit of response (Stivers & Rossano, 2010), could be analysed as a self-directed display of distress or helplessness. Our identification of this type of 'self-talk with a VA' builds on Goffman's (1978, p. 800) broader descriptions of "imprecations and extended self-remarks" done by "unaccompanied" persons. In Annabelle's self-talk with Alexa, she shifts footing from reporting the speech or intentions of others, to doing 'online commentary' of her own doings, often explicitly topicalising her feelings of distress and confusion. We can only speculate as to whether these instances help her to manage problems and anxieties, though we do note that they sometimes invoke Alexa as an ostensibly co-present party, and sometimes as an absent, 'blameable' other. We showed that Annabelle typically engaged in talking to herself (or at least, talking with no obvious recipient) when she needed help that the VA could not provide, or while commentating on her own ongoing activities. These occasions of self-talk bear comparison to forms of 'online commentary' (Heritage, 2017) and to moments where people produce talk and other forms of explicit praxeological marking of ostensibly 'internal' or perceptual actions (Coulter & Parsons, 1990). We showed how Annabelle produces cries for help, or initiates questions with no apparent recipient and no subsequent pursuit of a response, where there was a problem for which she apparently needed reassurance or guidance. These utterances were designed in a way that ensured no involvement from Alexa, i.e., by not using the wake word (Reeves et al., 2019), suggesting Annabelle's understanding that the VA cannot help with these problems. From Annabelle's use of this form of self-talk in the presence of the VA, we infer that she does not treat the VA as an 'active' agent, or perceive the VA as independently able to monitor and respond to her activities without being explicitly summoned through Annabelle's own initiative.

Our final analytic observation examined how Annabelle's complaints about or insults to Alexa treat the device as a blameable other, and are often coupled with anthropomorphic gendered reference terms such as "her", "she" and "stupid woman". However, while these actions are recognisable as companioned practices, by blaming Alexa without using a wake word summons, Annabelle's complaints remain 'off the record'. She invokes the device, but as a non-agentic or absent third party. Annabelle's variations in turn design in her talk to and around Alexa suggest that these are not 'mindless' spillovers from human-human interaction (Lopatovska & Williams, 2018). Rather, this range of turn designs and footing shifts reflects a fine degree of situated control over when and how Alexa is invoked as either companion-like or device-like.

Our last extract provides a deviant case where Annabelle complains to Alexa but this time she does use the wake word, first asking for help finding her glasses, then citing her understanding of Alexa's inability to offer help, then blaming Alexa for giving an inapposite response by saying "don't be so stupid". This sequence exemplifies shifts of footing where Annabelle switches from talk-to-Alexa to talk-about-Alexa, then to self-talk, explicitly topicalizing Alexa's inability to assist or offer advice with all tasks, such as searching for Annabelle's glasses. Though Alexa does not complete Annabelle's task here, the device does work as a blameable other, taking on blame that, elsewhere, we see Annabelle directing towards herself. As Mavrina et al. (2022) argue, based on their longitudinal HCI study of communication breakdowns with VAs, the burden to fix such failures tends to lie with the user. By blaming Alexa, Annabelle works on issues of accountability that are recognizable from companioned interactional practices in a range of very different settings (e.g., Ekberg et al., 2020), though clearly, and as Annabelle herself says, Alexa does not provide her with the resourceful copresence of a human companion. In some situations, Annabelle treats Alexa as an active agent in as much as she designs her talk to Alexa in ways that hold the device accountable for interactional problems. The blameability of the VA is also highlighted by Albert et al. (2023), who show how, in multi-party 'smart homecare' interactions, a carer can shift blame for breakdowns of communication onto a VA in ways that alleviate the accountability of the care service user. Annabelle's blamings of Alexa show how companioned practices with VAs can be similarly useful, though negatively valenced, in that the burden of blame for troubles can be shifted from the user onto the device.

In our data, the ways Annabelle treats Alexa and the ongoing shifts in footing we observe transform both the situation and the interactional roles at hand. We agree with Porcheron et al. (2018) that Alexa is neither treated as a human conversationalist, nor as having 'personhood'. However, we still see uses of the VA that emulate aspects of companioned interactional practices. As Pradhan et al. (2019, p. 17) suggest, "someone who is more in need of social contact may be more likely to personify a technology and at times seek social connection through it". Annabelle sometimes treats Alexa as companion-like by blaming or complaining about Alexa when problems arise. Sometimes, by contrast, she treats Alexa as a transparently helpful object-like device when requesting that Alexa turn on a light or complete a task (Begum et al., 2013; Pradhan et al., 2019). Sometimes, Annabelle refers to Alexa as a 'she', and explicitly topicalises the moral, normative dimensions of their relationship (i.e., how Annabelle should or shouldn't talk to Alexa). These ongoing shifts of footing reveal Annabelle's fluid conceptualisations of Alexa as they change, moment to moment, tracking current roles and practical involvements in recognisably companioned interactional practices.

Our findings build on CA studies within HCI that show how humans adapt interactional practices from human-human interaction in their talk with VAs (Pelikan et al., 2022). CA studies in this context often focus on how participants interact with a VA in laboratories or domestic, multi-party conversational environments (Fischer et al., 2019; Porcheron et al., 2018; Reeves et al., 2019). This paper shows how a person living with dementia, 'alone' with their VA, does more than 'just use' the device. By tracing shifts of footing, we demonstrate how the user constructs the device as a 'blameable other', and treats their interactions with their 'virtual companion' as morally accountable.

4.2 Implications for VAs as companions for older adults living with dementia

The episodes of interaction examined in this paper offer a distinctive perspective on how VAs can be used by older adults living with dementia, and on how VAs are conceptualised in this environment. Pradhan et al. (2019) analyse interviews, diary entries, and usage logs of older adults using smart speakers, showing how they alternately categorize VAs as either human-like or object-like, and describing this shift as a form of 'fluid ontology' in users' interpretations of VAs. We witness a similar fluidity in our participant's footing shifts as she treats Alexa alternately as an inanimate device, as a compliant interlocutor, or as a 'blameable' other. For example, Annabelle often treats Alexa as a device when using an orderly SR3 sequence to complete a simple task such as requesting information. This conventional footing exemplifies how VAs are conceptualised as straightforwardly accessible and useful for vulnerable older adults seeking to maintain their independence by doing or remembering things for which they might otherwise require support (Ramadan et al., 2021). In this interactional configuration, and with thanks to one of our reviewers for this choice phrase, Alexa is no more companion-like than a light switch. However, where Alexa is lambasted, blamed, or complained about, the anthropomorphic implications do suggest forms of interpersonal relationship that extend beyond the user/device paradigm. While these interactions do not equate to the practices of companionship documented in conversation analytic studies of, e.g., family companions in visits to healthcare professionals, we recognize facets of the domestic moral and affective accountability that Goodwin and Cekaite (2018) describe as 'family choreography'. Without buying into the overblown marketing claims of 'virtual companion' devices, we suggest that future CA studies could focus on how different degrees and dimensions of 'companionship' are constructed through talk and embodied interaction around VAs.

4.3 Limitations and future directions

This study has been limited by its analytic focus on a single case study, and by exploring an environment in which talk within the sensor range of a VA only occasions a response when summoned by a wake word. This situation carries clear methodological limitations, since CA depends on examining recipient responses to identify participants' own situated interpretations of actions, i.e., CA's 'next turn proof procedure' (van Burgsteden, 2023). More interactionally ambiguous situations, where no response to any given utterance is necessarily warranted, have been characterised analytically as "continuing states of incipient talk" (Schegloff & Sacks, 1973). In this interactional environment, CA has less evidence for the analysis of initial actions because recipient responses-as-interpretation are not necessarily due. This is certainly the case when exploring talk that may be directed at self-or-Alexa, and where we focus on talk to Alexa that is not necessarily designed for a co-present other (Porcheron et al., 2018). Nonetheless, these methodological limitations, and the issue of whether responses from Alexa were due, were helpfully topicalized by our participant (see Extract 12). The way Annabelle manages Alexa's availability as an agentic recipient, by issuing or withholding the wake word, speaks to our research question about whether and how a device can be treated as a companion.

5. Conclusion

The research reported here contributes a CA perspective to our understanding of the extent to which technologists' claims about 'virtual companionship' may be warranted. By taking a discursive psychological approach to the concept of companionship, combined with a CA study of naturalistic video data, we have highlighted a range of 'companioned' interactional practices. These include a) footing shifts in talk involving a VA, b) self-talk within sensor range of a VA that is not designed (e.g., by using a wake word) to occasion a response, and c) online commentary and displays of distress or helplessness. Without claiming that these constitute a form of 'companionship' comparable with human-human interaction, these practices nonetheless index the construction of 'companionship-likeness' with VAs. The participant in our study treats the VA interchangeably as an inanimate device or as a co-present other, depending on the activities currently underway. Notably, the companioned practices we observed mostly arose when problems occurred. The user treated the VA as companion-like mostly when complaining about it, blaming it for interactional trouble, or citing its inability to respond to different situations. However, the shifts in footing that occur while using (or complaining to/about) the device also show that the user does not expect the VA to behave as a fully human-like companion.

By focusing on a user's shifting interactional orientations towards a VA, our analysis has shown how users can manipulate the embedded categories of speakership and recipiency in the design of their conversations with machines. These interactional methods, developed by a person living with dementia for using and relating to a VA in day-to-day tasks and activities, broaden our understanding of the range of practical uses and interpretations of VAs as assistive technologies. The observation that a device is at its most companion-like when it is blamed, insulted, or complained about might inspire more critical and realistic claims, designs, and applications of 'virtual companionship'.

Acknowledgments

This research was supported by an ESRC PhD studentship funded through the Midlands Graduate School DTP [ES/P000711/1] and Loughborough University. It was also supported by the Alzheimer's Research UK Midlands Network Small Equipment Grant.

References

Aarsand, P. A., Aronsson, K. (2009). Response cries and other gaming moves-Building intersubjectivity in gaming. Journal of Pragmatics, 41(8), 1557-1575. https://doi.org/10.1016/j.pragma.2007.05.014

Alač, M., Gluzman, Y., Aflatoun, T., Bari, A., Jing, B., & Mozqueda, G. (2020). How everyday interactions with digital voice assistants resist a return to the individual. Evental Aesthetics, 9(1), 51.

Albert, S., Hamann, M. (2021, July). Putting wake words to bed: We speak wake words with systematically varied prosody, but CUIs don't listen. In Proceedings of the 3rd Conference on Conversational User Interfaces (pp. 1-5).

Albert, S., Hamann, M., Stokoe, E. (2023). Conversational User Interfaces in Smart Homecare Interactions: A Conversation Analytic Case Study. Proceedings of the 5th International Conference on Conversational User Interfaces, 1-12. https://doi.org/10.1145/3571884.3597140

Amazon Echo (Director). (2019). Amazon Echo Alexa-Morning Ritual (60s). https://www.youtube.com/watch?v=rHsO-rXrLLo

Amrhein, A., Cyra, K., Pitsch, K. (2016). Processes of reminding and requesting in supporting people with special needs. Human practices as basis for modeling a virtual assistant. Proceedings 1st ECAI Workshop on Ethics in the Design of Intelligent Agents. The Hague, The Netherlands (pp. 18-23).

Antaki, C. (Ed.). (2011). Applied conversation analysis: Intervention and change in institutional talk. Springer.

Antaki, C., Chinn, D. (2019). Companions' dilemma of intervention when they mediate between patients with intellectual disabilities and health staff. Patient Education and Counseling, 102(11), 2024-2030. https://doi.org/10.1016/j.pec.2019.05.020

Baldwin, C. (2005). Technology, dementia and ethics: rethinking the issues. Disability studies quarterly, 25(3). https://doi.org/10.18061/dsq.v25i3.583

Bauman, Z. (1973). On the philosophical status of ethnomethodology. The Sociological Review, 21(1), 5-23.

BBC News. (2018, April 25). Amazon Alexa to reward kids who say: 'Please'. https://www.bbc.com/news/technology-43897516

Begum, M., Wang, R., Huq, R., Mihailidis, A. (2013). Performance of daily activities by older adults with dementia: The role of an assistive robot. 2013 IEEE 13th International Conference on Rehabilitation Robotics (ICORR), 1-8. https://doi.org/10.1109/ICORR.2013.6650405

Berger, I., Viney, R., Rae, J. P. (2016). Do continuing states of incipient talk exist?. Journal of Pragmatics, 91, 29-44. https://doi.org/10.1016/j.pragma.2015.10.009

Blair, J., Abdullah, S. (2019). Understanding the Needs and Challenges of Using Conversational Agents for Deaf Older Adults. Companion Publication of the 2019 Conference on Computer Supported Cooperative Work and Social Computing, 161-165. https://doi.org/10.1145/3311957.3359487

Bosisio, F., Jones, L. (2023). Consensual and non-consensual asymmetry in talk between people with dementia and their companions. Patient Education and Counseling, 109, 13. https://doi.org/10.1016/j.pec.2022.10.040

Broekens, J., Heerink, M., Rosendal, H. (2009). Assistive social robots in elderly care: A review. Gerontechnology, 8(2), 94-103.

Chen, S. C., Moyle, W., Jones, C., Petsky, H. (2020). A social robot intervention on depression, loneliness, and quality of life for Taiwanese older adults in long-term care. International Psychogeriatrics, 32(8), 981-991. https://doi.org/10.1017/S1041610220000459

Clayman, S. E. (1995). The dialectic of ethnomethodology. Semiotica, 107(1/2), 105-123.

Cooper, S., Di Fava, A., Vivas, C., Marchionni, L., Ferro, F. (2020). ARI: The social assistive robot and companion. 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 745-751. https://doi.org/10.1109/RO-MAN47096.2020.9223470

Coulter, J., Parsons, E. D. (1990). The praxiology of perception: Visual orientations and practical action. Inquiry, 33(3), 251-272. https://doi.org/10.1080/00201749008602223

Craven, A., Potter, J. (2010). Directives: Entitlement and contingency in action. Discourse Studies, 12(4), 419-442. https://doi.org/10.1177/1461445610370126

Curl, T. S., Drew, P. (2008). Contingency and action: A comparison of two forms of requesting. Research on Language and Social Interaction, 41(2), 129-153. https://doi.org/10.1080/08351810802028613

Demiris, G., Rantz, M., Aud, M., Marek, K., Tyrer, H., Skubic, M., Hussam, A. (2004). Older adults' attitudes towards and perceptions of 'smart home' technologies: A pilot study. Medical Informatics and the Internet in Medicine, 29(2), 87-94. https://doi.org/10.1080/14639230410001684387

Demiris, G., Thompson, H. J., Lazar, A., Lin, S.-Y. (2016). Evaluation of a digital companion for older adults with mild cognitive impairment. AMIA Annual Symposium Proceedings, 2016, 496. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333281/

Doehring, A. H. (2018). Three-party interactions between neurologists, patients and their companions in the seizure clinic [PhD Thesis, Loughborough University]. https://www.researchgate.net/profile/Ann-Doehring/publication/330798535_Three-party_interactions_between_neurologists_patients_and_their_companions_in_the_seizure_clinic/links/5c6ffc2c458515831f6675c2/Three-party-interactions-between-neurologists-patients-and-their-companions-in-the-seizure-clinic.pdf

Ekberg, K., Hickson, L., Land, C. (2020). Practices of negotiating responsibility for troubles in interaction involving people with hearing impairment. In Atypical Interaction: The Impact of Communicative Impairments within Everyday Talk (pp. 409-433). Palgrave Macmillan. https://doi.org/10.1007/978-3-030-28799-3_14

Epley, N., Waytz, A., Cacioppo, J. T. (2007). On seeing human: a three-factor theory of anthropomorphism. Psychological Review, 114(4), 864. https://psycnet.apa.org/doi/10.1037/0033-295X.114.4.864

Evans, J., Brown, M., Coughlan, T., Lawson, G., Craven, M. P. (2015). A Systematic Review of Dementia Focused Assistive Technology. In M. Kurosu (Ed.), Human-Computer Interaction: Interaction Technologies, Vol. 9170 (pp. 406-417). Springer International Publishing. https://doi.org/10.1007/978-3-319-20916-6_38

Fischer, J. E., Reeves, S., Porcheron, M., Sikveland, R. O. (2019). Progressivity for voice interface design. Proceedings of the 1st International Conference on Conversational User Interfaces, 1-8. https://doi.org/10.1145/3342775.3342788

Fischer, K. (2021). Tracking anthropomorphizing behavior in human-robot interaction. ACM Transactions on Human-Robot Interaction, 11(1), 4:1-4:28. https://doi.org/10.1145/3442677

Garfinkel, H., Wieder, D. L. (1992). Two incommensurable, asymmetrically alternate technologies of social analysis. In G. Watson & R. M. Seiler (Eds.), Text in context: Contributions to ethnomethodology (pp. 175-206). Sage.

Goffman, E. (1967). Interaction ritual: Essays in face to face behavior. Pantheon.

Goffman, E. (1978). Response cries. Language, 54(4), 787-815. https://doi.org/10.2307/413235

Goffman, E. (1981). Forms of talk. University of Pennsylvania Press.

Goodwin, C. (2007). Interactive footing. In E. Holt & R. Clift (Eds.), Reporting talk: Reported speech in interaction (pp. 16-46). Cambridge University Press.

Goodwin, M. H., Cekaite, A. (2018). Embodied family choreography: Practices of control, care, and mundane creativity. Routledge.

Graham, V., Tuffin, K. (2004). Retirement villages: Companionship, privacy and security. Australasian Journal on Ageing, 23(4), 184-188. https://doi.org/10.1111/j.1741-6612.2004.00047.x

Heath, C., vom Lehn, D. (2013). Interactivity and Collaboration: New forms of participation in museums, galleries and science centres. In Museums in a digital age (pp. 266-280). Routledge.

Heath, C., Hindmarsh, J., Luff, P. (2010). Video in qualitative research: Analysing social interaction in everyday life. Sage Publications.

Hepburn, A., Bolden, G. (2017). Transcribing for social research. SAGE Publications Ltd. https://www.doi.org/10.4135/9781473920460

Heritage, J. (1984). A change-of-state token and aspects of its sequential placement. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action (pp. 299-345). Cambridge University Press.

Heritage, J. (2017). Online commentary in primary care and emergency room settings. Acute Medicine & Surgery, 4(1), 12-18. https://doi.org/10.1002/ams2.229

Heritage, J., Drew, P. (1992). Talk at work: Interaction in institutional settings. Cambridge University Press.

Hill, R., Betts, L. R., Gardner, S. E. (2015). Older adults' experiences and perceptions of digital technology: (Dis)empowerment, wellbeing, and inclusion. Computers in Human Behavior, 48, 415-423. https://doi.org/10.1016/j.chb.2015.01.062

Hoey, E. M. (2017). Lapse organization in interaction [PhD Thesis, Max Planck Institute for Psycholinguistics, Radboud University, Nijmegen]. http://bit.ly/hoey2017

Hofstetter, E. (2020). Nonlexical "moans": Response cries in board game interactions. Research on Language and Social Interaction, 53(1), 42-65. https://doi.org/10.1080/08351813.2020.1712964

Höhn, S. (2019). Artificial companion for second language conversation. Springer International Publishing.

Huma, B., Alexander, M., Stokoe, E., Tileaga, C. (2020). Introduction to special issue on discursive psychology. Qualitative Research in Psychology, 17(3), 313-335. https://doi.org/10.1080/14780887.2020.1729910

Ienca, M., Fabrice, J., Elger, B., Caon, M., Pappagalloe, A., Kressig, R., Wangmo, T. (2017). Intelligent assistive technology for Alzheimer's disease and other dementias: A systematic review. Journal of Alzheimer's Disease, 56(4), 1301-1340. https://doi.org/10.3233/JAD-161037

Jefferson, G. (1988). Preliminary notes on a possible metric which provides for a 'standard maximum' silence of approximately one second in conversation. In D. Roger & P. Bull (Eds.), Conversation: An interdisciplinary perspective (pp. 166-196). Multilingual Matters.

Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. Lerner (Ed.), Conversation analysis: Studies from the first generation (pp. 13-31). John Benjamins.

Jung, J., Murray-Rust, D. S., Gadiraju, U., Bozzon, A. (2023). Gender Choices of Conversational Agent: How Today's Practice Can Shape Tomorrow's Values. 2022 CHI Conference on Human Factors in Computing Systems, CHI 2022. https://doi.org/10.1145/3334480

Kramer, M., Yaghoubzadeh, R., Kopp, S., Pitsch, K. (2013). A conversational virtual human as autonomous assistant for elderly and cognitively impaired users? Social acceptability and design considerations. https://dl.gi.de/items/5fc52dbc-c38e-48c2-b2d2-74a349fbdce0

Krummheuer, A. L., Rehm, M., Rodil, K. (2020). Triadic Human-Robot Interaction. Distributed Agency and Memory in Robot Assisted Interactions. Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 317-319. https://doi.org/10.1145/3371382.3378269

Lazar, A., Thompson, H. J., Piper, A. M., Demiris, G. (2016). Rethinking the Design of Robotic Pets for Older Adults. Proceedings of the 2016 ACM Conference on Designing Interactive Systems, 1034-1046. https://doi.org/10.1145/2901790.2901811

Lee, J.-E. R., Nass, C. I. (2010). Trust in computers: The computers-are-social-actors (CASA) paradigm and trustworthiness perception in human-computer communication. In Trust and technology in a ubiquitous modern environment: Theoretical and methodological perspectives (pp. 1-15). IGI Global. https://www.igi-global.com/chapter/trust-computers-computers-social-actors/42897

Lopatovska, I., Williams, H. (2018). Personification of the Amazon Alexa: BFF or a Mindless Companion. Proceedings of the 2018 Conference on Human Information Interaction & Retrieval - CHIIR '18, 265-268. https://doi.org/10.1145/3176349.3176868

MacMartin, C., Coe, J. B., Adams, C. L. (2014). Treating distressed animals as participants: I know responses in veterinarians' pet-directed talk. Research on Language and Social Interaction, 47(2), 151-174. https://doi.org/10.1080/08351813.2014.900219

Mavrina, L., Szczuka, J., Strathmann, C., Bohnenkamp, L. M., Krämer, N., Kopp, S. (2022). "Alexa, You're Really Stupid": A longitudinal field study on communication breakdowns between family members and a voice assistant. Frontiers in Computer Science, 4. https://doi.org/10.3389/fcomp.2022.791704

McTear, M., Callejas, Z., Griol, D. (2016). The conversational interface. Springer International Publishing. https://doi.org/10.1007/978-3-319-32967-3

Meiland, F., Innes, A., Mountain, G., Robinson, L., van der Roest, H., García-Casal, J. A., ... Franco-Martin, M. (2017). Technologies to support community-dwelling persons with dementia: a position paper on issues regarding development, usability, effectiveness and cost-effectiveness, deployment, and ethics. JMIR rehabilitation and assistive technologies, 4(1). https://doi.org/10.2196/rehab.6376

Mondada, L. (2008). Using video for a sequential and multimodal analysis of social interaction: Videotaping institutional telephone calls. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 9(3). https://doi.org/10.17169/fqs-9.3.1161

Mondada, L. (2018). Multiple temporalities of language and body in interaction: Challenges for transcribing multimodality. Research on Language and Social Interaction, 51(1), 85-106. https://doi.org/10.1080/08351813.2018.1413878

Mordoch, E., Osterreicher, A., Guse, L., Roger, K., Thompson, G. (2013). Use of social commitment robots in the care of elderly people with dementia: A literature review. Maturitas, 74(1), 14-20. https://doi.org/10.1016/j.maturitas.2012.10.015

Nass, C., Moon, Y. (2000). Machines and mindlessness: Social responses to computers. Journal of Social Issues, 56(1), 81-103. https://doi.org/10.1111/0022-4537.00153

O'Brien, K., Liggett, A., Ramirez-Zohfeld, V., Sunkara, P., Lindquist, L. A. (2020). Voice-controlled intelligent personal assistants to support aging in place. Journal of the American Geriatrics Society, 68(1), 176-179. https://doi.org/10.1111/jgs.16217

Okada, Y. (2010). Role-play in oral proficiency interviews: Interactive footing and interactional competencies. Journal of Pragmatics, 42(6), 1647-1668. https://doi.org/10.1016/j.pragma.2009.11.002

Park, S., Kim, B. (2022). The impact of everyday AI-based smart speaker use on the well-being of older adults living alone. Technology in Society, 71. https://doi.org/10.1016/j.techsoc.2022.102133

Pehkonen, S. (2020). Response cries inviting an alignment: Finnish huh huh. Research on Language and Social Interaction, 53(1), 19-41. https://doi.org/10.1080/08351813.2020.1712965

Pelikan, H. R. M., Broth, M. (2016). Why that Nao? Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI '16. https://doi.org/10.1145/2858036.2858478

Pelikan, H. R. M., Broth, M., Keevallik, L. (2020). 'Are You Sad, Cozmo?': How humans make sense of a home robot's emotion displays. Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 461-470. https://doi.org/10.1145/3319502.3374814

Pelikan, H., Hofstetter, E. (2023). Managing delays in human-robot interaction. ACM Transactions on Computer-Human Interaction, 30(4), 1-42. https://doi.org/10.1145/3569890

Pelikan, H., Broth, M., Keevallik, L. (2022). When a robot comes to life: The interactional achievement of agency as a transient phenomenon. Social Interaction. Video-Based Studies of Human Sociality, 5(3), Article 3. https://doi.org/10.7146/si.v5i3.129915

Percy Campbell, J. (2023). Aging in place with Google and Amazon smart speakers: Privacy and surveillance implications for older adults [PhD Thesis]. http://dspace.library.uvic.ca/handle/1828/15095

Pino, M., Land, V. (2022). How companions speak on patients' behalf without undermining their autonomy: Findings from a conversation analytic study of palliative care consultations. Sociology of Health & Illness, 44(2), 395-415. https://doi.org/10.1111/1467-9566.13427

Pino, M., Doehring, A., Parry, R. (2021). Practitioners' dilemmas and strategies in decision-making conversations where patients and companions take divergent positions on a healthcare measure: An observational study using conversation analysis. Health Communication, 36(14), 2010-2021. https://doi.org/10.1080/10410236.2020.1813952

Pomerantz, A. (1984). Pursuing a response. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in conversation analysis (pp. 152-163). Cambridge University Press.

Porcheron, M., Fischer, J. E., Reeves, S. (2021). Pulling back the curtain on the Wizards of Oz. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW3), 1-22. https://doi.org/10.1145/3432942

Porcheron, M., Fischer, J. E., Reeves, S., Sharples, S. (2018). Voice interfaces in everyday life. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1-12. https://doi.org/10.1145/3173574.3174214

Potter, J. (2012). Discourse analysis and discursive psychology. In H. Cooper (Ed.), APA Handbook of Research Methods in Psychology: Vol. 2. Quantitative, Qualitative, Neuropsychological, and Biological (pp. 111-130). American Psychological Association Press.

Pradhan, A., Findlater, L., Lazar, A. (2019). 'Phantom Friend' or 'Just a Box with Information': Personification and ontological categorization of smart speaker-based voice assistants by older adults. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1-21. https://doi.org/10.1145/3359316

Pradhan, A., Lazar, A., Findlater, L. (2020). Use of intelligent voice assistants by older adults with low technology use. ACM Transactions on Computer-Human Interaction, 27(4), 1-27. https://doi.org/10.1145/3373759

Pradhan, A., Mehta, K., Findlater, L. (2018). 'Accessibility came by accident': Use of voice-controlled intelligent personal assistants by people with disabilities. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1-13. https://doi.org/10.1145/3173574.3174033

Ramadan, Z., Farah, M. F., El Essrawi, L. (2021). From Amazon.com to Amazon.love: How Alexa is redefining companionship and interdependence for people with special needs. Psychology & Marketing, 38(4), 596-609. https://doi.org/10.1002/mar.21441

Reeves, S., Fischer, J. E., Porcheron, M., Sikveland, R. (2019). Learning how to talk: Co-producing action with and around voice agents. https://dl.gi.de/items/f047d3d7-6534-47de-809e-3a3aaf781fb9

Riddoch, K. A. (2021). Human-Robot companionship: A mixed-methods investigation [PhD Thesis, University of Glasgow]. https://theses.gla.ac.uk/82858/

Ring, L., Barry, B., Totzke, K., Bickmore, T. (2013). Addressing loneliness and isolation in older adults: Proactive affective agents provide better support. Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 61-66. https://doi.org/10.1109/ACII.2013.17

Robson, C., Drew, P., Reuber, M. (2016). The role of companions in outpatient seizure clinic interactions: A pilot study. Epilepsy & Behavior, 60, 86-93. https://doi.org/10.1016/j.yebeh.2016.04.010

Roettgers, J. (2019). How Alexa got her personality. Retrieved July 28, 2020, from https://variety.com/2019/digital/news/alexa-personality-amazon-echo-1203236019/

Šabanović, S., Bennett, C. C., Chang, W.-L., Huber, L. (2013). PARO robot affects diverse interaction modalities in group sensory therapy for older adults with dementia. 2013 IEEE 13th International Conference on Rehabilitation Robotics (ICORR), 1-6. https://ieeexplore.ieee.org/abstract/document/6650427/

Schegloff, E. A., Sacks, H. (1973). Opening up Closings. Semiotica, 8(4), 289-327. https://doi.org/10.1515/semi.1973.8.4.289

Schwär, H., Moynihan, R. (2018). There's a clever psychological reason why Amazon gave Alexa a female voice. https://www.businessinsider.com/theres-psychological-reason-why-amazon-gave-alexa-a-female-voice-2018-9

Shead, S. (2017). REPORT: 1 in 4 people have fantasised about Alexa, Siri, and other AI assistants. https://www.businessinsider.com/jwt-speak-easy-study-people-fantasised-about-alexa-2017-4

Stivers, T., Rossano, F. (2010). Mobilizing response. Research on Language and Social Interaction, 43(1), 3-31. https://doi.org/10.1080/08351810903471258

Stokoe, E. (2021). Conversations, and how we end them. Nature Publishing Group UK London.

Stokoe, E., Sikveland, R. O., Albert, S., Hamann, M., Housley, W. (2020). Can humans simulate talking like other humans? Comparing simulated clients to real customers in service inquiries. Discourse Studies, 22(1), 87-109. https://doi.org/10.1177/1461445619887537

Stommel, W. J. P., Stommel, M. W. J. (2021). Participation of companions in video-mediated medical consultations: A microanalysis. In J. Meredith, D. Giles, W. Stommel (Eds.), Analysing Digital Interaction (pp. 177-203). Springer International Publishing. https://doi.org/10.1007/978-3-030-64922-7_9

Sutton, S. J. (2020). Gender ambiguous, not genderless: Designing gender in voice user interfaces (VUIs) with sensitivity. Proceedings of the 2nd Conference on Conversational User Interfaces, 1-8. https://doi.org/10.1145/3405755.3406123

Tong, Y., Wang, F., Wang, W. (2022). Fairies in the box: Children's perception and interaction towards voice assistants. Human Behavior and Emerging Technologies, 2022. https://doi.org/10.1155/2022/1273814

Tuncer, S., Licoppe, C., Luff, P., Heath, C. (2023). Recipient design in human-robot interaction: The emergent assessment of a robot's competence. AI & Society. https://doi.org/10.1007/s00146-022-01608-7

Turk, V. (2016). Home invasion. New Scientist, 232(3104), 16-17. https://doi.org/10.1016/S0262-4079(16)32318-1

Turner-Lee, N. (2019). Can emerging technologies buffer the cost of in-home care in rural America?. Generations, 43(2), 88-93.

van Burgsteden, L. (2023). Next-turn proof procedure. In A. Gubina, E. M. Hoey & C. W. Raymond (Eds.), Encyclopedia of Terminology for Conversation Analysis and Interactional Linguistics. International Society for Conversation Analysis (ISCA). https://doi.org/10.17605/OSF.IO/SY53W

Weinstein, J. N. (2019). Artificial intelligence: Have you met your new friends; Siri, Cortana, Alexa, Dot, Spot, and Puck. Spine, 44(1), 1-4. https://doi.org/10.1097/BRS.0000000000002913

Woods, H. S. (2018). Asking more of Siri and Alexa: Feminine persona in service of surveillance capitalism. Critical Studies in Media Communication, 35(4), 334-349. https://doi.org/10.1080/15295036.2018.1488082

Wooffitt, R. (2005). Conversation analysis and discourse analysis: A comparative and critical introduction. Sage.

Wright, J. (2021). The alexafication of adult social care: virtual assistants and the changing role of local government in England. International Journal of Environmental Research and Public Health, 18(2), 812. https://doi.org/10.3390/ijerph18020812

Wu, Y., Porcheron, M., Doyle, P., Edwards, J., Rough, D., Cooney, O., Bleakley, A., Clark, L., Cowan, B. (2022). Comparing command construction in native and non-native speaker IPA interaction through conversation analysis. Proceedings of the 4th Conference on Conversational User Interfaces, 1-12. https://doi.org/10.1145/3543829.3543839

Yaghoubzadeh, R., Kramer, M., Pitsch, K., Kopp, S. (2013). Virtual agents as daily assistants for elderly or cognitively impaired people. In R. Aylett, B. Krenn, C. Pelachaud, H. Shimodaira (Eds.), Intelligent Virtual Agents, Vol. 8108, (pp. 79-91). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-40415-3_7

Zhu, J., Shi, K., Yang, C., Niu, Y., Zeng, Y., Zhang, N., ... Chu, C. H. (2022). Ethical issues of smart home‐based elderly care: A scoping review. Journal of Nursing Management, 30(8), 3686-3699. https://doi.org/10.1111/jonm.13521