'Looking Through a Soda Straw': Mediated Vision in Remote Warfare

Image-guided military operations embed soldiers into a complex system of image production, transmission, and perception that separates their bodies from the battlefield and at the same time mediates between them. In particular, remote-controlled operations of so-called unmanned aerial systems (UAS) require the real-time synchronization of human actors and technical sensors, for instance to establish knowledge of a situation. This situational awareness relies almost exclusively on the visualization of sensor data. The resulting human-machine entanglement corresponds to a new operative modality of images that differs from earlier forms of real-time imaging such as live broadcasting, as it is based on a feedback loop that turns the observer into an actor. Images are not simply analyzed and interpreted but become agents in a socio-technological assemblage. The paper examines this functional shift of images from a medium of visualization towards a medium that guides operative processes. Based on an analysis of vision, architecture, and navigation in remote warfare, it discusses how real-time video technology and the mobilization of sensor and transmission technology produce a type of intervention in which action and perception are increasingly organized and determined by machines.

battlefield to the experience of "looking through a soda straw" (Cullen 2011, 122). Policymakers and journalists in particular, but often military personnel as well, overlook this disparity between a human or subject-centered perspective and a machine or computer-generated visualization. Military training units frequently rely on a concept of vision in which observation and representation, human vision and technical visualization, are merged. As part of their training program, operators learn to strategically eliminate the separation of humans and machines in order to adequately navigate an aircraft and control a sensor system. This symbiosis of operators and technology requires a practice of remote vision in which seeing is detached from the human sensorium. This entanglement, as I will argue in this article, has a significant impact on the way military drone operations are conducted. The merging of humans and machines requires a specific applied knowledge of remote action and perception, in which the body becomes part of an increasingly complex socio-technological assemblage. Further, the blurring of the boundary between human and technology raises the question of how, and to what extent, vision is bound to the human factor in warfare.

Dissociation of Vision and Visualization
While imaging techniques produce visibility beyond the physiological and anthropological boundaries of vision, the ability to see has historically been linked to the presence of a human eye. The Civitates Orbis Terrarum engravings, a collection of early modern cityscapes first published in 1572, are an extraordinary example of early urban cartography and, more importantly, mark the beginning of a split between vision and visualization. Most of the engravings were made by Franz Hogenberg (1535-1590). They were produced and annotated in collaboration with the theologian Georg Braun. Most color plates in the collection depict a city and its surrounding landscape from above. Notably, many plates feature human figures at the bottom of the picture whose gaze turns away from the scenery towards the eye of the beholder. These figures appear to stand on a plateau; a path leads from the city up to the mountaintop. Figure 1 shows the flat terrain surrounding a Roman fortress that is today the city of Utrecht. The landscape is presented from a bird's-eye perspective, while the figures at the bottom are depicted in a horizontal view. Their position thus drifts apart from the depiction of the city and its surrounding environment. Why did Hogenberg decide to retain these human observers, whose perspective contradicts the flat topography? Hogenberg seems to have legitimized the tilt towards a cartographic depiction of the cityscape through the presence of observers who remain part of the picture while sharing their vantage point with the beholder. This divergence of perspectives characterizes many of Hogenberg's plates in Civitates Orbis Terrarum.
It suggests that Hogenberg considered vision a condition for visualization: the image seems to be legitimized only through the presence of a human eye, as if the bird's-eye view had to correspond to a view someone has taken from somewhere. The 'unnatural' perspective needs to be grounded in an embodied vision, here exemplified by two observers in the foreground of the plate. This optical set-up suggests that body and image are still conceived as an indistinguishable unit. Nevertheless, the instability of spatial relations in Hogenberg's work confronts the viewer with a split between vision and visualization. Although the image attempts to ground its perspective in the unity of body and image, the diverging perspectives foreshadow the rift between the image and the point of view. This rift becomes increasingly controversial with the rise of visual media such as photography, film, and video. These imaging techniques not only produce visibility beyond the ability and scale of human vision but also no longer depend on the presence of a human observer. Think, for instance, of situations in which bodily presence is impossible, such as NASA's Mars exploration, or of situations in which the possibilities of operation are spatially restricted, such as computer-assisted surgery.
These imaging technologies have challenged the claim of a subject-centered aesthetics in the aftermath of Sigmund Freud's and Marshall McLuhan's theories of prosthesis and extension, which were famously called into question by Friedrich Kittler, who argued that humans are not the subjects of media (Kittler 2002). Kittler's concept of technological mediation follows yet another notion of vision, in which a technological 'eye' continues to see beyond the anthropological limitations of the human eye. To show how imaging technology transforms the role of the observer in remotely controlled weapon systems, I propose to conceptualize the relation between body and image neither from a subject-centered nor from a technology-centered view but as an entanglement of human vision and technological visualization. To demonstrate this entanglement, I will examine how image production in remotely controlled aircraft prompts a drastic change in military intervention. In doing so, I wish to challenge the belief that images produced by remote sensing platforms yield a transparent battle space in which the human is in total control.

The Weaponized Eye
The fusion of body and image breaks down with the modern mobilization of image production, processing, and transmission. The presence of a human eye is no longer crucial to the practice of visualization. In Ridley Scott's Body of Lies (2008), Leonardo DiCaprio plays CIA agent Roger Ferris, who must stop a terrorist group operating in Jordan and Iraq. When Ferris offers himself to the terrorists, the apparent superiority of high-tech observation breaks down, although Jordan's barren desert landscape seems entirely exposed to aerial surveillance. Ferris is loaded onto a vehicle while a drone transmits real-time visualizations to the CIA situation room. Surprisingly, several vehicles then start circling around Ferris and stir up a dust cloud that blocks the sensor's view. As they break off in different directions, the drone can follow only one of them. As in Hogenberg's cityscapes, this film scene suggests that when perception is mediated through images, the observer and the perceptual space drift apart. The downward-gazing observers in the cityscapes paradoxically negate their pictorial arrangement. Ferris and the terrorists in Body of Lies raise their eyes to the sky; in doing so, they mark out a perceptual space that looms from above. While the cityscape makes the situation apparent, the video image offers no trace of the observer. The visual architecture of remote warfare in Body of Lies is detached from the operator's perspective. This contradicts the traditional paradigm of the observer as the subject of representation and the image as the object of perception, which has been the starting point for many theories of technical imaging, such as those of Descartes (1637) and Kepler (1611).
New imaging techniques give rise to new media and practices of visualization. They seek to reach ever further into the unseeable. As such, they represent forms of visual experience that can no longer be measured by the senses alone. Various parameters of imaging, such as motion, radiation, and magnification, had been the subjects of artistic and scientific experimentation, while Soviet filmmaker Dziga Vertov envisioned a new bond between observer and imaging device as early as the 1920s. In his 1923 manifesto Kinoglaz (which translates as 'cinematic eye'), Vertov spoke of his intention to initiate a cinematic perception of the world. For Vertov, who was fascinated by Futurism's engagement with technology, film was the medium that would free people from the imperfections of human vision. Vertov noted in his diary (1923): "Kinoglaz lives and moves in space and time, takes impressions and fixes them quite differently than the human eye. The condition of our body during observation and the number of moments how we perceive this or that phenomenon are in no ways compulsory for the camera" (Vertov 1923/1973). In his diary, Vertov repeatedly switches into a machine perspective: "I am Kinoglaz. I am a mechanical eye. I, the machine, show you the world as only I can see it. Now and forever, I free myself from human immobility. I am in constant motion, I draw near objects and then back away, I crawl under them, I climb onto them, I move along with the muzzle of a galloping horse, I plunge full speed into the crowd, I outstrip running soldiers, I take off with planes, I soar and plunge together with plunging and soaring bodies." (Vertov 1923/1973). While Braun and Hogenberg's cityscapes comply with a visual architecture centered on the observer, Vertov's camera is detached from the body. In doing so, Vertov eliminates the linear relationship between vision and point of view. The human eye is intentionally replaced by a camera that moves around in an unconstrained fashion.
This technological disposition marks the beginning of a media history in which the camera serves as a technical eye (often referred to through the metaphor of the weaponized eye). Ever since, this concept of the artificial eye has been applied to merge the fields of vision and visualization. Cyborg vision and 'seeing' cameras permeate contemporary discourse in cultural and media studies, often presupposing either a transition or a rupture from sense-based to technical forms of seeing (Dorrian & Pousin 2013, Vertesi 2015, Trogemann 2014, Geimer 2010, Verbeek 2007, Harris 2006). In Bilder aus Versehen (Unintentional Images), Geimer studies what he terms the 'optical unconsciousness' of early photography. Taking Julius Neubronner's aerial photographs as an example, Geimer argues that technical media have traditionally been measured by the senses (2010, 319-331). Neubronner patented pigeon photography in 1908, a technique that captures aerial photographs from a bird's-eye view by fitting a camera onto a pigeon (Fig. 3). Geimer states that Neubronner's photographs are "not based on a gaze that someone or something in 1908 had directed into the depth of the landscape" (2010, 328). He concludes that the camera cannot possess a subjective view or gaze, since the author of the photograph is not the one executing it. For this reason, image authorship is indeterminate: it cannot be traced back to a specific observer (2010, 327). Geimer's reasoning applies equally to Vertov's filming technique, which still kept image production and image reception separate in time and space: recording, developing, and projecting the film were separate work steps.
Accordingly, Vertov's concept of cinematic perception does not constitute a disruption of vision, since it points to the moment of capturing an image rather than to the relation between image and spectator in the cinema. Although Vertov's manifesto fundamentally challenges the relation between seeing and recording, it is far more a novel way of showing than a novel way of seeing. The same applies to military surveillance and the new forms of visualization established by techniques of aerial photography dating back as far as World War I. These can hardly be described as 'machines of vision', as Christoph Asendorf has proposed (2006, 31). The combination of camera technology and vertical perspective in aircraft reconnaissance certainly allowed for new perspectives, but it relied on the subsequent development and interpretation of images (Andreas 2015).

Situational Awareness
Imaging in film and photography generally required tools and techniques that precluded the simultaneous production and presentation of images. This separation has been suspended by contemporary transmission, sensor, and display technologies, which make image production, processing, and transmission possible in real time, so that images are increasingly integrated into visual practice. Real-time image processing and transmission have blurred the distinction between vision and visualization as well as between observation and representation. They have set the scene for a synchronization of seeing and acting in which humans and machines can remotely interact with one another. Real-time visualization has become the standard for a variety of image-guided practices in remote warfare, in particular in unmanned aerial systems. Figure 4 presents an illustration featured in the manual of a training program for remotely piloted aircraft conducted at Creech Air Force Base, Nevada (the United States Air Force flies most of its drone missions from Creech). The illustration depicts the command and control links for unmanned aerial systems (UAS). Whereas traditional aviation requires pilots to act independently, UAS comprise a complex network of human actors and technical sensors. They rely on the orchestration of relay stations, operators, ground troops, intelligence analysts, military lawyers, and imaging specialists (Gregory 2011, 195). One of their key features is the spatial divide between actors, both human and technical. This divide creates a disembodied space of operation in which the reception and transmission of digital data determine the interaction between multiple actors (Franz 2016). While information exchange between actors depends on communication via chat clients and radio transmission, knowledge of elements in the environment, the so-called 'situational awareness', relies almost exclusively on the visualization of sensor data.
What might sound trivial is essential for understanding the nature of remotely operated aircraft and their visual regimes: images are no longer imprints of what is; they become a precondition for action. This implies a visual practice structurally different from the established methods of planning and surveillance. Consider, for instance, Colin Powell's presentation of aerial footage as 'evidence' of Iraqi weapons of mass destruction, which was used as proof that intervention was necessary (Powell 2003). While image analysis has long been crucial to reconnaissance, remote imaging and sensor mobility have ushered in a new type of intervention, namely one in which operations are guided, or misguided, by what images show or critically fail to show. Coupling operation to images is thus not merely a question of what visual surveillance can reveal. What is at stake is rather that interaction with visual sensor networks structures action and perception. This calls for a closer examination of the human-machine interface that makes remotely controlled military technologies possible. The ground control station, shown in Figure 5, is the visual and operative interface between human actors and technical sensors in drone operations. It facilitates remote access to the operating field. The photo shows the advanced cockpit ground control station for the MQ-1 Predator and MQ-9 Reaper. Although it has not yet been deployed in combat, the cockpit is designed to provide crews with a more immersive visual combat experience. According to its developers, the system was designed to integrate human operators, aiming for 'enhanced situational awareness' through 'human-centered display technology' (General Atomics, 2016). The pilot and the so-called 'sensor operator' sit in front of a panel of six 24-inch touch-sensitive monitors arranged in two rows.
The top-row monitors provide a wide-angle view of the operating field using a combination of live video footage, virtual terrain images, and air traffic data. Together, they display a 3D topographical model in a 120-degree view, as if the crew members were sitting in an actual cockpit. A small video frame incorporated into the top middle monitor displays the live feed; it provides a rather restricted field of view. The bottom middle monitor offers a top-down view of the battlefield and a variety of visualization options (e.g., mission planning, map overlays). The left and right bottom monitors contain mission data, command and control options, aircraft data, as well as chat and email client interfaces.

Screen Operations
In order to operate the aircraft adequately from afar, crew members must integrate a complex system of visual information and sensor technology into their workflow. They must "visually discriminate and synthesize various images and complex data on several electronic screens while maintaining heightened vigilance to numerous sources of visual and auditory information necessary for sustaining situational and spatial awareness" (Chappelle, McDonald, McMillan 2011, 5). Visualization of the operating field combines virtual geodata with real-time sensor data provided by the Multi-Spectral Targeting System (MTS). The MTS, manufactured by Raytheon and often called the 'sensor ball', is mounted onto the hull of the MQ-9 Reaper (Fig. 6) and is controlled by the sensor operator. It comprises an infrared sensor, a light amplifier, a daylight camera, a laser designator, and a laser illuminator. Understanding the different modalities of visualization and observation is crucial for navigating the aircraft, and even more so for combat decision-making. The ground control station represents a human-machine configuration in which images not only enhance military operations but also prompt the crew to carry out actions in a specific way. From riflescopes to jet fighter helmets, this type of image-guided configuration is used in a variety of military applications. It challenges the relationship between eye and instrument, as well as between organism and mechanism, by focusing on the situation rather than the result of image production. It shifts attention from iconicity and visibility towards interaction with imaging technology.
Image and media theory has finally turned its attention to the connection between image and operation, in particular with regard to operative images and operative iconicity (Farocki 2004; Hinterwaldner 2013; Hoel & Lindseth 2014; Krämer 2009). Conceptualizing images in the context of operation, Aud Sissel Hoel and Frank Lindseth highlight three reasons why 'operational approaches' provide possibilities for rethinking images: "First, they offer dynamic approaches that analyze phenomena into doings and happenings rather than into things and static entities; second, they offer relational approaches that conceive identity in terms of open-ended processes of becoming; and third, by so doing, they allow us to ascribe agency to images, and crucially, to conceive agency as distributed across interconnected assemblages of people, practices, and mediating artifacts" (Hoel & Lindseth 2014, 2). Such an operational understanding of images hinges less on representation, manifestation, and aesthetics than on the practices through which images become media of control and instruction.
In a military context, linking image and operation requires more than the correct analysis of images by crews. It is not enough to be able to distinguish a suspect from a civilian or to identify a weapon based on color differences in thermographic visualization. Wherever images negotiate between soldiers and the battlefield, the interplay of structures and processes behind and in front of the screen is crucial to understanding how operators act through imaging technologies and how images enable or disable action and perception. However, despite increasing interest in drone operations (due also to the rising number of civilian casualties), this aspect of image operation is still largely overlooked. As Gerrit Walczak argues in his study of the so-called 'Collateral Murder' video published by WikiLeaks in 2010, image analysis and interpretation are rarely contextualized in relation to the question of how imaging technology mediates the operator's view (Walczak 2012). The video captured the killing of Iraqi civilians by a US Army combat helicopter in Baghdad on July 12, 2007. It reveals some aspects of what the helicopter crew saw, and it includes their radio communication. However, the video does not offer enough information to understand the crew's decision-making process. Walczak shows that technical characteristics of the visualization, such as the display architecture, the resolution, the scope, and the light exposure, had a significant impact on the crew's actions (Walczak 2012, 12). The video itself does not communicate this information, and accordingly it does not allow for a comprehensive understanding of the crew's action and perception. Confronted with a variety of imaging techniques and modalities of visualization, the crew was embedded in a complex network of technological data that intervened in their workflow.
As Walczak notes: "What a camera captures from above and who views its recording, when and in what way, is dependent on the peculiarity of the apparatus and its interaction with the aircraft and its crew, ever since Nadar took the first [aerial] photographs from a hot air balloon in 1858." (Walczak 2012, 10) Conducting operations through imaging technology not only requires applied visual knowledge but also raises the question of to what degree authorship, presence, or autonomy can be ascribed to this type of imaging technology.

Bio-convergence
In his PhD thesis "The MQ-9 Reaper Remotely Piloted Aircraft: Humans and Machines in Action", Timothy Cullen, a former US Air Force pilot, investigates how remotely piloted aircraft crews interact with imaging technology. In order to understand how crews produce visibility with and through the sensor system, Cullen analyzed in particular the training program of MQ-9 Reaper sensor operators (Cullen 2011, 117-201). Sensor operators act as intermediaries between sensors and visualizations: in selecting image modalities, sections, or magnification, for instance, they define the modes of seeing. In order to provide a stable visual field, the operators in charge of the sensor ball maneuver its sensors carefully and independently of the aircraft's position. This requires constant synchronization of the sensors with the visual field. As Cullen's study demonstrates, this is achieved by strategically eliminating the gap between human vision and the sensor system. According to Cullen, instructors tell trainees to "become the camera" (2011, 166). This idea of human-sensor fusion recurs in the language that both instructors and trainees use: "Instructor sensor operators taught their students to visualize themselves being on the Reaper aircraft, floating above the ground and looking down at their quarry from the belly of the aircraft" (Cullen 2011, 166).
The way "experienced sensor operators interacted with the HUD [Heads Up Display]" led Cullen to identify what he terms a "feeling of remote presence" (Cullen 2011, 166 & 17). As he puts it: "After a couple hundred hours of flight experience and a sense of comfort with the modes, interfaces, and capabilities of the sensor ball, sensor operators began to feel like they were a part of the machine. With proficiency as a 'sensor', sensor operators found themselves shifting and straining their bodies in front of the HUD to look around an object. As pilots flew closer to a target, the transported operators tilted their heads in anticipation of the camera's movement" (Cullen 2011, 167).
Interaction with images and control panels not only establishes a cognitive relation between vision and visualization but also develops a physical relation deeply intertwined with the function of the aircraft: "Feelings of remote presence helped sensor operators move their bodies, and instructors believed that operators who felt as if they were 'flying the sensor' could hold their attention longer on a scene, were more curious of what they saw, could sense change and movement easier. A sensor operator's close relationship with the sensor ball helped them to do their jobs well. Experienced sensor operators who 'flew' the sensor ball from an 18-inch monitor became the machine. They became the eye in the sky." (Cullen 2011, 166) In light of Cullen's study, a media-deterministic reading that assigns agency solely to technology cannot explain the entanglement between the operator and the sensor system. The gap between observer and camera, which also characterizes Vertov's texts, fundamentally questions the relation between the eye and the image. It proposes a notion of vision that surrenders the physical eye in favor of the technological eye. Nevertheless, a purely media-deterministic approach would misleadingly imply that the problem of perception beyond human vision must be negotiated by detaching it from the body. As Cullen argues, this does not seem to be the case with remotely controlled aircraft. On the contrary, while Vertov celebrates the mobility of the camera as an emancipation from human vision, Cullen ties the Reaper sensor system back to physical experience. As mentioned above, the synchronization of the body and the sensor system is crucial to the production of visualization. However, an exclusively anthropocentric approach that regards the apparatus as an extension of the human eye does not sufficiently grasp the essence of this kind of human-machine nexus either.
As drone operators relinquish their subject-centric viewpoint, no trace of a central human presence is preserved. The visual architecture of ground control stations has established a context for action, in which the apparatus can hardly be described as an extension of the eye. The subject-centered representational space of classical aesthetics is abandoned in favor of a device-centric perspective.
Thus, real-time image-guided interventions in warfare suggest a new type of human-machine entanglement that goes beyond simplified subject-object relations. They embed soldiers into a complex system of image production, transmission, and perception that separates their bodies from the battlefield and at the same time mediates between them.4 This mediation relies significantly on the synchronization of human vision and technological visualization: the sensor balls cannot see, but operators claim to see through them, transcending the anthropological bounds of visual experience. However, this effect only seems to take hold in cooperation with the machine. Phrases such as 'eye in the sky' or 'becoming the machine' suggest a seemingly continuous transition between vision and visualization, as well as between technical and anthropological ways of 'seeing'. Lucy Suchman describes this synthesis as a "deadly bio-convergence at the boundaries of humans and machines" in which action and decision are embedded in assemblages of increasingly complex "sociotechnical mediation" (2015, 19).

4 Frédéric Merget has pointed to the deconstruction of the concept of the battlefield in the context of remote warfare: "The emergence of technologies of targeted killings, including the use of unmanned drones, has had the effect of potentially bringing the battlefield to any location in the world in novel and radical ways that defy the traditional idea of the battlefield. Most targets of drone attacks will never know that they were targets and will be hit in a variety of locations (roads, homes, offices), which bear little relation to a battlefield, if only because there is less a battle than an instant flash annihilating the enemy, leaving no chance of flight or surrender." (Merget 2012, 17)
In the context of military interventions, Suchman and Jutta Weber propose to rethink "conceptions of agency and autonomy, from attributes inherent in entities, to effects of discourses and material practices that variously conjoin and/or delineate differences between humans and machines" (Suchman & Weber 2014, 2).
The notion of remote vision can, then, be understood as a conjoined human-machine configuration. UAS operators have to make decisions, often with deadly consequences, based on their ability to produce visibility from the constrained views provided by the sensor system. As images deliver exclusive visual access to the battlefield, they become a precondition for action. They not only show but also prompt operators to see and to carry out actions. Although the visual-motor coordination necessary to synchronize head and camera alignment requires, to some extent, a mitigation of human-machine separation, it is imperative for UAS operators to understand and train how images become agents in an increasingly complex socio-technological assemblage. In this respect, it seems highly controversial that military training encourages operators to dissolve the separation between sense and sensor, since mediated vision in remote warfare constitutes nothing less than a fundamental intervention in the operator's workflow, individual autonomy, and decision-making processes. Accordingly, a critique of remotely piloted aircraft operations must not only document their devastating consequences. It must also investigate the fusion of body and apparatus to reveal the implications of technological innovation in warfare.

Figures
Fig. 1 Trajectum (Utrecht). Georg Braun, Franz Hogenberg, 1593, Civitates Orbis Terrarvm, Köln, table 19.
Fig. 2 Film still. Body of Lies. Ridley Scott, USA 2008.
Fig. 3 Patent drawings of Neubronner's pigeon camera with two lenses. Julius Neubronner, 1907, Patent GB190813128 (A) »Method of and Means for Taking Photographs of Landscapes from Above«, European Patent Office.
Fig. 4 Command and Control Options of Theater Unmanned Aerial Systems. Creech Airforce Base Army Tactical Pocket Guide for Organic/Non Organic Group 3/4/5 UAS, 2010, p. 52.
Fig. 5 Advanced Cockpit Ground Control Station. General Atomics Aeronautical Systems, Inc. 2015.
Fig. 6 Multi-spectral Targeting System on a Reaper MQ-9 at Creech Airforce Base. Bryan William Jones, www.prometheus.med.utah.edu/~bwjones, (Creative Commons BY-NC 3.0).