6

Cognition

Here’s a story about how machines learn. If you are, say, the US Army, you want to be able to see things that the enemy has hidden. Perhaps they’ve got a load of tanks in a forest. The tanks are painted with confusing camouflage patterns, parked among and behind trees; they’re covered in brush. Patterns of light and shade, the weird green and brown splotches of paint: all of these conspire with thousands of years of evolution in the visual cortex to turn the blocky outlines of the tanks into rippling, shifting non-shapes, indistinguishable from the foliage. But what if there was another way of seeing? What if you could rapidly evolve a different kind of sight that perceived the forest and the tanks differently, so that what was hard to see suddenly sprung into view?

One way to go about this would be to train a machine to see the tanks. So you get a platoon of soldiers together, you get them to hide a bunch of tanks in the forest, and you take, say, a hundred photos of them. Then, you take another hundred photos of the empty forest. And you show fifty pictures from each set to a neural network, a piece of software that is designed to mimic a human brain. The neural network doesn’t know anything about tanks or forests, or light and shade; it just knows that these are fifty pictures with something important in them, and fifty pictures without that something, and it tries to spot the difference. It passes the photos through multiple layers of neurons, tweaking and judging them, but without any of the preconceptions embedded by evolution in the human brain. And, after a while, it learns to see tanks hidden in the forest.

Because you took a hundred photos originally, it’s possible to see if this really works. You take the other fifty photos of hidden tanks, and the other fifty photos of empty forest – which the machine has never seen before – and ask it to choose between them. And it does so, perfectly. Even if you can’t see the tanks, you know which photos are which, and the machine, without knowing, chooses the right ones. Boom! You’ve evolved a new way of seeing, and you send your machine off to the training ground to show it off.

And then disaster strikes. Out in the field, with a new set of tanks in the forest, the results are catastrophic. They’re random: the machine is about as good at spotting tanks as a coin toss. What happened?

The story goes that when the US Army tried this, they made a crucial error. All of the tank photos were taken in the morning, under clear blue skies. Then the tanks went away, and in the afternoon, when the photos of the empty forest were taken, it clouded over. The investigators realised that the machine worked perfectly, but what it had learned to distinguish was not the presence or absence of tanks, but whether it was sunny or not.
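
The failure is easy to reproduce in miniature. The sketch below is a hypothetical reconstruction, not anything the Army ran: synthetic arrays stand in for the photographs, and an off-the-shelf logistic regression from scikit-learn stands in for the neural network. It scores perfectly on the held-out photos while having learned nothing but the weather.

```python
# A minimal sketch of the tank story's confound (all data here is invented):
# 'tank' photos are systematically brighter than 'forest' photos, so a classifier
# that has only ever seen that pairing learns the weather, not the tanks.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fake_photos(n, pixels=64 * 64, brightness=0.0):
    """Random 'texture' plus a global brightness offset -- the hidden confound."""
    offset = np.broadcast_to(np.asarray(brightness, dtype=float).reshape(-1, 1), (n, pixels))
    return rng.normal(size=(n, pixels)) + offset

tanks_in_sun = fake_photos(100, brightness=0.5)      # every tank photo taken under clear skies
forest_in_cloud = fake_photos(100, brightness=-0.5)  # every empty-forest photo taken under cloud

X = np.vstack([tanks_in_sun, forest_in_cloud])
y = np.array([1] * 100 + [0] * 100)

# Hold back fifty of each, exactly as in the story.
train = np.r_[0:50, 100:150]
test = np.r_[50:100, 150:200]

clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
print("held-out accuracy:", clf.score(X[test], y[test]))        # ~1.0: 'it works!'

# In the field, the weather no longer lines up with the tanks.
field_labels = rng.integers(0, 2, size=200)
field_weather = rng.choice([0.5, -0.5], size=200)
field_photos = fake_photos(200, brightness=field_weather)
print("field accuracy:", clf.score(field_photos, field_labels))  # ~0.5: a coin toss
```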

This cautionary tale, which has been told over and over again in the academic literature on machine learning,1 is probably apocryphal, but it illustrates an important issue when dealing with artificial intelligence and machine learning: What can we know about what a machine knows? The story of the tanks encodes a fundamental realisation, and one of increasing importance: whatever artificial intelligence might come to be, it will be fundamentally different, and ultimately inscrutable, to us. Despite increasingly sophisticated systems of both computation and visualisation, we are no closer today to truly understanding exactly how machine learning does what it does; we can only adjudicate the results.

The original neural network, which probably engendered some early version of the tank story, was developed for the United States Office of Naval Research. It was called the Perceptron, and like many early computers it was a physical machine: a set of 400 light-detecting cells randomly connected, by a rat’s nest of wires, to switches that updated their response with every run – the neurons. Its designer, Cornell psychologist Frank Rosenblatt, was a great publicist for the possibilities of artificial intelligence. When the Perceptron Mark I was presented to the public in 1958, the New York Times reported,

The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. Later perceptrons will be able to recognise people and call out their names and instantly translate speech in one language to speech and writing in another language, it was predicted.2

Image: The Mark I Perceptron, an early pattern recognition system, at the Cornell Aeronautical Laboratory. Cornell University Library.
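
Though the Mark I was built from photocells and motor-driven potentiometers, Rosenblatt’s learning rule survives unchanged in software. What follows is a minimal modern restatement – my own sketch on toy data, not a reconstruction of the original machine: weighted inputs, a threshold, and a small correction to the weights after every mistake.

```python
# A sketch of Rosenblatt-style perceptron learning: a single threshold unit whose
# weights are nudged whenever it misclassifies an input. (Toy data; not the Mark I.)
import numpy as np

rng = np.random.default_rng(1)

def train_perceptron(X, y, epochs=25, lr=0.1):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = 1 if xi @ w + b > 0 else 0
            # The perceptron rule: adjust only when the unit is wrong.
            w += lr * (target - prediction) * xi
            b += lr * (target - prediction)
    return w, b

# 400 'photocells', each reporting light (1) or dark (0), and a linearly
# separable target: is the left half of the field brighter than the right?
X = rng.integers(0, 2, size=(200, 400)).astype(float)
y = (X[:, :200].sum(axis=1) > X[:, 200:].sum(axis=1)).astype(int)

w, b = train_perceptron(X, y)
correct = sum((1 if xi @ w + b > 0 else 0) == t for xi, t in zip(X, y))
print(f"training accuracy: {correct / len(y):.2f}")
```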

The idea that underlay the Perceptron was connectionism: the belief that intelligence was an emergent property of the connections between neurons, and that by imitating the winding pathways of the brain, machines might be induced to think. This idea was attacked by numerous researchers over the next decade, who held that intelligence was the product of the manipulation of symbols: essentially, some knowledge of the world was required to reason meaningfully about it. This debate between connectionists and symbolists was to define the artificial intelligence field for the next forty years, leading to numerous fallings out, and the notorious ‘AI winters’ in which no progress was made at all for many years. At heart, it was not merely a debate about what it means to be intelligent, but what is intelligible about intelligence.

One of the more surprising advocates of early connectionism was Friedrich Hayek, best known today as the father of neoliberalism. Forgotten for many years, but making a recent comeback among Austrian-inclined neuroscientists, Hayek wrote The Sensory Order: An Inquiry into the Foundations of Theoretical Psychology in 1952, based on ideas he’d formulated in the 1920s. In it, he outlines his belief in a fundamental separation between the sensory world of the mind and the ‘natural’, external world. The former is unknowable, unique to each individual, and thus the task of science – and economics – is to construct a model of the world that ignores the foibles of individual people.

It’s not hard to see a parallel between the neoliberal ordering of the world – where an impartial and dispassionate market directs the action independent of human biases – and Hayek’s commitment to a connectionist model of the brain. As later commentators have noted, in Hayek’s model of the mind, ‘knowledge is dispersed and distributed in the cerebral cortex much as it is in the marketplace among individuals’.3 Hayek’s argument for connectionism is an individualist, neoliberal one, and corresponds directly with his famous assertion in The Road to Serfdom (1944) that all forms of collectivism lead inexorably to totalitarianism.

Today, the connectionist model of artificial intelligence reigns supreme again, and its primary proponents are those who, like Hayek, believe that there is a natural order to the world that emerges spontaneously when human bias is absent in our knowledge production. Once again, we see the same claims being made about neural networks as were made by their cheerleaders in the 1950s – but this time, their claims are being put to work in the world more widely.

In the last decade, due to several important advances in the field, neural networks have undergone a massive renaissance, underpinning the current revolution in expectations of artificial intelligence. One of their greatest champions is Google, whose cofounder Sergey Brin has said of the progress in AI that ‘you should presume that someday, we will be able to make machines that can reason, think and do things better than we can.’4 Google’s chief executive, Sundar Pichai, is fond of saying that the Google of the future will be ‘AI first’.

Google has been investing in artificial intelligence for some time. Its in-house Google Brain project decloaked in 2011 to reveal that it had constructed a neural network from a cluster of a thousand machines containing some 16,000 processors, and fed it 10 million images culled from YouTube videos.5 The images were unlabelled, but the network evolved the ability to recognise human faces – and cats – with no prior knowledge about what these things might signify.

Image recognition is a typical first task for proving intelligent systems, and a relatively easy one for companies like Google, whose business combines building ever-larger networks of ever-faster processors with harvesting ever-greater volumes of data from the daily lives of its users. (Facebook, which operates a similar program, used 4 million pictures of its users to create a piece of software called DeepFace, which can recognise people with 98 per cent accuracy.6 Use of the software is illegal in Europe.) What happens next is that this software is used not merely to recognise, but to predict.

In a much-discussed paper published in 2016, two researchers from Shanghai Jiao Tong University, Xiaolin Wu and Xi Zhang, studied the ability of an automated system to make inferences about ‘criminality’, based on facial images. They trained a neural network on images of 1,126 ‘non-criminals’ culled from official Chinese ID photos found on the web, and another 730 ID photos of convicted criminals supplied by courts and police departments. After training, they claimed that the software could tell the difference between criminal and noncriminal faces.7

The result of the paper’s publication was uproar: technology blogs, international newspapers, and fellow academics weighed in. The most vocal critics accused Wu and Zhang of following in the footsteps of Cesare Lombroso and Francis Galton, notorious nineteenth-century proponents of criminal physiognomy. Lombroso founded the field of criminology, but his belief that the shape of the jaw, the slope of the forehead, the size of the eyes, and the structure of the ear could be used to determine a subject’s ‘primitive’ criminal characteristics was debunked in the early twentieth century. Galton developed a technique of composite portraiture whereby he hoped to derive a ‘typical’ criminal face – physical features that corresponded to an individual’s moral character. The attacks fed a narrative that facial recognition constituted a new form of digital phrenology, with all of the cultural biases that implied.

Wu and Zhang were appalled at the reaction, publishing an outraged response in May 2017. As well as refuting some of the more unscientific takedowns of their method, they took direct aim – in technological language – at their detractors: ‘There is really no need to parade infamous racists in chronic order with us inserted at the terminal node’8 – as though it was their critics who had manifested this lineage, rather than history itself.

Technology companies and others dabbling in AI are quick to retract their claims whenever they produce ethical conflicts, despite their own responsibility for inflating expectations. When the right-wing Daily Mail newspaper in the UK used the How-Old.net facial recognition programme to question the age of child refugees being admitted to Britain, its creator, Microsoft, was quick to stress that the software was just a ‘fun app’ that was ‘not intended to be used as a definitive assessment of age’.9 Likewise, Wu and Zhang protested, ‘Our work is only intended for pure academic discussions; how it has become a media consumption is a total surprise to us.’

One criticism came in for special consideration, highlighting a recurring trope in the history of facial recognition – one with distinctly racial undertones. In their examples of average criminal and noncriminal faces, some critics detected ‘a hint of a smile’ on the noncriminals – a ‘micro-expression’ absent from the criminal images, indicating their strained circumstances. But Wu and Zhang denied this, not on technological grounds, but on cultural ones: ‘Our Chinese students and colleagues, even after being prompted to consider the cue of smile, fail to detect the same. Instead, they only find the faces in the bottom row appearing somewhat more relaxed than those in the top row. Perhaps, the different perceptions here are due to cultural differences.’10

What was left untouched in the original paper was the assumption that any such system could ever be free of encoded, embedded bias. At the outset of their study, the authors write,

Unlike a human examiner/judge, a computer vision algorithm or classifier has absolutely no subjective baggages, having no emotions, no biases whatsoever due to past experience, race, religion, political doctrine, gender, age, etc., no mental fatigue, no preconditioning of a bad sleep or meal. The automated inference on criminality eliminates the variable of meta-accuracy (the competence of the human judge/examiner) all together.11

In their response, they double down on this assertion: ‘Like most technologies, machine learning is neutral.’ They insist that if machine learning ‘can be used to reinforce human biases in social computing problems, as some argued, then it can also be used to detect and correct human biases.’ Knowingly or not, such a response relies upon our ability to optimise not only our machines, but also ourselves.

Technology does not emerge from a vacuum. Rather, it is the reification of a particular set of beliefs and desires: the congruent, if unconscious, dispositions of its creators. In any moment it is assembled from a toolkit of ideas and fantasies developed over generations, through evolution and culture, pedagogy and debate, endlessly entangled and enfolded. The very idea of criminality itself is a legacy of nineteenth-century moral philosophy, while the neural networks used to ‘infer it’ are, as we’ve seen, the product of a specific worldview – the apparent separation of the mind and the world – which in turn reinforces the apparent neutrality of its exercise. To continue to assert an objective schism between technology and the world is nonsense; but it has very real outcomes.

Examples of encoded biases are easy to come by. In 2009, a Taiwanese-American strategy consultant named Joz Wang purchased a new Nikon Coolpix S630 camera for Mother’s Day, but when she tried to take a family photo, the camera repeatedly refused to capture an image. ‘Did someone blink?’ read the error message. The camera, preprogrammed with software to wait until all its subjects were looking, eyes open, in the right direction, failed to account for the different physiognomy of non-Caucasians.12 The same year, a black employee at an RV dealership in Texas posted a widely viewed YouTube video of his new Hewlett-Packard Pavilion webcam failing to recognise his face, while zooming in on his white colleague. ‘I’m going on record’, he says, ‘I’m saying it. Hewlett-Packard computers are racist.’13

Once again, the encoded, and particularly racial, biases of visual technologies are not new. To Photograph the Details of a Dark Horse in Low Light, the title of a 2013 exhibition by the artists Adam Broomberg and Oliver Chanarin, refers to a code phrase used by Kodak when developing a new film in the 1980s. Since the 1950s, Kodak had distributed test cards featuring a white woman and the phrase ‘Normal’ in order to calibrate their films. Jean-Luc Godard refused to use Kodak film on assignment in Mozambique in the seventies, claiming it was racist. But only when two of their biggest clients, the confectionery and furniture industries, complained that dark chocolate and dark chairs were difficult to photograph did the company address the need to image dark bodies.14 Broomberg and Chanarin also explored the legacy of the Polaroid ID-2, a camera designed for ID shots with a special ‘boost button’ for the flash that made photographing black subjects easier. Much favoured by the apartheid-era government of South Africa, it was the focus of protests by the Polaroid Revolutionary Workers Movement when black American workers discovered it was used to produce the notorious passbook photographs referred to by black South Africans as ‘handcuffs’.15

But the technology of the Nikon Coolpix and the HP Pavilion masks a more modern, and more insidious, racism: it’s not that their designers set out to create a racist machine, or that it was ever employed for racial profiling; rather, it seems likely that these machines reveal the systemic inequalities still present within today’s technological workforce, where those developing and testing the systems are still predominantly white. (As of 2009, Hewlett-Packard’s American workforce was 6.74 per cent black.)16 It also reveals, as never before, the historic prejudices deeply encoded in our data sets, which are the frameworks on which we build contemporary knowledge and decision making.

This awareness of historic injustice is crucial to understanding the dangers of the mindless implementation of new technologies that uncritically ingest yesterday’s mistakes. We will not solve the problems of the present with the tools of the past. As the artist and critical geographer Trevor Paglen has pointed out, the rise of artificial intelligence amplifies these concerns, because of its utter reliance on historical information as training data: ‘The past is a very racist place. And we only have data from the past to train Artificial Intelligence.’17

Walter Benjamin, writing in 1940, phrased the problem even more fiercely: ‘There is no document of civilisation which is not at the same time a document of barbarism.’18 To train these nascent intelligences on the remnants of prior knowledge is thus to encode such barbarism into our future.

And these systems are not merely contained in academic papers and consumer cameras – they are already determining the macro scale of people’s daily lives. In particular, the faith placed in intelligent systems has been implemented widely in police and justice systems. Half of police services in the United States are already employing ‘predictive policing’ systems such as PredPol, a software package that uses ‘high-level mathematics, machine learning, and proven theories of crime behaviour’ to predict the most likely times and places that new crimes can be expected to occur: a weather forecast for lawbreaking.19

How, once again, do these expectations of physical events get bound up in the stochastic events of everyday life? How do calculations of behaviour take on the force of natural law? How does an idea of the earth, despite all attempts at separation, become one of the mind?

The Great Nōbi Earthquake, which was estimated at 8.0 on the Richter scale, occurred in what is now Aichi Prefecture in 1891. A fault line fifty miles long fell eight metres, collapsing thousands of buildings in multiple cities and killing more than 7,000 people. It is still the largest known earthquake on the Japanese archipelago. In its aftermath, the pioneering seismologist Fusakichi Omori described the pattern of aftershocks: a rate of decay that became known as Omori’s law. It is worth noting at this point that Omori’s law and all that derived from it are empirical laws: that is, they fit to existing data after the event, which differ in every case. They are aftershocks – the rumbling echo of something that already occurred. Despite decades of effort by seismologists and statisticians, no similar calculus has been developed for predicting earthquakes from corresponding foreshocks.
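
In its modified (Utsu) form, Omori’s law gives the expected rate of aftershocks at a time t after the mainshock as

\[ n(t) = \frac{K}{(c + t)^{p}} \]

where K, c and p are constants fitted, after the fact, to each particular sequence (p is usually close to 1). The formula describes a decay that has already begun; nothing in it runs forwards from quiet ground to a prediction.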

Omori’s law provides the basis for one contemporary implementation of this calculus, called the epidemic type aftershock sequence (ETAS) model, used today by seismologists to study the cascade of seismic activity following a major earthquake. In 2009, mathematicians at University of California, Los Angeles, reported that patterns of crime across a city followed the same model: the result, they wrote, of the ‘local, contagious spread of crime [that] leads to the formation of crime clusters in space and time … For example, burglars will repeatedly attack clusters of nearby targets because local vulnerabilities are well known to the offenders. A gang shooting may incite waves of retaliatory violence in the local set space (territory) of the rival gang.’20 To describe these patterns, they used the geophysical term ‘self-excitation’, the process by which events are triggered and amplified by nearby stresses. The mathematicians even noted the way in which the urban landscape mirrored the layered topology of the earth’s crust, with the risk of crime travelling laterally along a city’s streets.
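
Stripped of its seismological detail, the mathematical core shared by ETAS and the UCLA crime model is a self-exciting point process: a background rate plus an Omori-style echo from every past event. In simplified form (the production models add magnitude and spatial terms), the expected rate of events at time t is

\[ \lambda(t) = \mu + \sum_{t_i < t} \frac{K}{(t - t_i + c)^{p}} \]

where μ is the chronic background rate – ambient seismicity, or ambient crime – and the sum is the lingering excitation contributed by each earlier event at time t_i.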

It is ETAS that forms the basis of today’s predictive policing programmes, estimated as a $25 million industry in 2016, and growing explosively. Whenever PredPol is taken up by a city’s police department, as has happened in Los Angeles, Atlanta, Seattle, and hundreds of other US jurisdictions, the last few years of local data – the time, type, and location of each crime – are analysed using ETAS. The resulting model, constantly updated with new crimes as they occur, is used to produce shift-by-shift heat maps of potential trouble spots. Cruisers are dispatched to the site of potential tremors; police officers are assigned to shaky corners. In this manner, crime becomes a physical force: a wave passing through the strata of urban life. Prediction becomes the justification for stops and searches, tickets, and arrests. The aftershocks of a century-old earthquake rumble through contemporary streets.

The predictability (or otherwise) of earthquakes and homicides; the racial biases of opaque systems: these are, given enough time and thought, amenable to our understanding. They are based on time-worn models, and in the lived experience of the everyday. But what of the new models of thought produced by machines – decisions and consequences that we do not understand, because they are produced by cognitive processes utterly unlike our own?

One dimension of our lack of understanding of machine thought is the sheer scale at which it operates. When Google set out to overhaul its Translate software in 2016, the application was much used but also a byword for unintentional humour. It had been launched in 2006, using a technique called statistical language inference. Rather than trying to understand how languages actually worked, the system imbibed vast corpora of existing translations: parallel texts with the same content in different languages. It was the linguistic equivalent of Chris Anderson’s ‘end of theory’; pioneered by IBM in the 1990s, statistical language inference did away with domain knowledge in favour of huge quantities of raw data. Frederick Jelinek, the researcher who led IBM’s language efforts, famously stated that ‘every time I fire a linguist, the performance of the speech recogniser goes up’.21 The role of statistical inference was to remove understanding from the equation and replace it with data-driven correlation.

In one sense, machine translation approaches the ideal described by Benjamin in his 1921 essay The Task of the Translator: that the most faithful translation ignores its original context to allow a deeper meaning to shine through. Benjamin insisted on the primacy of the word over the sentence, of the manner of meaning over its matter: ‘A real translation is transparent,’ he wrote. ‘It does not cover the original, does not block its light, but allows the pure language, as though reinforced by its own medium, to shine upon the original all the more fully.’22 What Benjamin desired of the translator was that, instead of striving to transmit directly what the original writer meant – ‘the inaccurate transmission of an inessential content’ – they might communicate their way of meaning it, that which was unique to their writing and thus to the translation. Such work ‘may be achieved, above all, by a literal rendering of the syntax which proves words rather than sentences to be the primary element of the translator’; only a close reading of the choice of words, rather than the accumulation of superficially meaningful sentences, allows us to access the original’s higher meaning. But Benjamin adds, ‘If the sentence is the wall before the language of the original, literalness is the arcade.’ Translation is always insufficient: it serves to emphasise the distance between languages, not to bridge it. The airiness of the arcade is only achieved when we embrace ‘the distance, alienness, lack, and mismatch between languages’ – translation not as transmission of meaning, but as the awareness of its absence.23 The machines, it seems, do not get to play in the arcade. (And what would Benjamin make of the fact that Google’s original Translate corpus was composed entirely of multilingual transcripts of meetings of the United Nations and the European Parliament?24 This, too, is an encoding of barbarism.)

In 2016, the situation changed. Instead of employing a strict statistical inference between texts, the Translate system started using a neural network developed by Google Brain, and its abilities suddenly improved exponentially. Rather than simply cross-referencing heaps of texts, the network builds its own model of the world, and the result is not a set of two-dimensional connections between words, but a map of the entire territory. In this new architecture, words are encoded by their distance from one another in a mesh of meaning – a mesh only a computer could comprehend. While a human can draw a line between the words ‘tank’ and ‘water’ easily enough, it quickly becomes impossible to draw on a single map the lines between ‘tank’ and ‘revolution’, between ‘water’ and ‘liquidity’, and all of the emotions and inferences that cascade from those connections. The map is thus multidimensional, extending in more directions than the human mind can hold. As one Google engineer commented, when pursued by a journalist for an image of such a system, ‘I do not generally like trying to visualise thousand-dimensional vectors in three-dimensional space.’25 This is the unseeable space in which machine learning makes its meaning.

Beyond that which we are incapable of visualising is that which we are incapable of even understanding: an unknowability that stresses its sheer alienness to us – although, conversely, it’s this alienness that feels most like intelligence. In New York in 1997, the reigning world chess champion Garry Kasparov faced off against Deep Blue, a computer specially designed by IBM to beat him. Following a similar match in Philadelphia the previous year, which Kasparov won 4–2, the man widely regarded as the greatest chess player of all time was confident of victory. When he lost, he claimed some of Deep Blue’s moves were so intelligent and creative that they must have been the result of human intervention. But we understand why Deep Blue made those moves: its process for selecting them was ultimately one of brute force, a massively parallel architecture of 480 custom-designed chess chips, capable of analysing 200 million board positions per second. At the time of the match, it was the 259th most powerful computer on the planet, and it was dedicated purely to chess. It could simply hold more outcomes in mind when choosing where to play next. Kasparov was not outthought, merely outgunned.

By contrast, when the Google Brain–powered AlphaGo software defeated the Korean Go professional Lee Sedol, one of the highest-rated players in the world, something had changed. In the second of five games, AlphaGo played a move that stunned Sedol and spectators alike, placing one of its stones on the far side of the board, and seeming to abandon the battle in progress. ‘That’s a very strange move,’ said one commentator. ‘I thought it was a mistake,’ said the other. Fan Hui, another seasoned Go player who had been the first professional to lose to the machine six months earlier, said of it: ‘It’s not a human move. I’ve never seen a human play this move.’ And he added, ‘So beautiful.’26 In the history of the 2,500-year-old game, nobody had ever played like this. AlphaGo went on to win the game, and the series.

AlphaGo’s engineers developed its software by feeding a neural network millions of moves by expert Go players, and then getting it to play itself millions of times more, developing strategies that outstripped those of human players. But its own representation of those strategies is illegible: we can see the moves it made, but not how it decided to make them. The sophistication of the moves that must have been played in those games between the shards of AlphaGo is beyond imagination, too, but we are unlikely to ever see and appreciate them; there’s no way to quantify sophistication, only winning instinct.

The late and much-lamented Iain M. Banks called the place where these moves occurred ‘Infinite Fun Space’.27 In Banks’s sci-fi novels, his Culture civilisation is administered by benevolent, superintelligent AIs called simply Minds. While the Minds were originally created by humans (or, at least, some biological, carbon-based entities), they have long since outstripped their creators, redesigned and rebuilt themselves, and become both inscrutable and all-powerful. Between controlling ships and planets, directing wars, and caring for billions of humans, the Minds also take up their own pleasures, which involve speculative computations beyond the comprehension of humankind. Capable of simulating entire universes within their imaginations, some Minds retreat forever into Infinite Fun Space, a realm of meta-mathematical possibility, accessible only to superhuman artificial intelligences. And the rest of us, if we spurn the arcade, are left with Finite Fun, fruitlessly analysing the decisions of machines beyond our comprehension.

Some operations of machine intelligence do not stay within Infinite Fun Space, however. Instead, they create an unknowingness in the world: new images; new faces; new, unknown, or false events. The same approach by which language can be cast as an infinite mesh of alien meaning can be applied to anything that can be described mathematically – that is, as a web of weighted connections in multidimensional space. Words drawn from bodies of human writing still have relationships, even when shorn of human meaning, and calculations can be performed upon the numbers that stand in for that meaning. In a semantic network, the lines of force – vectors – that define the word ‘queen’ align with those read in the order ‘king - man + woman’.28 The network can infer a gendered relationship between ‘king’ and ‘queen’ by following the path of such vectors. And it can do the same thing with faces.
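
Before turning to faces, the word arithmetic itself is easy to try. The short sketch below uses the gensim library and a small set of pretrained GloVe vectors; the choice of library and corpus is mine, for illustration, not something specified by the systems described here.

```python
# A sketch of 'king - man + woman ~ queen' using pretrained word vectors.
# (gensim and the GloVe vectors are my choice of tooling for illustration.)
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")   # downloads the vectors on first use

# Follow the vectors: add 'king' and 'woman', subtract 'man', find the nearest word.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # typically [('queen', ...)] -- a gendered relationship read off as geometry
```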

Given a set of images of people, a neural network can perform calculations that do not merely follow these lines of force, but generate new outcomes. A set of photographs of smiling women, unsmiling women and unsmiling men can be computed to produce entirely new images of smiling men, as shown in a paper published by Facebook researchers in 2015.29

Image: Creating new faces with mathematics. From Radford, Metz and Chintala, ‘Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks’.

In the same paper, the researchers generate a range of new images. Using a dataset of more than 3 million photographs of bedrooms from a large-scale image recognition challenge, their network generates new bedrooms: arrangements of colour and furniture that have never existed in the real world, but come into being at the intersection of vectors of bedroomness: walls, windows, duvets and pillows. Machines dreaming dream rooms where no dreams are dreamed. But it is the faces – anthropomorphs that we are – that stick in the mind: Who are these people, and what are they smiling at?

Things get stranger still when these dream images start to interleave with our own memories. Robert Elliott Smith, an artificial intelligence researcher at University College London, returned from a family holiday in France in 2014 with a phone full of photos. He uploaded a number of them to Google+, to share them with his wife, but while browsing through them he noticed an anomaly.30 In one image, he saw himself and his wife at a table in a restaurant, both smiling at the camera. But this photograph had never been taken. At lunch one day, his father had held the button down on his iPhone a little long, resulting in a burst of images of the same scene. Smith uploaded two of them, to see which his wife preferred. In one, he was smiling, but his wife was not; in the other, his wife was smiling, but he was not. From these two images, taken seconds apart, Google’s photo-sorting algorithms had conjured a third: a composite in which both subjects were smiling their ‘best’. The algorithm was part of a package called AutoAwesome (since renamed, simply, ‘Assistant’), which performed a range of tweaks on uploaded images to make them more ‘awesome’ – applying nostalgic filters, turning them into charming animations, and so forth. But in this case, the result was a photograph of a moment that had never happened: a false memory, a rewriting of history.

The doctoring of photographs is an activity as old as the medium itself, but in this case the operation was being performed automatically and invisibly on the artefacts of personal memory. And yet, perhaps there is something to learn from this too: the delayed revelation that images are always false, artificial snapshots of moments that have never existed as singularities, forced from the multidimensional flow of time itself. Unreliable documents; composites of camera and attention. They are artefacts not of the world and of experience, but of the recording process – which, as a false mechanism, can never approach reality itself. It is only when these processes of capture and storage are reified in technology that we are able to perceive their falsity, their alienation from reality. This is the lesson that we might draw from the dreams of machines: not that they are rewriting history, but that history is not something that can be reliably narrativised; and thus, neither can the future. The photographs mapped from the vectors of artificial intelligence constitute not a record but an ongoing reimagining, an ever-shifting set of possibilities of what might have been and what is to come. This cloud of possibility, forever contingent and nebulous, is a better model of reality than any material assertion. This cloud is what is revealed by the technology.

Image: An image from DeepDream. Source: Google.

This illumination of our own unconscious by the machines is perhaps best illustrated by another weird output from Google’s machine learning research: a programme called DeepDream. DeepDream was designed to better illuminate the internal workings of inscrutable neural networks. In order to learn to recognise objects, a network was fed millions of labelled images of things: trees, cars, animals, houses. When exposed to a new image, the system filtered, stretched, tore and compressed the image through the network in order to classify it: this is a tree, a car, an animal, a house. But DeepDream reversed the process: by feeding an image into the back end of the network, and activating the neurons trained to see particular objects, it asked not what is this image, but what does the network want to see in it? The process is akin to that of seeing faces in clouds: the visual cortex, desperate for stimulation, assembles meaningful patterns from noise.
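
A rough idea of the mechanism can be given in a few lines. The sketch below is not Google’s DeepDream code: it takes a pretrained torchvision classifier, picks an arbitrary intermediate layer, and adjusts the input image by gradient ascent so that the layer’s activations grow – asking the network what it wants to see. The layer index, step size and input file are all my own assumptions.

```python
# A minimal sketch of the DeepDream idea (not Google's implementation): gradient
# ascent on the *image* to amplify whatever an intermediate layer responds to.
# Assumes torchvision >= 0.13 and a local file 'kitten.jpg' (hypothetical input).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.vgg16(weights="IMAGENET1K_V1").features.eval()
target_layer = 20   # an arbitrary mid-level convolutional layer -- my choice

image = Image.open("kitten.jpg").convert("RGB")
x = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])(image).unsqueeze(0)
x.requires_grad_(True)

for step in range(50):
    activation = x
    for i, module in enumerate(model):
        activation = module(activation)
        if i == target_layer:
            break
    loss = activation.norm()          # how strongly does this layer respond?
    loss.backward()
    with torch.no_grad():
        x += 0.01 * x.grad / (x.grad.abs().mean() + 1e-8)   # climb the gradient
        x.grad.zero_()
        x.clamp_(0.0, 1.0)            # keep the result a displayable image

T.ToPILImage()(x.detach().squeeze(0)).save("kitten_dream.jpg")
```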

DeepDream’s engineer, Alexander Mordvintsev, created the first iteration of the programme at two in the morning, having been woken by a nightmare.31 The first image he fed into the system was of a kitten sat on a tree stump, and the output was a nightmare monster all its own: a hybrid cat/dog with multiple sets of eyes, and wet noses for feet. When Google first released an untrained classifier network on 10 million random YouTube videos in 2012, the first thing it learned to see, without prompting, was a cat’s face: the spirit animal of the internet.32 Mordvintsev’s network thus dreamed of what it knew, which was more cats and dogs. Further iterations produced Boschian hellscapes of infinite architecture, including arches, pagodas, bridges, and towers in infinite, fractal progressions, according to the neurons activated. But the one constant that recurs throughout DeepDream’s creations is the image of the eye – dogs’ eyes, cats’ eyes, human eyes; the omnipresent, surveillant eye of the network itself. The eye that floats in DeepDream’s skies recalls the all-seeing eye of dystopian propaganda: Google’s own unconscious, composed of our memories and actions, processed by constant analysis and tracked for corporate profit and private intelligence. DeepDream is an inherently paranoid machine because it emerges from a paranoid world.

Image: ‘Secure Beneath The Watchful Eyes’, Transport for London, 2002.

Meanwhile, when not being forced to visualise their dreams for our illumination, the machines progress further into their own imaginary space, to places we cannot enter. Walter Benjamin’s greatest wish, in The Task of the Translator, was that the process of transmission between languages would invoke a ‘pure language’ – an amalgam of all the languages in the world. It is this aggregate language that is the medium in which the translator should work, because what it reveals is not the meaning but the original’s manner of thinking. Following the activation of Google Translate’s neural network in 2016, researchers realised that the system was capable of translating not merely between languages, but across them; that is, it could translate directly between two languages it had never seen explicitly compared. For example, a network trained on Japanese–English and English–Korean examples is capable of generating Japanese–Korean translations without ever passing through English.33 This is called ‘zero-shot’ translation, and what it implies is the existence of an ‘interlingual’ representation: an internal metalanguage composed of shared concepts across languages. This is, to all intents, Benjamin’s pure language; it is the meaningless metalanguage of the arcade. By visualising the architecture of the network and its vectors as splashes of colour and line, it’s possible to see sentences in multiple languages clustered together. The outcome is a semantic representation evolved by, not designed into, the network. But this is as close as we shall ever get, for once again, we are peering through the window of Infinite Fun Space – an arcade we will never get to visit.

Compounding this inscrutability, in 2016 a pair of researchers at Google Brain decided to see if neural networks could keep secrets.34 The idea stemmed from that of the adversary: an increasingly common component of neural network designs, and one that would no doubt have pleased Friedrich Hayek. Both AlphaGo and Facebook’s bedroom generator were trained adversarially; that is, they consisted not of a single component that generated new moves or places, but of two competing components that continually attempted to outperform and outguess the other, driving further improvement. Taking the idea of an adversary to its logical conclusion, the researchers set up three networks called, in the tradition of cryptographic experiments, Alice, Bob, and Eve. Their task was to learn how to encrypt information. Alice and Bob both knew a number – a key, in cryptographic terms – that was unknown to Eve. Alice would perform some operation on a string of text, and then send it to Bob and Eve. If Bob could decode the message, Alice’s score increased; but if Eve could, Alice’s score decreased. Over thousands of iterations, Alice and Bob learned to communicate without Eve breaking their code: they developed a private form of encryption like that used in private emails today. But crucially, in the manner of the other neural networks we’ve seen, we don’t understand how this encryption works. Its operation is occluded by the deep layers of the network. What is hidden from Eve is also hidden from us. The machines are learning to keep their secrets.
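
In outline – and simplifying the published losses, with the distance function d and the weighting γ as my own notation – the three networks optimise competing objectives: Eve is trained to minimise her reconstruction error on the intercepted message, while Alice and Bob are trained jointly to minimise Bob’s reconstruction error and to frustrate Eve’s,

\[ \mathcal{L}_{\text{Eve}} = d(P, P_{\text{Eve}}), \qquad \mathcal{L}_{\text{Alice,Bob}} = d(P, P_{\text{Bob}}) - \gamma \, d(P, P_{\text{Eve}}) \]

where P is the plaintext and P_Bob and P_Eve are the two reconstructions. Whatever encoding satisfies this tug-of-war is discovered, not designed, and it remains buried in the weights.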

Isaac Asimov’s Three Laws of Robotics, formulated in the 1940s, state,

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.35

To these we might add a fourth: a robot – or any other intelligent machine – must be able to explain itself to humans. Such a law must intervene before the others, because it takes the form not of an injunction to the other, but of an ethic. The fact that this law has – by our own design and inevitably – already been broken, leads inescapably to the conclusion that so will the others. We face a world, not in the future but right now, where we do not understand our own creations. The result of such opacity is always and inevitably violence.

In relating the stories of Kasparov versus Deep Blue and Sedol versus AlphaGo, another parallel story was left untold. Kasparov did indeed leave the game frustrated and in disbelief of the machine’s ability. But his frustration was channelled into finding some way to rescue chess from the dominance of machines. There have been many such attempts; few have proved successful. David Levy, a Scottish chess champion who played many exhibition games against machines in the 1970s and ’80s, developed an ‘anti-computer’ style of restricted play that he described as ‘doing nothing but doing it well’. His play was so conservative that his computer opponents were unable to discern a long-term plan until Levy’s position was so strong that he was unbeatable. Likewise, Boris Alterman, an Israeli grandmaster, developed a strategy in matches against machines in the ’90s and early ’00s that became known as the ‘Alterman Wall’: he would bide his time behind a row of pawns, knowing that the more pieces he had on the board, the more possible moves the machine would have to calculate.36

Along with changes in style, it’s also possible to change the game. Arimaa is a chess variant developed in 2002 by Omar Syed – himself a computer engineer trained in artificial intelligence – specifically designed to be difficult for machines to grasp, while being easy and fun for humans to learn. Its name comes from Syed’s then-four-year-old son, who provided a benchmark for the comprehensibility of the rules. In Arimaa, players can arrange their pieces in any configuration, and must move one of their weakest pieces – pawns renamed as rabbits – to the far side of the board to win. They can also use their stronger pieces to push and pull weaker pieces towards a series of trap squares, removing them from the board and clearing the way for the rabbits. The combination of many different initial setups, the ability of pieces to move other pieces, and the possibility of making up to four moves per turn results in combinatorial explosion: a vast increase in possibilities that rapidly becomes too great for a computer programme to handle – the Alterman Wall taken to exponential extremes. Or so it was hoped. The first computer Arimaa tournament was held in 2004, with the most successful programme winning the right to challenge a group of top human players for a cash prize. In the first few years, the humans easily beat their computer opponents, even increasing the margin of victory as their skills in the new game improved faster than the programmes challenging them. But in 2015, the contest was won decisively by a machine, a result unlikely to be reversed.

It is tempting when confronted by the power and opacity of intelligent systems to delay, derail, or concede the ground. Where Levy and Alterman built walls, Arimaa went back to the land, attempting to carve out an alternative space outside the sphere of machine dominance. This was not Kasparov’s approach. Instead of rejecting the machines, he returned the year after his defeat to Deep Blue with a different kind of chess, which he called ‘Advanced Chess’.

Other names for Advanced Chess include ‘cyborg’ and ‘centaur’ chess. One image evokes the human melded with the machine, the other with the animal – if not something entirely alien. The legend of the centaur in Greek mythology arose perhaps with the arrival of mounted warriors from the steppes of Central Asia, when horse riding was unknown in the Mediterranean. (The Aztecs are reported to have made the same assumption about Spanish cavalrymen.) Robert Graves argued that the centaur was an even more ancient figure: a relic of pre-Hellenic earth cults. The centaurs were also the grandchildren of Nephele, the nymph of the cloud. Thus centaur strategies carry the possibility of being both contemporary necessities in the face of adversity, as well as prelapsarian revivals from less adversarial times.

In Advanced Chess, a human player and a computer chess programme play as a team against another human-computer pair. The results have been revolutionary, opening up new fields and strategies of play previously unseen in the game. One of the effects is that blunders are eliminated: the human can analyse their own proposed movements to such an extent that they can play error-free, resulting in perfect tactical play, and more rigorously deployed strategic plans.

But perhaps the most extraordinary result derived from Advanced Chess, which is normally played by matched human-machine pairs, occurs when human and machine play against a solo machine. Since Deep Blue, many computer programmes have been developed that can beat any human with ease and efficiency: increases in data storage and processing power mean that supercomputers are no longer required for the task. But even the most powerful contemporary programme can be defeated by a skilled player with access to their own computer – even one less powerful than their opponent. Cooperation between human and machine turns out to be a more potent strategy than the most powerful computer alone.

This is the Optometrist Algorithm applied to games, an approach which draws on the respective skills of humans and machines as required, rather than pitting one against the other. Cooperation also reduces the sting of computational opacity: through cooperative play rather than post hoc analysis, we might gain a deeper insight into the way in which complex machines make their decisions. Acknowledging the reality of nonhuman intelligence has deep implications for how we act in the world and requires clear thinking about our own behaviours, opportunities, and limitations. While machine intelligence is rapidly outstripping human performance in many disciplines, it is not the only way of thinking, and it is in many fields catastrophically destructive. Any strategy other than mindful, thoughtful cooperation is a form of disengagement: a retreat that cannot hold. We cannot reject contemporary technology any more than we can ultimately and utterly reject our neighbours in society and the world; we are all entangled. An ethics of cooperation in the present need not be limited to machines either: with other nonhuman entities, animate and non-animate, it becomes another form of stewardship, emphasising acts of universal justice not in an unknowable, uncomputable future, but in the here and now.
