4
Calculation
Science fiction writers, whose idea of temporality often differs from that of ordinary people, have a term for simultaneous invention: ‘steam engine time’. William Gibson has described it thus:
There’s an idea in the science-fiction community called steam-engine time, which is what people call it when suddenly twenty or thirty different writers produce stories about the same idea. It’s called steam-engine time because nobody knows why the steam engine happened when it did. Ptolemy demonstrated the mechanics of the steam engine, and there was nothing technically stopping the Romans from building big steam engines. They had little toy steam engines, and they had enough metalworking skill to build big steam tractors. It just never occurred to them to do it.1
Steam engines happen when it’s steam engine time: a process almost mystical, almost teleological, because it exists outside the scope of our framework for understanding historical progress. The set of things that had to come together for this particular invention to occur includes so many thoughts and events we could not think or know about that its appearance is like that of a new star: magical and previously unthinkable. But the history of science shows us that all invention is simultaneous and multiauthored. The first treatises on magnetism were written independently in Greece and India around 600 BCE, and in China in the first century CE. The blast furnace appeared in China in the first century CE and in Scandinavia in the twelfth – the possibility of its transference exists, but the Haya people of northwestern Tanzania have also been making steel for 2,000 years, long before the technology developed in Europe. In the seventeenth century, Gottfried Wilhelm Leibniz, Isaac Newton and others independently formulated the rules of calculus. In the eighteenth, the realisation of oxygen emerged almost simultaneously in the work of Carl Wilhelm Scheele, Joseph Priestley, Antoine Lavoisier, and others, while in the nineteenth, Alfred Russel Wallace and Charles Darwin both advanced the theory of evolution. Such histories give the lie to the heroic narrative of history – the lone genius toiling away to produce a unique insight. History is networked and atemporal: steam engine time is a multidimensional structure, invisible to a sensorium trapped in time, but not insensible to it.
Despite such deep realities, there’s a wonderful thing that happens when you hear someone tell a story that just makes sense: a sense of who they are, and where they came from; the sense that something they did makes sense, has history and progress behind it, that it had to happen this way – and that it had to happen to them, because of the story itself.
Tim Berners-Lee, the inventor of the World Wide Web, gave a talk in a tent in Wales in 2010 entitled ‘How the World Wide Web Just Happened’.2 It’s a joyful thing, an exegesis on computation itself as well as a humble hero story. TBL’s parents, Conway Berners-Lee and Mary Lee Woods, were computer scientists; they met and married while working on the Ferranti Mark 1, the first commercially available, general-purpose electronic computer, in Manchester in the 1950s. Conway later devised a technique for editing and compressing text; Mary developed a simulation of London bus routes that was used to reduce delays. TBL describes his childhood as ‘a world full of computing’, and his first experiments involved making magnets and switches from nails and bent wire; his first device was a remote-controlled gun, constructed like a mousetrap, for attacking his siblings. He notes that the transistor had been invented around the time of his birth, and so when he came to secondary school age it was starting to become available in packets in the electronic shops on Tottenham Court Road. He soon began building rudimentary circuits for doorbells and burglar alarms; as his soldering skills increased, so did the range of available transistors, which made it possible to start building more complex circuits. The appearance of the first integrated circuits in turn allowed him to create video display units from old televisions, until he had all the components of an actual computer – which never quite worked, but never mind. And by this time, he was at university, studying physics; after that, he worked on typesetting for digital printers, before joining CERN, where he developed the idea for hypertext – previously expounded by Vannevar Bush, Douglas Engelbart, and others. And because of where he was working and the need of researchers to share interlinked information, he tied this invention to the Transmission Control Protocol (TCP) and the domain name systems that underpin the emerging internet and – ta-da! – the World Wide Web just happened, as naturally and obviously as if it were meant to be.
This, of course, is only one way of telling the story, but it tickles our senses because it makes sense: the rising arc of invention – the graph that always goes up and to the right – coupled to a personal history that leads to myriad interconnections and the spark of insight at the right moment, the right place in time. The Web happened because of the history of microprocessors and telecommunications and wartime industry and commercial requirements, and a bunch of different discoveries and patents and corporate research funds and academic papers and TBL’s own family history; but it also happened because it was Web Time: for a brief moment, the dispositions of culture and technology converged on an invention that, in hindsight, was predicted by everything from ancient Chinese encyclopaedias to microfilm retrieval to the stories of Jorge Luis Borges. The Web was necessary, and so it appeared – in this timeline, at least.
Computing is especially prone to such justificatory histories, which prove its own necessity and inevitability. The sine qua non of self-fulfilling technological prophecies is what is known as Moore’s law, first proposed by Gordon Moore, cofounder of Fairchild Semiconductor and later of Intel, in a paper for Electronics magazine in 1965. Moore’s insight was that the transistor – then, as TBL noted, barely a decade old – was shrinking rapidly. He showed that the number of components per integrated circuit was doubling every year, and projected that this would continue for the next decade. In turn, this rapid increase in raw computing power would drive ever more wondrous applications: ‘Integrated circuits will lead to such wonders as home computers – or at least terminals connected to a central computer – automatic controls for automobiles, and personal portable communications equipment. The electronic wristwatch needs only a display to be feasible today.’3
Figure: Moore’s Law.
A decade later, he revised his forecast only slightly, to a doubling every two years. Others put it at around eighteen months, and, despite numerous proclamations of its imminent demise, the rule of thumb has held approximately true ever since. In 1971, the semiconductor feature size – the smallest discrete unit of manufacture – was ten micrometres, or one-fifth the diameter of a human hair. By 1985 it was one micrometre, and then it dropped below one hundred nanometres – the diameter of a virus, for whatever that’s worth – in the early 2000s. At the beginning of 2017, semiconductors with features of ten nanometres were available in smartphones. It used to be believed that miniaturisation would be impossible below seven nanometres, at which point electrons would be free to move through any surface via quantum tunnelling; instead, future generations of transistors will take advantage of this effect to make chips the size of atoms themselves, while others predict a future of biological machines composed of DNA and custom, nanoengineered proteins.
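To get a feel for what these doubling periods imply, here is a minimal sketch in Python, using nothing but the 1971–2017 span and the doubling intervals mentioned above; the printed figures are illustrative arithmetic, not data from Moore’s paper or from any industry roadmap.

```python
# A minimal, illustrative sketch of the arithmetic: how much growth each
# doubling period implies over the 1971-2017 span discussed above.

def growth_factor(years: float, doubling_period: float) -> float:
    """Multiplicative increase in components per chip after `years`."""
    return 2 ** (years / doubling_period)

span = 2017 - 1971  # the feature-size timeline given in the text
for period in (1.0, 1.5, 2.0):  # doubling every year, eighteen months, two years
    print(f"doubling every {period} years -> roughly {growth_factor(span, period):,.0f}x")

# Feature size over the same span: 10 micrometres down to 10 nanometres is a
# thousandfold linear shrink, or about a millionfold gain in areal density.
print(f"linear shrink: {10_000 / 10:,.0f}x; areal density gain: {(10_000 / 10) ** 2:,.0f}x")
```

Under the two-year rule alone, the arithmetic implies roughly eight million times more components per chip in 2017 than in 1971 – which is why the curve always appears to go up and to the right.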
So far, so up and to the right. The miniaturisation principle, and its accompanying surge in computational power, is the ever-building wave that Berners-Lee rode through the 1960s, ’70s and ’80s, in order to bring us, neatly and inevitably, to the World Wide Web and the interconnected world of today. But Moore’s law, despite the name by which it came to be known (one which Moore himself wouldn’t use for two decades), is not a law. Rather, it’s a projection – in both senses of the word. It’s an extrapolation from the data but also a phantasm created by the restricted dimensionality of our imagination. It’s a confusion in the same manner as the cognitive bias that feeds our preference for heroic histories, but in the opposite direction. Where one bias leads us to see the inevitable march of progress through historical events to our present moment, the other sees this progress continuing inevitably into the future. And, as such projections do, it has the capability both to shape that future and to influence, in fundamental ways, other projections – regardless of the stability of its original premise.
What began as an off-the-cuff observation became a leitmotif of the long twentieth century, attaining the aura of a physical law. But unlike physical laws, Moore’s law is deeply contingent: it is dependent not merely on manufacturing techniques but on discoveries in the physical sciences, and on the economic and social systems that sustain investment in, and markets for, its products. It is also dependent upon the desires of its consumers, who have come to prize the shiny things that become smaller and faster every year. Moore’s law is not merely technical or economic; it is libidinal.
Starting in the 1960s, the increasingly rapid development of integrated circuit capacity shaped the entire computing industry: as new models of chips became available every year, hardware and software development alike became intrinsically tied to the evolution of the semiconductor itself. No hardware manufacturer or software developer could afford to develop their own architecture; everything had to run on the architecture of a few vendors who kept coming out with ever-denser, more powerful chips. Those building the chips determined the architecture of the machine, all the way to the end consumer. One result of this was the growth of the software industry: freed from its reliance on hardware manufacturers, software became vendor independent, leading first to the dominance of huge companies like Microsoft, Cisco, and Oracle, and then to the economic – and increasingly political and ideological – power of Silicon Valley. Another effect, according to many in the industry, was the end of a culture of craft, care, and efficiency in software itself. While early software developers had to make a virtue of scarce resources, endlessly optimising their code and coming up with ever more elegant and economical solutions to complex calculation problems, the rapid advancement of raw computing power meant that programmers only had to wait eighteen months for a machine twice as powerful to come along. Why be parsimonious with one’s resources when biblical plenty is available in the next sales cycle? In time, the founder of Microsoft himself became the namesake of another rule of thumb: Gates’s law, which claims that, as a result of wasteful and inefficient code and redundant features, the speed of software halves every eighteen months.
This, then, is the true legacy of Moore’s law: as software became central to society, so its ever-rising power curve came to be associated with the idea of progress itself: a future of plenty for which no accommodations in the present need be made. A computing law become an economic law become a moral law – with its own accusations of bloat and decadence. Even Moore appreciated the wider implications of his theory, telling the Economist, forty years after his original prediction, ‘Moore’s Law is a violation of Murphy’s Law. Everything gets better and better.’4
Today, as a direct consequence of Moore’s law, we live in an age of ubiquitous computing, of clouds of apparently infinite computational power; and the moral and cognitive implications of Moore’s law are felt in every aspect of our lives. But despite the best efforts of quantum tunnellers and nanobiologists, continually pushing at the limits of invention, our technology is starting to catch up with our philosophy. What holds in semiconductor research – for now – is turning out not to hold elsewhere: not as scientific law, not as natural law, and not as moral law. And, if we choose to look critically at what our technology is telling us, we can start to discern where we have gone wrong. The error is visible in the data – but the data is all too often used as the argument itself.
In a 2008 article in Wired magazine entitled ‘The End of Theory’, Chris Anderson argued that the vast amounts of data now available to researchers made the traditional scientific process obsolete.5 No longer would they need to build models of the world and test them against sampled data. Instead, the complexities of huge and totalising data sets would be processed by immense computing clusters to produce truth itself: ‘With enough data, the numbers speak for themselves.’ As an example, Anderson cited Google’s translation algorithms, which, with no knowledge of the underlying structures of languages, were capable of inferring the relationship between them using extensive corpora of translated texts. He extended this approach to genomics, neurology, and physics, where scientists are increasingly turning to massive computation to make sense of the volumes of information they have gathered about complex systems. In the age of big data, he argued, ‘correlation is enough. We can stop looking for models.’
This is the magic of big data. You don’t really need to know or understand anything about what you’re studying; you can simply place all of your faith in the emergent truth of digital information. In one sense, the big data fallacy is the logical outcome of scientific reductionism: the belief that complex systems can be understood by dismantling them into their constituent pieces and studying each in isolation. And this reductionist approach might hold if it kept pace, in practice, with our experience of the world; in reality, it is proving to be insufficient.
Figure: Eroom’s law in pharmaceutical research and development. (a) Overall trend in research and development efficiency (inflation adjusted); (b) rate of decline over ten-year periods; (c) adjusted for the five-year delay between spending and its impact. Data from Jack W. Scannell, Alex Blanckley, Helen Boldon and Brian Warrington, ‘Diagnosing the decline in pharmaceutical R&D efficiency’, Nature Reviews Drug Discovery 11 (March 2012), 191–200.
One of the places in which it has become increasingly evident that the reliance on vast amounts of data alone is harmful to the scientific method is in pharmacological research. Over the past sixty years, despite the huge growth of the pharmaceutical industry, and the concomitant investment in drug discovery, the rate at which new drugs are made available has actually fallen when compared to the amount of money spent on research – and it has fallen consistently and measurably. The number of new drugs approved per billion US dollars spent on research and development has halved every nine years since 1950. The downward trend is so clear that researchers have coined a term for it: Eroom’s law – that is, Moore’s law backwards.6
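Stated as arithmetic, the trend is stark. The following is a minimal sketch, assuming only the nine-year halving period quoted above and normalising 1950 to 1; the function name and the sample years are illustrative.

```python
# A minimal sketch of Eroom's law as stated above: drug-discovery efficiency
# relative to 1950, halving every nine years. The sample years are illustrative.

def relative_efficiency(year: int, baseline: int = 1950, halving_years: float = 9.0) -> float:
    """New drugs approved per (inflation-adjusted) billion dollars, relative to 1950."""
    return 0.5 ** ((year - baseline) / halving_years)

for year in (1950, 1980, 2010):
    print(year, round(relative_efficiency(year), 3))
```

Sixty years of halving every nine years compounds to roughly a hundredfold decline in drugs approved per dollar spent.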
Eroom’s law exemplifies a growing awareness across the sciences that something is deeply and widely wrong with scientific research. The number of new results is not only falling, but those results are becoming less trustworthy, thanks to a combination of different mechanisms.
One metric of scientific progress is the number of papers that are being published in scientific journals – and the corresponding number of retractions that accompany them. Tens of thousands of scientific papers are published every single week, and only a handful of them will be retracted – but even that minority causes deep concern to the scientific community.7 One study in 2011 showed that there had been a tenfold rise in retractions over the previous decade – a finding that set off a scramble to learn more about the problem and uncover what was causing the increase.8 One of the most surprising results was the discovery of a robust correlation between the journal’s retraction index and its impact factor; that is, papers published in higher-profile journals were significantly more likely to be retracted than those published in lower-profile journals.
A follow-up study found that more than two-thirds of the retractions in the biomedical and life sciences had been due to misconduct by researchers, rather than error – and the authors noted that such a result could only be an underestimate, as fraud, by its nature, was underreported.9 (This is neatly illustrated by a survey that found that while only 2 per cent of scientists would admit to falsifying data, 14 per cent said they knew someone who did.)10 Moreover, the number of fraudulent papers was actually increasing as a percentage of all retractions.11 This was shocking to many scientists, as it had been widely believed that most retractions were down to honest error. And every failure to retract poisons the well, leading to more bad science down the line.
There have been several high-profile cases of long-running frauds by senior researchers. In the late ’90s, a South Korean biotechnologist named Hwang Woo-suk was proclaimed ‘the pride of Korea’ for his success in cloning cows and pigs, becoming among the first researchers in the world to do so. While he never supplied scientifically verifiable data, he was keen on photo ops, particularly with politicians, and provided a useful fillip to South Korean national self-esteem. In 2004, following celebrated claims that he had successfully cloned human embryonic stem cells – widely believed to be impossible – he was accused of coercing his own researchers into donating eggs. But this didn’t stop Time magazine from naming him one of the year’s ‘People Who Mattered’ and stating that he had ‘already proved that human cloning is no longer science fiction, but a fact of life’.12 Ongoing ethics investigations were publicly opposed by politicians, patriotic newspapers, and even by public rallies, while over a thousand women pledged to donate their own eggs to the research. Nevertheless it was revealed in 2006 that his research was entirely fabricated. His papers were retracted, and he was given a two-year suspended jail sentence.
In 2011, Diederik Stapel, the dean of Tilburg University’s School of Social and Behavioral Sciences, was forced to resign when it was revealed that he had fabricated the results of almost every study he put his name to, and even those of his graduate students. Stapel, like Hwang, was something of a celebrity in his home country, having published numerous studies that made waves in Dutch society. In 2011, for example, he published one study based on Utrecht’s main train station that seemed to show that people exhibited more racist behaviour in dirty environments, and another claiming that eating meat made people selfish and antisocial.13 Both relied on nonexistent data. When he was exposed, Stapel blamed his actions on a fear of failure and the pressure on academics to publish frequently and prominently in order to maintain their positions.
Hwang and Stapel, while outliers, might embody one of the reasons articles in the most prominent journals are more likely to be retracted: they’re written by the scientists making the biggest claims, under the most professional and societal pressure. But such frauds are also being revealed by a series of connected network effects: the increasing openness of scientific practice, the application of technology to the analysis of scientific publications, and the increasing willingness of other scientists – particularly junior ones – to challenge results.
As more and more scientific papers become available to wider and wider communities through open access programmes and online distribution, more and more of them come under increased scrutiny. Not all of this scrutiny is human: universities and companies have developed a range of products for automatically checking academic papers for plagiarism, by comparing them against huge databases of existing publications. In turn, students have developed techniques – such as ‘Rogeting’, named for the thesaurus, which involves carefully substituting synonyms for words in the original text – in order to fool the algorithms. An arms race develops between writer and machine, with the latest plagiarism detectors employing neural networks to winkle out uncommon words and phrases that might point towards manipulation. But neither plagiarism nor outright fraud suffices to account for a larger crisis within science: replicability.
Replication is a cornerstone of the scientific method: it requires that any experiment be repeatable by another group of independent researchers. But in reality, very few experiments are replicated – and the more that are, the more fail the test. At the University of Virginia’s Center for Open Science, an initiative called the Reproducibility Project has, since 2011, tried to replicate the findings of five landmark cancer studies: to take the same experimental setup, rerun the experiments, and get the same results. Each of the initial experiments has been cited thousands of times: their replicability should be guaranteed. But in the event, after painstaking reconstructions, only two of the experiments were repeatable; two were inconclusive, and one completely failed. And the problem is not limited to medicine: a survey undertaken by Nature found that 70 per cent of scientists had failed to replicate the findings of other researchers.14 Across the board, from medicine to psychology, biology to environmental sciences, researchers are coming to the realisation that many of the foundations of their research may be flawed.
The reasons behind the crisis are multiple and, like the fraud cases that make up a relatively small part of the problem, are in part a result of the increased visibility of research, and the increased possibility of review. But other problems are more systemic: from the pressure on scientists to publish – which means questionable results are sexed up and counterexamples quietly filed away – to the very tools with which scientific results are generated.
The most controversial of these techniques is p-hacking. P stands for probability: the p-value is the likelihood of obtaining a result at least as extreme as the one observed if chance alone were at work. The ability to calculate a p-value in many different situations has made it a common marker for scientific rigour in experiments. A value of p less than 0.05 – meaning that a result this extreme would arise by chance less than 5 per cent of the time – is widely agreed across many disciplines to be the benchmark for a successful hypothesis. But the result of this agreement is that a p-value less than 0.05 becomes a target, rather than a measure. Researchers, given a particular goal to aim for, can selectively cull from great fields of data in order to prove any particular hypothesis.
As an example of how p-hacking works, let’s hypothesise that green dice, uniquely among all other dice, are loaded. Take ten green dice and roll each of them one hundred times. Of those 1,000 rolls, 183 turn up a six. If the dice were absolutely fair, the number of sixes should be 1,000/6, which is 167. Something’s up. In order to determine the validity of the experiment, we need to calculate the p-value of our experiment. But the p-value has nothing to do with the actual hypothesis: it is simply the probability that random rolls would turn up 183 or more sixes. For 1,000 dice rolls, that probability is only 4 per cent, or p = 0.04 – and just like that, we have an experimental result that is deemed sufficient by many scientific communities to warrant publication.15
Why should such a ridiculous process be regarded as anything other than a gross simplification? It shouldn’t be – except that it works. It’s easy to calculate and it’s easy to read, meaning that more and more journals use it as shorthand for reliability when sifting through potentially thousands of submissions. Moreover, p-hacking doesn’t just depend on getting those serendipitous results and running with them. Instead, researchers can comb through vast amounts of data to find the results they need. Say that instead of rolling ten green dice, I also rolled ten blue ones, ten yellow ones, ten red ones, and so on. I could roll fifty different colours, and most of them would come out close to the average. But the more I rolled, the more likely I would be to get an anomalous result – and this is the one I could publish. This practice has given p-hacking another name: data dredging. Data dredging has become particularly notorious in the social sciences, where social media and other sources of big behavioural data have suddenly and vastly increased the amount of information available to researchers. But the pervasiveness of p-hacking isn’t limited to the social sciences.
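To make the dredging concrete, here is a minimal sketch in Python, assuming nothing beyond fair six-sided dice, a thousand rolls per colour and the conventional 0.05 threshold; the function names and the random seed are illustrative, not drawn from any published study.

```python
# A minimal sketch of data dredging with fair dice: roll 1,000 dice of each of
# fifty colours, compute a one-sided binomial p-value for the number of sixes
# in each colour, and count how many colours cross p < 0.05 purely by chance.
import random
from math import comb

def one_sided_p(sixes: int, rolls: int = 1000, p_six: float = 1 / 6) -> float:
    """Probability of seeing at least `sixes` sixes in `rolls` fair rolls."""
    return sum(comb(rolls, k) * p_six**k * (1 - p_six)**(rolls - k)
               for k in range(sixes, rolls + 1))

random.seed(1)  # illustrative seed, so the sketch is repeatable
colours = 50
significant = 0
for _ in range(colours):
    sixes = sum(1 for _ in range(1000) if random.randint(1, 6) == 6)
    if one_sided_p(sixes) < 0.05:
        significant += 1

# Every die is fair, yet with fifty tests at the 0.05 level we should expect
# roughly two or three colours to look 'loaded', each of them a false positive.
print(f"{significant} of {colours} colours appear 'loaded' at p < 0.05")
```

Run repeatedly, the number of apparently ‘loaded’ colours hovers around two or three – just what a 5 per cent false-positive rate predicts across fifty independent tests – and any one of them would look like a publishable result.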
A comprehensive analysis of 100,000 open access papers in 2015 found evidence of p-hacking across multiple disciplines.16 The researchers mined the papers for every p-value they could find, and they discovered that a suspicious number of them clustered just under the 0.05 boundary – evidence, they said, that many scientists were adjusting their experimental designs, data sets, or statistical methods in order to get a result that crossed the significance threshold. Results such as these echo the argument made a decade earlier by the epidemiologist John Ioannidis, in a famous essay for the medical journal PLoS Medicine entitled ‘Why Most Published Research Findings Are False’.17
It’s worth emphasising at this point that data dredging is not the same as fraud. Even if results don’t stand up, one of the greatest concerns in the scientific community is not that researchers might be deliberately massaging results, but that they might be doing so unconsciously, thanks to a combination of institutional pressures, lax publishing standards, and the sheer volume of data available to them. This combination of increasing retractions, falling replicability, and the inherent complexity of scientific analysis and distribution concerns the entire scientific community, and this concern is itself corrosive. Science depends on trust: trust between researchers, and trust in researchers by the public. Any erosion of this trust is deeply damaging to the future of scientific research, whether caused by the deliberate actions of a few bad apples or widely distributed across multiple actors and causes, many of them next to unknowable.
Some scholars have been warning for decades of a possible crisis in scientific quality control, and many of them have linked it to the exponential growth in data and research. In the 1960s, Derek de Solla Price – who studied the concentrated networks formed between different papers and writers through citations and shared fields of study – graphed the growth curve of science. The data he employed reflected widely ranging factors, from material production to the energy of particle accelerators, the founding of universities, and the discovery of elements. Like Moore’s law, everything goes up and to the right. If science did not radically change its modes of production, de Solla Price feared, it would face saturation, when its ability to absorb and act meaningfully on the amount of information available would start to break down, followed by ‘senility’.18 Spoiler: science hasn’t changed.
In recent years these fears have crystallised in a concept referred to as overflow.19 Put simply, overflow is the opposite of scarcity: it is the boundless upwelling of information. Moreover, and in contrast to abundance, it is overwhelming, affecting our ability to process its effects. In studies of the economics of attention, overflow addresses how people choose which subjects to prioritise when they have too little time and too much information. As the authors of one study note, it also ‘evokes the image of a mess that needs to be dealt with, or waste that needs to be removed’.20
Overflow exists in many fields, and when it is recognised, strategies evolve for its management. Traditionally, this role is performed by gatekeepers, such as journalists and editors, who select which information should be published. With the role of gatekeeper comes an expectation of specialism and expertise, a certain responsibility and, often, a position of authority. In science, overflow manifests in the rapid proliferation of journals and papers, in the number of applications for grants and academic positions, and in the volume of information and research available. Even the length of the average paper increases, as researchers pad their findings with more and more references to accommodate richer data and higher demand for startling results. The result is a failure of quality control: even the gold standard of peer review is regarded as no longer sufficiently objective or fit for purpose, as the number of papers accelerates and the process becomes mired in institutional reputation games. In turn, this leads to calls for an increase in open publishing of scientific papers – a result that may in turn simply increase the sheer volume of research being published.21
But what if the problem of overflow isn’t limited to science’s outputs – what if it extends to its inputs too? As de Solla Price feared, science has continued in its trajectory of assembling ever vaster and more complex datasets. When it was announced in 1990, the Human Genome Project was regarded as the greatest single data-gathering project in history, but the plunging cost of DNA sequencing means that many times its volume of data is now churned out every year. This data is increasing rapidly and is widely distributed, making it impossible to study all of it comprehensively.22 The Large Hadron Collider generates far too much data to store in its entirety, meaning that only certain kinds of events can be kept, leading to criticisms that once the Higgs boson was discovered, the retained data was unsuitable for discovering anything else.23 All science is becoming the science of big data.
It’s this realisation that brings us back to Moore’s law – and Eroom’s. As in the other sciences, despite the proliferation of research institutions, academic journals and positions (and the vast amounts of money being thrown at the problem), the actual results are degrading. During the 1980s and ’90s, combinatorial chemistry increased 800-fold the rate at which drug-like molecules could be synthesised. DNA sequencing has become a billion times faster since the first successful technique was established. Databases of proteins have grown 300 times larger in twenty-five years. And while the cost of screening for new drugs has fallen, and the amount of research funding has continued to climb, the number of new drugs discovered per dollar spent has continued to fall exponentially.
What could be causing this reversal of the law of progress? There are several hypotheses. The first, and generally regarded as the least significant, is the possibility that the low-hanging fruit has already been picked: all the best targets – the most obvious choices for investigation – have already been exploited. But this isn’t really the case: there are decades’ worth of existing substances still waiting to be investigated, and once investigated they can be added to the list of known comparators, exponentially increasing the field of research.
Then there’s the ‘better than the Beatles’ problem, which worries that even if there are lots of drugs still to be investigated, many existing ones are so good at what they do that they effectively preclude further research into the area. Why start a band when the Beatles already did everything worth doing? This is a variation on the low-hanging fruit problem, with one important difference. While ‘low-hanging fruit’ suggests that there are no easy targets remaining, ‘better than the Beatles’ implies that the fruit already picked lessens the value of what remains on the tree. In most industries, the opposite is the case: the relatively cheap process of strip-mining and burning surface coal, for example, makes what remains in deep mines more valuable, which in turn finances its exploitation. In contrast, trying to outdo existing generic drugs only increases the cost of clinical trials and the difficulty of persuading doctors to prescribe the results, as they are comfortably familiar with the existing ones.
Other problems with drug discovery are more systemic, and less tractable. Some blame reckless spending by bloated drug companies, drunk on Moore’s law, as the defining factor driving Eroom’s law. But most research institutions have – in line with other industries – ploughed their funds into the latest technologies and techniques. If these aren’t the answer to the problem, something else must be amiss.
The ‘cautious regulator’ theory, on a longer timeline, puts the blame on the ever-lower tolerance of society for risky clinical outcomes. Since the golden age of drug discovery in the 1950s, the number of regulations governing the trial and release of drugs has increased – and for good reason. Clinical trials in the past often came with terrible side effects, and further disasters awaited when poorly tested drugs reached the market. The best – or worst – example of this is thalidomide, introduced in the 1950s to treat anxiety and nausea, but which proved to have horrifying consequences for the children of mothers to whom it was prescribed to combat morning sickness. In the aftermath, drug regulations were tightened in ways that made testing slower and more expensive – but that also genuinely improved outcomes. The US Drug Efficacy Amendment of 1962 required that new drugs prove not only that they were safe, but that they actually did what they claimed to do – not previously a legal requirement. Few of us would countenance a return to riskier drugs in order to reverse Eroom’s law, particularly when exceptions can be made when needed, as they were for several anti-HIV drugs in the 1980s.
The final problem with drug research is the one that most concerns us, and it is the one believed by researchers to be the most significant. Pharmacologists term this the ‘basic research/brute force’ bias, but we can call it the automation problem. Historically, the process of discovering new medicines was the domain of small teams of researchers intensively focused on small groups of molecules. When an interesting compound was identified in natural materials, from libraries of synthesised chemicals, or by serendipitous discovery, its active ingredient would be isolated and screened against biological cells or organisms to evaluate its therapeutic effect. In the last twenty years, this process has been widely automated, culminating in a technique known as high-throughput screening, or HTS. HTS is the industrialisation of drug discovery: a wide-spectrum, automated search for potential reactions within huge libraries of compounds.
Picture a cross between a modern car factory – all conveyor belts and robot arms – and a data centre – rack upon rack of trays, fans, and monitoring equipment – and you’re closer to the contemporary laboratory than the received vision of (predominantly) men in white coats tinkering with bubbling glassware. HTS prioritises volume over depth: vast libraries of chemical compounds are fed into the machines and tested against each other. The process strip-mines the chemical space, testing thousands of combinations nearly simultaneously. And at the same time, it reveals the almost ungraspable extent of that space, and the impossibility of modelling all possible interactions.
The researchers in the laboratory are of course aware, if at one remove, of all the economic pressures produced by existing discoveries and cautious regulators, but it’s in the laboratory itself that these knotty problems meet the runaway technological pressure of new inventions. For those with the most money – the drug companies – the impulse to feed these problems into the latest and fastest technologies is irresistible. As one report puts it: ‘Automation, systematisation and process measurement have worked in other industries. Why let a team of chemists and biologists go on a trial and error-based search of indeterminable duration, when one could quickly and efficiently screen millions of leads against a genomics-derived target, and then simply repeat the same industrial process for the next target, and the next?’24
But it’s in the laboratory that the limitations of this approach are becoming starkly clear. High-throughput screening has accelerated Eroom’s law, rather than abated it. And some are starting to suspect that messy human empiricism may actually be more, not less, efficient than computation. Eroom’s law might even be the codification – with data – of something many leading scientists have been saying for some time.
In 1974, speaking to the US House Committee on Science and Astronautics, the Austrian biochemist Erwin Chargaff complained, ‘Now when I go through a laboratory … there they all sit before the same high-speed centrifuges or scintillation counters, producing the same superposable graphs. There has been very little room left for the all important play of scientific imagination.’25 He also made clear the connection between overreliance on instrumentation, and the economic pressures that engendered it: ‘Homo Ludens has been overcome by the seriousness of corporate finances.’ As a result, Chargaff said, ‘a pall of monotony has descended on what used to be the liveliest and most attractive of all scientific professions’. Such sentiments are hardly original, echoing every critique of technological intervention in human perception from television to video games, with the difference that computational pharmacology is creating an empirical body of data about its own failure: the machine is chronicling its own inefficiency, in its own language.
Thinking clearly about what this means requires rejecting zero-sum readings of technological progress and acknowledging grey areas of thought and understanding. Faced with this accounting of purely machinic failure, how are we to reintroduce Homo Ludens into scientific research? One answer might be found in another laboratory, in another fiendishly complex assemblage of experimental equipment: that assembled to crack open the secrets of nuclear fusion.
One of the holy grails of scientific research, nuclear fusion promises near-limitless clean energy, capable of powering cities and space rockets on just a few grams of fuel. It is also notoriously difficult to achieve. Despite the construction of experimental reactors since the 1940s, with continuous development and discovery across the field, no design has ever produced positive net energy – that is, the generation of more power than that required to trigger the fusion reaction in the first place. (The only man-made fusion reactions ever to do so were the Operation Castle series of thermonuclear tests on the Marshall Islands in the 1950s. A subsequent proposal to generate energy by detonating hydrogen bombs in caverns deep under the American Southwest was cancelled when it was shown to be too expensive to build a sufficient number of bombs for continuous generation.)
Occurring in a plasma of superheated gases, fusion reactions are the same as those that produce energy and heavy elements in stars – a popular descriptor among fusion enthusiasts is ‘a star in a jar’. At extreme temperatures, atomic nuclei can fuse together; if the right materials are used, the reaction is exothermic, releasing energy that can then be captured and used to generate electricity. But containing the superheated plasma is a huge challenge. A common approach in contemporary reactors is to use massive magnetic fields to mould the plasma into a stable, doughnut-shaped ring, or torus (other designs use powerful lasers to compress the fuel instead), but the calculations required to do so are fiendishly complicated, and deeply interdependent. The shape of the containment vessel; the materials used; the composition of the fuel; the timing, strength, and angles of magnets and lasers; the pressure of gases; and the voltages applied all affect the stability of the plasma. The longest continuous runtime of a fusion reactor as of this writing was twenty-nine hours, set by a doughnut-type tokamak reactor in 2015; but sustaining this required vast amounts of energy. Another promising technique, known as field-reversed configuration – which creates a cylindrical plasma field – requires much lower energies. However, its longest runtime was just eleven milliseconds.
That achievement was made by a private research company: Tri Alpha Energy, based in California. Tri Alpha’s design fires two ‘smoke rings’ of plasma into each other at a million kilometres an hour, creating a cigar-shaped field up to three metres long and forty centimetres across.26 The design also uses hydrogen-boron fuel instead of the more common deuterium-tritium mix. While much harder to ignite, boron, unlike tritium, is plentiful on earth. In 2014, Tri Alpha announced that they had achieved reactions lasting up to five milliseconds, and in 2015 they claimed these reactions could be sustained.
The next challenge is to improve on these results, which only becomes harder as the temperature and power increase. Multiple control and input parameters can be set at the start of each experiment, such as magnet strength and gas pressure, but the reaction is also subject to drift: as the experimental run progresses, conditions inside the reactor vessel change, necessitating continuous, instantaneous adjustments. This means the problem of fine-tuning the machine is both nonlinear and highly coupled: changing one variable in one direction might produce unexpected results, or might change the effect of other inputs. It’s not a simple problem of changing one thing at a time and seeing what happens; rather, there exists a high-dimensional landscape of possible settings that has to be surveyed through continuous exploration.
At first sight, these look like the perfect conditions for the type of brute-force experimental approach used in pharmacology: from a huge data set of possible settings, algorithms hack path after path through the territory, slowly building up a map and gradually revealing the peaks and valleys of experimental outcomes.
But simple brute force won’t work here. The problem is complicated by the fact that there’s no ‘goodness metric’ for plasma – no simple output number that makes it clear to the algorithm which experimental runs are ‘best’. A more variegated human judgement of the process is required to distinguish between different runs. Furthermore, the scale of the accidents you can cause in a petri dish is limited; inside a fusion reactor, where megawatts of energy superheat pressurised gases to billions of degrees, the possibility of damaging the expensive and unique apparatus is acute, and the boundaries of safe operation are not fully understood. It requires human oversight to prevent an overzealous algorithm from proposing a set of inputs that might wreck the machine.
In response to this problem, Tri Alpha and machine-learning specialists from Google came up with something they call the Optometrist Algorithm.27 The algorithm is named after the either-or choices presented to a patient during an eye test: Which is better, this one, or this one? In Tri Alpha’s experiments, they collapse thousands of possible settings down to thirty or so meta-parameters, which are more easily grasped by the human experimenter. After each shot of plasma – occurring every eight minutes during experimental runs – the algorithm moves the settings a short distance and tries again: the new results are shown to a human operator, alongside the results of the best preceding shot, and the human has the final say over which shot forms the basis for subsequent tests. In this way, the Optometrist Algorithm combines human knowledge and intuition with the ability to navigate through a high-dimensional solution space.
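To see the shape of the procedure, here is a minimal sketch of that loop in Python. It is emphatically not Tri Alpha’s or Google’s published code: the parameter count, step size and stand-in functions (run_shot, operator_chooses) are illustrative assumptions, and the human judgement is reduced here to a toy rule that, in the real experiment, would be a person weighing the full diagnostics of each shot.

```python
# A minimal, illustrative sketch of the Optometrist loop described above; the
# names, the simulated 'shot' and the toy choice rule are assumptions, not the
# published implementation.
import random

N_PARAMS = 30   # 'thirty or so' meta-parameters, as described in the text
STEP = 0.05     # size of the short random move tried after each shot

def run_shot(settings):
    """Stand-in for firing the reactor: returns the diagnostics an operator
    would inspect. Here it is just a toy pair of noisy numbers."""
    return {"duration": sum(settings) + random.gauss(0, 0.1),
            "energy": max(settings) + random.gauss(0, 0.1)}

def propose(settings):
    """Perturb the current reference settings by a short random distance."""
    return [s + random.uniform(-STEP, STEP) for s in settings]

def operator_chooses(reference_result, candidate_result):
    """Stand-in for the human: 'which is better, this one or this one?'
    In the real experiment a person weighs the full diagnostics of both shots."""
    return candidate_result["duration"] > reference_result["duration"]

reference = [random.random() for _ in range(N_PARAMS)]
reference_result = run_shot(reference)

for shot in range(20):  # in reality, one shot of plasma every eight minutes
    candidate = propose(reference)
    candidate_result = run_shot(candidate)
    if operator_chooses(reference_result, candidate_result):
        reference, reference_result = candidate, candidate_result

print("best shot so far:", reference_result)
```

The essential design choice is the division of labour: the algorithm supplies small, systematic moves through a thirty-dimensional space no human could search unaided, while the judgement of which shot is ‘better’ remains with the operator.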
When the algorithm was first deployed, Tri Alpha’s experiment was intended to extend the stability of the plasma, and thus the length of the reaction. But during the exploration of the parameter space, the human operator noticed that in certain experiments the total energy of the plasma suddenly and briefly increased – an anomalous result that might yet be harnessed to improve the sustainability of the reaction. While the automated part of the algorithm was not set up to take account of this, the human operator could guide it towards new settings that not only sustained the length of the experiment, but also increased its total energy. These unexpected settings became the basis for an entirely new regime of tests, one that better accounted for the unpredictability of scientific exploration.
As the experiments progressed, the researchers realised that the benefits of combining human and machine intelligence worked both ways: the researchers became better at intuiting improvements from complex results, while the machine pushed them to explore a greater range of possible inputs, negating the human tendency to avoid the remote edges of a possibility space. Ultimately, the Optometrist approach of random sampling combined with human interpretation may be applicable to a wide range of problems across science that require the understanding and optimisation of complex systems.
The mechanism that is being enacted when the Optometrist goes to work is particularly interesting to those attempting to reconcile the opaque operation of complex computational problem solving with human needs and desires. On the one hand is a problem so fiendishly complicated that the human mind cannot fully grasp it, but one that a computer can ingest and operate upon. On the other is the necessity of bringing a human awareness of ambiguity, unpredictability, and apparent paradox to bear on the problem – an awareness that is itself paradoxical, because it all too often exceeds our ability to consciously express it.
Tri Alpha’s researchers call their approach ‘attempting to optimise a hidden utility model that the human experts may not be able to express explicitly’. What they mean is that there is an order to the complexity of their problem space, but it is an order that exceeds the human ability to describe it. The multidimensional spaces of fusion reactor design – and the encoded representations of neural networks that we will explore in a later chapter – undoubtedly exist, but they are impossible to visualise. While these technologies open up the possibility of working effectively with such indescribable systems, they also insist that we acknowledge that such systems exist at all – and not merely in the domains of pharmacological and physical sciences, but in questions of morality and justice. They necessitate thinking clearly about what it means to live at all times among complex and interrelated systems, in states of doubt and uncertainty that may be beyond reconciliation.
Admitting to the indescribable is one facet of a new dark age: an admission that the human mind has limits to what it can conceptualise. But not all problems in the sciences can be overcome even by the application of computation, however sympathetic. As more complex solutions are brought to bear on ever more complex problems, we risk even greater systemic problems being overlooked. Just as the accelerating progress of Moore’s law locked computation into a particular pathway, necessitating certain architectures and hardware, so the choice of these tools fundamentally shapes the way we can address and even think through the next set of problems we face.
The way we think the world is shaped by the tools at our disposal. As the historians of science Albert van Helden and Thomas Hankins put it in 1994, ‘Because instruments determine what can be done, they also determine to some extent what can be thought.’28 These instruments include the entire sociopolitical framework that supports scientific investigation, from government funding, academic institutions, and the journal industry, to the construction of technologies and software that vest unparalleled economic power and exclusive knowledge in Silicon Valley and its subsidiaries. There is also a deeper cognitive pressure at work: the belief in the singular, inviolable answer, produced, with or without human intervention, by the alleged neutrality of the machine. As science becomes increasingly technologised, so does every domain of human thought and action, gradually revealing the extent of our unknowing, even as it reveals new possibilities.
The same rigorous scientific method that, down one path, leads us to the dwindling returns of Eroom’s law, also helps us to see and respond to that very problem. Vast quantities of data are necessary to see the problems with vast quantities of data. What matters is how we respond to the evidence in front of us.