Special complications with artificial intelligence
The principles to best govern the rise of artificial intelligence (AI) are a natural extension of the general set of principles covered in this book.
However, a number of special characteristics of AI deserve particular attention. These split into two groups:
- Special characteristics of AI as it already exists
  - Problems with training data
  - Black-box nature
  - Interactions between multiple algorithms
- Special characteristics that AI can be expected to acquire in the future
  - Self-improving AI
  - Devious AI
  - Potential catastrophic disaster
Problems with training data
Today’s AI often involves complex statistical models that have been “trained” – configured – by reference to large sets of “training data”. For example, this data could include:
- Photographs of faces
- Sets of translations of text – such as are produced by official (human) translators working for multinational organisations
- Recordings of text spoken with different accents
- Archives of games of chess, as played by human experts
- General collections of photographs found on the Internet.
On some occasions, this data is “labelled”, meaning that a description is provided along with the item of data. On other occasions, the data lacks labels, but the training algorithm infers patterns and groupings by itself.
Three kinds of problems can arise:
First, the data might incorporate unwelcome biases:
- It might reflect “historical reality” (in which fewer members of some demographics attained various positions in society) rather than “desired reality”
- It might under-represent various segments of society; for example, photographs of hands might only include people with certain skin colours
- It might over-represent various patterns of usage, such as the modes of language used by official translators, rather than more slang usage.
Second, even if the data has no unwelcome biases, an algorithm might fail to learn it fully; it might give acceptable answers on the majority of occasions, but a grossly incorrect answer on other occasions. (This problem is linked to the “black box” nature of the algorithms used, as discussed later in this chapter.)
Third, even if an algorithm performs excellently on examples that conform to the same patterns and formats as the training data, it may give abhorrent answers when presented with “out-of-distribution” examples. Such examples might include:
- Pictures with different orientation
- Voice samples spoken in different accents
- Pictures that have been subtly altered – for example, by the addition of small marks.
Two issues with out-of-distribution examples deserve particular attention:
- Some examples might be deliberately altered, by agents with hostile intent, in order to mislead the algorithm; these are known as “adversarial” cases
- It’s by no means obvious, in advance, what are the limits of the distribution on which the algorithm has been trained, and which subtle changes will throw it off course.
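The in-distribution versus out-of-distribution contrast can be made concrete with a toy sketch. The data and model below are illustrative assumptions, not drawn from any real system: a simple model fitted to inputs in one range performs acceptably there, but gives grossly wrong answers outside it.

```python
import random

random.seed(0)

# Training data: inputs drawn from [0, 1]; the true relationship is y = x*x.
xs = [random.random() for _ in range(200)]
ys = [x * x for x in xs]

# Fit a straight line y = a*x + b by ordinary least squares (closed form).
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

def predict(x):
    return a * x + b

print(abs(predict(0.5) - 0.25))    # in-distribution: small error
print(abs(predict(10.0) - 100.0))  # out-of-distribution: grossly wrong
```

Within the training range the straight line is a tolerable approximation of the curve; far outside that range its error is enormous, and nothing in the model itself warns that the input lies beyond the distribution it was fitted to.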
The black box nature of AI
In principle, many of the above issues can be solved if an AI system offers a clear explanation of its reasons for particular decisions.
Thus instead of just saying “this medical scan probably contains evidence of a malignant tumour: urgent surgery is recommended”, the AI system should indicate:
- The features of the medical scan it used in reaching its conclusion
- The features of the training data which back up the conclusion reached
- Any contrary indications that should also be borne in mind.
Again, instead of just saying “the CV of this candidate means they are a comparatively poor fit for a specific job vacancy”, the AI system should indicate:
- The features of the CV it used in reaching its conclusion
- The features of the training data which back up the conclusion reached
- Any contrary indications that should also be borne in mind.
However, in practice there are many situations where no such clear explanation of the decisions made by an AI algorithm can be offered. Instead, an analysis of the internal state of the statistical model (the multiple layers of “software neurons”) used by the AI system will yield only a vast collection of numbers. The sheer quantity of these numbers means they defy any simple explanation.
The operations of the algorithm may as well be a “black box”, with no visibility as to what is happening inside.
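To get a feel for the scale involved, the following back-of-envelope sketch counts the parameters in a quite modest fully connected network. The layer sizes are illustrative assumptions; modern commercial systems are vastly larger.

```python
# Count the parameters (weights + biases) in a small fully connected
# network. The layer sizes below are illustrative assumptions.
layers = [1024, 512, 512, 256, 10]  # neurons in each successive layer

# Each pair of adjacent layers contributes (inputs * outputs) weights,
# plus one bias per output neuron.
params = sum(a * b + b for a, b in zip(layers, layers[1:]))
print(params)  # → 921354
```

Even this toy network involves close to a million numeric parameters, none of which individually corresponds to a human-readable reason for a decision; state-of-the-art systems contain billions.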
Interactions between multiple algorithms
A third complicating factor with present-day AI systems is an extension of the point previously noted, whereby a system can produce surprisingly wrong results in circumstances that differ from those typical of the training set used to configure the algorithm.
In this case, the new feature of the environment is an unexpected new AI system, which coexists with the first one. The new system might be altering aspects of the landscape in which the first system operates:
- It might do so in an adversarial manner, with an intent to alter the behaviour of the first system
- Alternatively, its operations may simply overlap with that of the first system, without any intentional manipulation.
In other words, two different AI systems can interact with each other to create a new category of risks, different from the risks that either of the systems might generate in isolation.
For an amusing real-life example, consider the interaction of two simple algorithms, each of which adjusted the price at which a book would be offered for sale, based on the price offered by a competing online bookseller:
- One bookseller, profnath, periodically set the price of the book to be 0.9983 times the price quoted by the other bookseller, bordeebook
- Bordeebook, independently, periodically adjusted the price of the book to be 1.270589 times the price quoted by profnath
- The booksellers each had their own reasons for making these adjustments, in line with their brand positioning or sales strategy
- The result, however, was that one book ended up being listed for sale at the astronomical price of $23,698,655.93 per copy (plus, as it happens, $3.99 shipping).
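The runaway feedback is easy to reproduce in a short simulation. The starting prices and the number of repricing cycles below are assumptions for illustration; only the two multipliers come from the account above.

```python
# Simulate the two repricing rules interacting.
profnath = 30.00       # assumed starting price in dollars
bordeebook = 35.00     # assumed starting price in dollars

for cycle in range(60):                 # number of cycles is an assumption
    profnath = 0.9983 * bordeebook      # slightly undercut the rival
    bordeebook = 1.270589 * profnath    # mark up well above the rival

print(f"${bordeebook:,.2f}")
```

Each full cycle multiplies both prices by 0.9983 × 1.270589 ≈ 1.268, so the listed price grows exponentially until a human notices.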
That particular example was relatively harmless, but other interactions could be much more serious – especially if the algorithms involved were more complex.
Similar “combination effects” already exist with other technologies:
- Two or more drugs, treating different diseases in the same person, can interfere with each other
- Two or more agricultural innovations, addressing different crops growing in nearby environments, can interfere with each other
- Two or more geo-engineering interventions, each intended to reduce the effects of greenhouse gas emissions, could interfere with each other.
What’s different in the case of two or more AI systems interacting is the ways in which the issues previously noted can combine:
- Two or more black-box systems, whose internal operations cannot be usefully explained, can produce even bigger surprises when interfering with each other
- When two or more AI systems each view a data point as conforming to the basic parameters of their training data, there is an increased chance that the data point will be out-of-distribution for at least one of them
- When two or more AI systems each try to guard against possible biases within their training data, there is an increased chance that unexpected biases will prevail in at least one of them.
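A rough way to see why combining systems raises the chance of trouble: if each of n independent systems mishandles a given data point with probability p, the chance that at least one of them mishandles it is 1 - (1 - p)^n, which grows quickly with n. The value of p below is an assumption chosen purely for illustration.

```python
# Probability that at least one of n independent systems mishandles
# a data point, if each does so with probability p (p is an assumption).
p = 0.02
for n in [1, 2, 5, 10]:
    at_least_one = 1 - (1 - p) ** n
    print(n, round(at_least_one, 4))
```

With ten such systems in the loop, a 2% individual failure rate becomes a roughly 18% chance that at least one system goes wrong; real systems are rarely fully independent, but the compounding tendency remains.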
More significantly, the potential for surprise combination effects grows when we consider, not just AI systems with today’s capabilities, but the more powerful AI systems that are likely to be developed in the future.
It’s now time to review how these forthcoming AI systems will introduce yet more complications.
Self-improving AI
There already exist AI systems which can help with the design of other AI systems.
Consider the field of AutoML, that is, “automated machine learning”. AutoML is described as follows:
Automated Machine Learning provides methods and processes to make Machine Learning available for non-Machine Learning experts, to improve efficiency of Machine Learning and to accelerate research on Machine Learning.
That description is from the website of the group of researchers who are investigating the possibilities of AutoML. They note that the creation of successful ML models presently “crucially relies on human machine learning experts to perform” a number of tasks. These tasks include:
- Preprocessing and cleaning training data
- Selecting which kind of machine learning model is best suited to a particular task
- Selecting the so-called “model hyperparameters” which define the basic operating mode of the machine learning model
- Designing the connections between different layers of a potentially “deep” neural network
- Assessing the results obtained.
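One of these tasks, selecting hyperparameters, can be automated even with a very simple strategy: random search. In the sketch below, the score function is a stand-in for “train a model and measure validation accuracy”, and all of the names, ranges, and numbers are illustrative assumptions rather than any real AutoML system.

```python
import random

random.seed(1)

def score(learning_rate, depth):
    # Stand-in for "train a model and measure validation accuracy";
    # it peaks at learning_rate = 0.01, depth = 6 (assumed values).
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(depth - 6) * 0.02

best = None
for _ in range(100):
    lr = 10 ** random.uniform(-4, -1)   # sample the learning rate on a log scale
    depth = random.randint(2, 12)
    s = score(lr, depth)
    if best is None or s > best[0]:
        best = (s, lr, depth)

print(best)  # (best score, learning rate, depth) found by random search
```

Even this crude loop reliably finds hyperparameters close to the optimum; real AutoML systems replace random sampling with far more sophisticated search, but the principle of automating the human expert's trial-and-error is the same.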
The AutoML researchers comment as follows:
As the complexity of these tasks is often beyond non-ML-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area that targets progressive automation of machine learning AutoML.
In just a few short years, significant progress has been made with AutoML, both in academia and in industry. As early as 2017, Wired technology journalist Tom Simonite wrote a report with the headline “Google’s Learning Software Learns to Write Learning Software”. Here’s an excerpt:
In a project called AutoML, Google’s researchers have taught machine-learning software to build machine-learning software. In some instances, what it comes up with is more powerful and efficient than the best systems the researchers themselves can design. Google says the system recently scored a record 82 percent at categorizing images by their content. On the harder task of marking the location of multiple objects in an image, an important task for augmented reality and autonomous robots, the auto-generated system scored 43 percent. The best human-built system scored 39 percent.
AutoML is an example of a larger trend within AI, namely the application of AI to solve engineering design problems. Examples of this trend include the use of AI:
- To monitor and improve the design of Formula One racing cars
- To improve the design and layout of printed circuit boards (PCBs)
- To accelerate the discovery and design of new pharmaceuticals
- To design new communication and radar systems that are more resilient in congested environments.
As time progresses, AI systems will be poised to play similarly large roles in the design and operation of new AI systems. This positive-feedback loop of self-improvement could accelerate the emergence of more capable AI. But it also increases the risks of unforeseen and dangerous outcomes, especially when:
- There are competitive pressures to apply new design techniques as soon as possible
- The design systems appear to produce good results, but their own operation retains elements that are opaque (black box)
- Two or more AI systems, with incompletely understood interference effects, are involved in the design and operation of a next-generation system.
A latent defect in an original AI system, which causes no significant problem in an original implementation, could be magnified by a process of self-enhancement, to the point where the underlying problem has a much larger impact.
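The magnification effect can be caricatured with a toy compounding model. Both numbers below are pure assumptions for illustration; real self-improvement dynamics would be far messier, but the point is simply that a deviation which compounds across generations soon dominates.

```python
# Toy model: a small latent defect that each self-improvement generation
# magnifies by a constant factor. Both values are assumptions.
defect = 0.001          # initial deviation from intended behaviour
amplification = 1.5     # assumed magnification per generation

for generation in range(20):
    defect *= amplification

print(round(defect, 3))  # a 0.1% defect has compounded past 300%
```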
Devious AI
The problems arising from the black box nature of AI systems – when aspects of their internal operations cannot be explained in any simple way – are magnified when an AI has an incentive to operate deviously – that is, to deliberately mislead some observers about aspects of its internal operations.
Deception has been a ubiquitous aspect of human minds since prehistoric times. We have many reasons to want to deceive each other – to gain advantages in terms of resources, position, community status, and so on. One reason the human brain grew substantially in power and capability during the long evolution of Homo sapiens from primate ancestors was an arms race:
- There were incentives to become more skilled in deceiving others
- There were also incentives to become more skilled in detecting and keeping track of deceptions (sometimes without it becoming known that we were aware of a deception).
For AI systems, the considerations are more subtle, but there remain reasons why an AI might seek to deceive other intelligent entities. This involves adversarial situations – or situations that could become adversarial. For example, an AI system that is aware that another intelligence might seek to hack it, or otherwise subvert it, might find it advantageous to misrepresent some of its capabilities and inner states. It might even wish to “play dead” – or to “play dumb”.
A different kind of example is the so-called “white lie”, when one intelligent being decides that it is in the best interests of another intelligent being to hear a mistruth.
The complication is that, the greater the intelligence of an AI system, the greater is its ability to deceive various observers. Being smarter means you can be more devious.
Four catastrophic error modes
The end outcome of the various trends noted above is that an AI system may acquire so much influence over human society and our surrounding environment that a mistake in that system could cataclysmically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.
Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:
- Defect in implementation
- Defect in design
- Design overridden
- Implementation overridden.
In more detail:
- The system contains a defect in its implementation. It takes an action that it calculates will have one outcome, but, disastrously, it has another outcome instead. For example, a geo-engineering intervention could trigger an unforeseen change in the global climate, plunging the earth into a state in which humans cannot survive.
- The system contains a defect in its design. It takes actions to advance the goals it has explicitly been given, but does so in a way that catastrophically reduces actual human wellbeing. For example, a clumsily specified goal to focus on preserving the diversity of the earth’s biosystem could be met by eliminating upward of 99% of all humans.
- The system has been given goals that are well aligned with human wellbeing, but as the system evolves, a different set of goals emerge, in which the wellbeing of humans is deprioritised. This is similar to the way in which the recent emergence of higher thinking capabilities in human primates led to many humans taking actions in opposition to the gene-spreading instincts placed into our biology by evolution.
- The system has been given goals that are well aligned with human wellbeing, but the system is reconfigured by hackers of one sort or another – perhaps from malevolence, or perhaps from a misguided sense that various changes would make the system more powerful (and hence more valuable).
Some critics suggest that it will be relatively easy to avoid these four catastrophic error modes. The next three chapters provide arguments that, on the contrary, these error modes are deeply problematic.
The chapters after that will describe in some depth the principles which can provide solutions, namely the Singularity Principles.
The broader perspective
Note that adherence to the Singularity Principles won’t just reduce the risks of catastrophic error. Importantly, that adherence will also reduce the occurrences of errors that, whilst not catastrophic on a global scale, still result in significant harm to human potential – harming people by denying them opportunity, crippling them, or (in, alas, too many cases) killing them.
Moreover, adherence to these principles won’t just reduce the chances of harm arising from errors with AI. It will also reduce the chances of harm arising from errors with other types of technology, such as NBIC.
Finally, the Singularity Principles aren’t just about avoiding significant harm. Critically, they’re also about raising the probability of attaining profound benefits.