Special complications with artificial intelligence
The principles to best govern the rise of artificial intelligence (AI) are a natural extension of the general set of principles covered in these pages.
However, a number of special characteristics of AI deserve particular attention. These split into:
- Special characteristics of AI as it already exists:
  - Problems with training data
  - Black-box nature
  - Interactions between multiple algorithms
- Special characteristics that AI can be expected to acquire in the future:
  - Self-improving AI
  - Devious AI
  - Potential catastrophic disaster
Problems with training data
Today’s AI often involves complex statistical models that have been “trained” – configured – by reference to large sets of “training data”. For example, this data could include:
- Photographs of faces
- Sets of translations of text – such as are produced by official (human) translators working for multinational organisations
- Text spoken with different accents
- Records of games of chess, as played by human experts
- General collections of photographs found on the Internet
On some occasions, this data is “labelled”, meaning that a description is provided along with the item of data. On other occasions, the data lacks labels, but the training algorithm infers patterns and groupings by itself.
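As a purely illustrative sketch (using the widely available scikit-learn library and a synthetic dataset, rather than any specific system mentioned here), the following Python fragment contrasts training on labelled data with training on unlabelled data, where the algorithm must infer groupings by itself:

```python
# A minimal sketch contrasting training on labelled data (supervised learning)
# with training on unlabelled data (unsupervised learning).
# The dataset here is synthetic, purely for illustration.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic "training data": 300 points in 2 dimensions, drawn from 3 groups
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Labelled case: the algorithm is told the correct group for each point
classifier = LogisticRegression(max_iter=1000).fit(X, y)

# Unlabelled case: the algorithm must infer groupings by itself
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(classifier.predict(X[:5]))   # predictions guided by the supplied labels
print(clusterer.labels_[:5])       # groupings inferred without any labels
```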
Three kinds of problems can arise:
First, the data might incorporate unwelcome biases:
- It might reflect “historical reality” (in which fewer members of some demographics attained various positions in society) rather than “desired reality”
- It might under-represent various segments of society; for example, photographs of hands might only include people with certain skin colours
- It might over-represent various patterns of usage, such as the modes of language used by official translators, rather than more informal or slang usage.
Second, even if the data has no unwelcome biases, an algorithm might fail to learn it fully; it might give acceptable answers on the majority of occasions, but a grossly incorrect answer on other occasions. (This problem is linked to the “black box” nature of the algorithms used, as discussed later on this page).
Third, even if an algorithm performs excellently on examples that conform to the same data formats as the training data, it may give abhorrent answers when presented with “out-of-distribution” examples. Such examples might include:
- Pictures with a different orientation
- Voice samples spoken in different accents
- Pictures that have been subtly altered – for example, by the addition of small marks.
Two issues with out-of-distribution examples deserve particular attention:
- Some examples might be deliberately altered, by people with hostile intent, in order to mislead the algorithm; these are known as “adversarial” cases (a toy sketch of such an alteration appears after this list)
- It’s by no means obvious, in advance, where the limits of the distribution on which the algorithm was trained lie, or which subtle changes will throw it off course.
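To make the idea of a subtly altered, misleading input more concrete, here is a hedged toy sketch in Python (using scikit-learn and synthetic data, standing in for a far more complex image or speech model), in which a small, deliberately chosen nudge to an input changes a trained classifier’s decision:

```python
# A toy sketch of an "adversarial" alteration: a small, deliberate change to an
# input that flips a trained model's decision.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Pick the example the model classifies least confidently
scores = model.decision_function(X)
i = int(np.argmin(np.abs(scores)))
x = X[i].copy()

# Nudge every feature by a small amount in the direction that lowers the
# model's confidence in its current answer
step = -np.sign(scores[i]) * np.sign(model.coef_[0])
x_perturbed = x + 0.1 * step

print("prediction before perturbation:", model.predict([x])[0])
print("prediction after perturbation: ", model.predict([x_perturbed])[0])
print("largest change to any feature: ", np.max(np.abs(x_perturbed - x)))
```

In this linear toy the misleading direction is easy to compute; for deep neural networks a similar effect can be achieved with gradient-based methods, and the resulting alteration can be imperceptible to human observers.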
The black box nature of AI
In principle, many of the above issues can be solved if an AI system offers a clear explanation of its reasons for particular decisions.
Thus instead of just saying “this medical scan probably contains evidence of a malignant tumour: urgent surgery is recommended”, the AI system should indicate:
- The features of the medical scan it used in reaching its conclusion
- The features of the training data which back up the conclusion reached.
Again, instead of just saying “the CV of this candidate means they are a comparatively poor fit for a specific job vacancy”, the AI system should indicate:
- The features of the CV it used in reaching its conclusion
- The features of the training data which back up the conclusion reached.
However, in practice there are many situations where no such clear explanation of the decisions made by an AI algorithm can be offered. Instead, an analysis of the internal state of the statistical model (the multiple layers of “software neurons”) used by the AI system will yield only a vast collection of numbers. The scale of these numbers defies any simple explanation.
The operations of the algorithm may as well be a “black box”, with no visibility as to what is happening inside.
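One family of techniques attempts to pry the box open, at least partially, by measuring which input features most influence a model’s output. The following hedged sketch uses permutation importance from scikit-learn on a synthetic dataset; note that this kind of summary falls well short of the clear, case-by-case explanations described above:

```python
# A sketch of one partial remedy for the black-box problem: shuffle each input
# feature in turn and measure how much the model's accuracy drops, giving a
# rough indication of which features the model relies on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for feature, importance in enumerate(result.importances_mean):
    print(f"feature {feature}: importance {importance:.3f}")
```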
Interactions between multiple algorithms
A third complicating factor with present-day AI systems is an extension of the point previously noted, whereby a system can produce surprisingly wrong results in circumstances that differ from those typical of the training set used to configure the algorithm.
In this case, the new feature of the environment is another AI system, which coexists with the first one. That second system might be altering other aspects of the landscape in which the first system operates:
- It might do so in an adversarial manner, with an intent to alter the behaviour of the first system
- More simply, its operations may overlap with those of the first system, without any intentional coordination.
In other words, two different AI systems can interact with each other to create a new category of risks, different from the risks that either of the systems might generate in isolation.
This “combination effect” already exists with other technologies:
- Two or more drugs, treating different diseases in the same person, can interfere with each other
- Two or more agricultural innovations, addressing different crops growing in nearby environments, can interfere with each other
- Two or more geo-engineering interventions, each intended to reduce the effects of greenhouse gas emissions, could interfere with each other.
What’s different in the case of two or more AI systems interacting is the ways in which the issues previously noted can combine:
- Two or more black box systems, whose internal operations cannot be usefully explained, can produce even bigger surprises when interfering with each other
- Two or more AI systems, each of which views a data point as conforming to the basic parameters of its training data, have an increased chance that the data point will be “out-of-distribution” in at least one case
- Two or more AI systems, each of which tries to guard against possible biases within its training data, have an increased chance that unexpected biases will prevail in at least one case.
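As a toy illustration of a combination effect (an invented scenario, not taken from any real deployment), consider two automated pricing rules, each of which looks harmless in isolation, but which produce a runaway spiral once each reacts to the other’s output:

```python
# A toy illustration: two automated pricing rules, each sensible-looking on its
# own, interact to produce a runaway price spiral.

def seller_a(competitor_price: float) -> float:
    # Rule A: undercut the competitor very slightly to win sales
    return competitor_price * 0.998

def seller_b(competitor_price: float) -> float:
    # Rule B: price at a premium above the competitor, relying on reputation
    return competitor_price * 1.27

price_a, price_b = 10.00, 12.00
for day in range(20):
    price_a = seller_a(price_b)   # A reacts to B's current price
    price_b = seller_b(price_a)   # B reacts to A's new price
    print(f"day {day + 1:2d}: A = {price_a:12.2f}, B = {price_b:12.2f}")
# Each rule looks harmless on its own; the combination grows without limit.
```

In this toy run, the author of each rule could honestly claim that their own algorithm was well behaved; the instability emerges only from the interaction.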
More significantly, the potential for surprise combination effects grows when we consider, not just AI systems with today’s capabilities, but the more powerful AI systems that are likely to be developed in the future.
It’s now time to review how these forthcoming systems will introduce yet more complications.
Self-improving AI
There already exist AI systems which can help with the design of other AI systems.
Consider the field of AutoML, that is, “automated machine learning”. AutoML is described as follows:
Automated Machine Learning provides methods and processes to make Machine Learning available for non-Machine Learning experts, to improve efficiency of Machine Learning and to accelerate research on Machine Learning.
That description is from the website of the group of researchers who are investigating the possibilities of AutoML. They note that the creation of successful ML models presently “crucially relies on human machine learning experts to perform” a number of tasks. These tasks include:
- Preprocessing and cleaning training data
- Selecting which kind of machine learning model is best suited to a particular task
- Selecting the so-called “model hyperparameters” which define the basic operating mode of the machine learning model
- Designing the connections between different layers of a potentially “deep” neural network
- Assessing the results obtained
The AutoML researchers comment as follows:
As the complexity of these tasks is often beyond non-ML-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area that targets progressive automation of machine learning AutoML.
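As a minimal, hedged sketch of what automating just one of the tasks listed above might look like, here is a hyperparameter search using off-the-shelf scikit-learn tools; full AutoML systems go much further, also choosing the model type, the preprocessing steps, and the network architecture:

```python
# A minimal sketch of automated hyperparameter selection: a search algorithm,
# rather than a human expert, picks the configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate hyperparameter values that would otherwise be chosen by hand
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), search_space, cv=3)
search.fit(X, y)

print("best hyperparameters found:", search.best_params_)
print("cross-validated accuracy:  ", round(search.best_score_, 3))
```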
In just a few short years, significant progress has been made with AutoML, both in academia and in industry. As early as 2017, Wired technology journalist Tom Simonite wrote a report with the headline “Google’s Learning Software Learns to Write Learning Software”. Here’s an excerpt:
In a project called AutoML, Google’s researchers have taught machine-learning software to build machine-learning software. In some instances, what it comes up with is more powerful and efficient than the best systems the researchers themselves can design. Google says the system recently scored a record 82 percent at categorizing images by their content. On the harder task of marking the location of multiple objects in an image, an important task for augmented reality and autonomous robots, the auto-generated system scored 43 percent. The best human-built system scored 39 percent.
AutoML is an example of a larger trend within AI, namely the application of AI to solve engineering design problems. Examples of this trend include the use of AI:
- To monitor and improve the design of Formula One racing cars
- To improve the design and layout of printed circuit boards (PCBs)
- To accelerate the discovery and design of new pharmaceuticals
- To design new communication and radar systems that are more resilient in congested environments
As time progresses, AI systems will be poised to play larger roles in the design and operation of new AI systems. This could accelerate benefits from more capable AI. But it also increases the risks of outcomes that are unforeseen and dangerous, especially when:
- There are competitive pressures to apply new design techniques as soon as possible
- The design systems appear to produce good results, but their own operation retains elements that are opaque (black box)
- Two or more AI systems, with incompletely understood interference effects, are involved in the design and operation of a next-generation system.
A latent defect in an original AI system, which causes no significant problem in an original implementation, could be magnified by a process of self-enhancement, to the point where the underlying problem has a much larger impact.
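As a deliberately simplified numerical sketch (with invented figures) of how such magnification might unfold, suppose each generation of a self-improving system inherits, and slightly amplifies, a small skew introduced by a latent defect:

```python
# A toy model: a 1% skew (a latent defect) compounds as each generation of a
# system is tuned using the previous generation's output.
# The amplification factor below is an invented assumption, purely illustrative.
bias = 0.01          # latent defect: outputs skewed by 1% in the first generation
estimate = 1.0       # the quantity the system is supposed to report faithfully
for generation in range(1, 11):
    estimate *= (1 + bias)   # each generation inherits and re-applies the skew
    bias *= 1.5              # assumed: self-tuning also amplifies the defect itself
    print(f"generation {generation:2d}: estimate = {estimate:.3f}")
```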
Devious AI
The problems arising from the black box nature of AI systems – when aspects of their internal operations cannot be explained in any simple way – are magnified when an AI has an incentive to operate deviously – that is, to deliberately mislead some observers about aspects of its internal operations.
Deception has been a ubiquitous aspect of human minds. We have many reasons to want to deceive each other – to gain advantages in terms of resources, position, community status, and so on. One reason the human brain grew substantially in power and capability during the long evolution of Homo sapiens from primate ancestors was an arms race:
- There were incentives to become more skilled in deceiving others
- There were also incentives to become more skilled in detecting deceptions (sometimes without it becoming known that we were aware of a deception).
For AI systems, the considerations are more subtle, but there remain reasons why an AI might seek to deceive other intelligent entities. This involves adversarial situations – or situations that could become adversarial. For example, an AI system that is aware that another intelligence might seek to hack it, or otherwise subvert it, might find it advantageous to misrepresent some of its capabilities and inner states.
A different kind of example is the so-called “white lie”, when one intelligent being decides that it is in the best interests of another intelligent being to hear an untruth.
The complication is that the greater the intelligence of an AI system, the greater its ability to deceive various observers.
AIs posing catastrophic risks
The end outcome of the various trends noted above is that an AI system may acquire sufficient influence over human society and our surrounding environment that a mistake in that system could catastrophically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.
Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:
- Defect in implementation
- Defect in design
- Design overridden
- Implementation overridden
In more detail:
- The system contains a defect in its implementation. It takes an action that it calculates will have one outcome, but, disastrously, it has another outcome instead. For example, a geo-engineering intervention could trigger an unforeseen change in the global climate, plunging the earth into a state in which humans cannot survive.
- The system contains a defect in its design. It takes actions to advance the goals it has been given, but does so in a way that catastrophically reduces actual human wellbeing. For example, a goal to preserve the diversity of the earth’s biosystem could be met by eliminating upward of 99% of all humans.
- The system has been given goals that are well aligned with human wellbeing, but as the system evolves, a different set of goals emerges, in which the wellbeing of humans is deprioritised. This is similar to the way in which the emergence of higher thinking capabilities in humans led to many humans taking actions in contradiction to the gene-spreading instincts placed into our biology by evolution.
- The system has been given goals that are well aligned with human wellbeing, but the system is reconfigured by hackers of one sort or another – perhaps from malevolence, or perhaps from a misguided sense that various changes would make the system more powerful (and hence more valuable).
Some critics suggest that it will be relatively easy to avoid these four catastrophic error modes. For arguments that, on the contrary, these error modes are deeply problematic, see the pages on:
- The Control Problem
- The Alignment Problem
- No easy solutions
For the principles which can provide solutions, see the pages that describe the Singularity Principles in depth.
The Singularity Principles in perspective
Note that adherence to the Singularity Principles won’t just reduce the risks of catastrophic error. Importantly, it will also reduce the occurrences of errors that, whilst not catastrophic on a global scale, still result in significant harm to human potential – harming people by denying them opportunity, crippling them, or (in, alas, too many cases) killing them.
Moreover, adherence to these Principles won’t just reduce the chances of harm arising from errors with AI. It will also reduce the chances of harm arising from errors with other types of technology.
Finally, the Principles aren’t just about avoiding significant harm. They’re also about raising the probability of attaining profound benefits.