Special complications with artificial intelligence
The principles to best govern the rise of artificial intelligence (AI) are a natural extension of the general set of principles covered in these pages.
However, a number of special characteristics of AI deserve particular attention. These split into:
- Special characteristics of AI as it already exists:
  - Problems with training data
  - Black-box nature
  - Interactions between multiple algorithms
- Special characteristics that AI can be expected to acquire in the future:
  - Self-improving AI
  - Devious AI
  - Potential catastrophic disaster
Problems with training data
Today’s AI often involves complex statistical models that have been “trained” – configured – by reference to large sets of “training data”. For example, this data could include:
- Photographs of faces
- Sets of translations of text – such as are produced by official (human) translators working for multinational organisations
- Text spoken with different accents
- Records of games of chess, as played by human experts
- General collections of photographs found on the Internet
On some occasions, this data is “labelled”, meaning that a description is provided along with the item of data. On other occasions, the data lacks labels, but the training algorithm infers patterns and groupings by itself.
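As a purely illustrative sketch (using the widely available scikit-learn library and a synthetic dataset, rather than any specific system mentioned here), the following Python fragment contrasts training on labelled data with training on unlabelled data, where the algorithm must infer groupings by itself:

```python
# A minimal sketch contrasting training on labelled data (supervised learning)
# with training on unlabelled data (unsupervised learning).
# The dataset here is synthetic, purely for illustration.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic "training data": 300 points in 2 dimensions, drawn from 3 groups
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Labelled case: the algorithm is told the correct group for each point
classifier = LogisticRegression(max_iter=1000).fit(X, y)

# Unlabelled case: the algorithm must infer groupings by itself
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(classifier.predict(X[:5]))   # predictions guided by the supplied labels
print(clusterer.labels_[:5])       # groupings inferred without any labels
```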
Three kinds of problems can arise:
First, the data might incorporate unwelcome biases:
- It might reflect “historical reality” (in which fewer members of some demographics attained various positions in society) rather than “desired reality”
- It might under-represent various segments of society; for example, photographs of hands might only include people with certain skin colours
- It might over-represent various patterns of usage, such as the modes of language used by official translators, rather than more informal or slang usage.
Second, even if the data has no unwelcome biases, an algorithm might fail to learn it fully; it might give acceptable answers on the majority of occasions, but a grossly incorrect answer on other occasions. (This problem is linked to the “black box” nature of the algorithms used, as discussed later on this page).
Third, even if an algorithm performs excellently on examples that conform to the same data formats as the training data, it may give abhorrent answers when presented with “out-of-distribution” examples. Such examples might include:
- Pictures with a different orientation
- Voice samples spoken in different accents
- Pictures that have been subtly altered – for example, by the addition of small marks.
Two issues with out-of-distribution examples deserve particular attention:
- Some examples might be deliberately altered, by people with hostile intent, in order to mislead the algorithm; these are known as “adversarial” cases (a toy sketch of such an alteration appears after this list)
- It’s by no means obvious, in advance, where the limits of the distribution on which the algorithm was trained lie, or which subtle changes will throw it off course.
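To make the idea of a subtly altered, misleading input more concrete, here is a hedged toy sketch in Python (using scikit-learn and synthetic data, standing in for a far more complex image or speech model), in which a small, deliberately chosen nudge to an input changes a trained classifier’s decision:

```python
# A toy sketch of an "adversarial" alteration: a small, deliberate change to an
# input that flips a trained model's decision.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Pick the example the model classifies least confidently
scores = model.decision_function(X)
i = int(np.argmin(np.abs(scores)))
x = X[i].copy()

# Nudge every feature by a small amount in the direction that lowers the
# model's confidence in its current answer
step = -np.sign(scores[i]) * np.sign(model.coef_[0])
x_perturbed = x + 0.1 * step

print("prediction before perturbation:", model.predict([x])[0])
print("prediction after perturbation: ", model.predict([x_perturbed])[0])
print("largest change to any feature: ", np.max(np.abs(x_perturbed - x)))
```

In this linear toy the misleading direction is easy to compute; for deep neural networks a similar effect can be achieved with gradient-based methods, and the resulting alteration can be imperceptible to human observers.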
The black box nature of AI
In principle, many of the above issues can be solved if an AI system offers a clear explanation of its reasons for particular decisions.
Thus instead of just saying “this medical scan probably contains evidence of a malignant tumour: urgent surgery is recommended”, the AI system should indicate:
- The features of the medical scan it used in reaching its conclusion
- The features of the training data which back up the conclusion reached.
Again, instead of just saying “the CV of this candidate means they are a comparatively poor fit for a specific job vacancy”, the AI system should indicate:
- The features of the CV it used in reaching its conclusion
- The features of the training data which back up the conclusion reached.
However, in practice there are many situations where no such clear explanation of the decisions made by an AI algorithm can be offered. Instead, an analysis of the internal state of the statistical model (the multiple layers of “software neurons”) used by the AI system will yield only a vast collection of numbers. The scale of these numbers defies any simple explanation.
The operations of the algorithm may as well be a “black box”, with no visibility as to what is happening inside.
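One family of techniques attempts to pry the box open, at least partially, by measuring which input features most influence a model’s output. The following hedged sketch uses permutation importance from scikit-learn on a synthetic dataset; note that this kind of summary falls well short of the clear, case-by-case explanations described above:

```python
# A sketch of one partial remedy for the black-box problem: shuffle each input
# feature in turn and measure how much the model's accuracy drops, giving a
# rough indication of which features the model relies on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for feature, importance in enumerate(result.importances_mean):
    print(f"feature {feature}: importance {importance:.3f}")
```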
Interactions between multiple algorithms
A third complicating factor with present-day AI systems is an extension of the point previously noted, whereby a system can produce surprisingly wrong results in circumstances that differ from those typical of the training set used to configure the algorithm.
In this case, the new feature of the environment is another AI system, which coexists with the first one. That second system might be altering other aspects of the landscape in which the first system operates:
- It might do so in an adversarial manner, with an intent to alter the behaviour of the first system
- More simply, its operations may overlap with those of the first system, without any intentional coordination.
In other words, two different AI systems can interact with each other to create a new category of risks, different from the risks that either of the systems might generate in isolation.
This “combination effect” already exists with other technologies:
- Two or more drugs, treating different diseases in the same person, can interfere with each other
- Two or more agricultural innovations, addressing different crops growing in nearby environments, can interfere with each other
- Two or more geo-engineering interventions, each intended to reduce the effects of greenhouse gas emissions, could interfere with each other.
What’s different in the case of two or more AI systems interacting is the ways in which the issues previously noted can combine:
- Two or more black box systems, whose internal operations cannot be usefully explained, can produce even bigger surprises when interfering with each other
- Two or more AI systems, each of which views a data point as conforming to the basic parameters of its training data, have an increased chance that the data point will be “out-of-distribution” in at least one case
- Two or more AI systems, each of which tries to guard against possible biases within its training data, have an increased chance that unexpected biases will prevail in at least one case.
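As a toy illustration of a combination effect (an invented scenario, not taken from any real deployment), consider two automated pricing rules, each of which looks harmless in isolation, but which produce a runaway spiral once each reacts to the other’s output:

```python
# A toy illustration: two automated pricing rules, each sensible-looking on its
# own, interact to produce a runaway price spiral.

def seller_a(competitor_price: float) -> float:
    # Rule A: undercut the competitor very slightly to win sales
    return competitor_price * 0.998

def seller_b(competitor_price: float) -> float:
    # Rule B: price at a premium above the competitor, relying on reputation
    return competitor_price * 1.27

price_a, price_b = 10.00, 12.00
for day in range(20):
    price_a = seller_a(price_b)   # A reacts to B's current price
    price_b = seller_b(price_a)   # B reacts to A's new price
    print(f"day {day + 1:2d}: A = {price_a:12.2f}, B = {price_b:12.2f}")
# Each rule looks harmless on its own; the combination grows without limit.
```

In this toy run, the author of each rule could honestly claim that their own algorithm was well behaved; the instability emerges only from the interaction.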
More significantly, the potential for surprise combination effects grows when we consider, not just AI systems with today’s capabilities, but the more powerful AI systems that are likely to be developed in the future.
It’s now time to review how these forthcoming systems will introduce yet more complications.
Self-improving AI
There already exist AI systems which can help with the design of other AI systems.
Consider the field of AutoML, that is, “automated machine learning”. AutoML is described as follows:
Automated Machine Learning provides methods and processes to make Machine Learning available for non-Machine Learning experts, to improve efficiency of Machine Learning and to accelerate research on Machine Learning.
That description is from the website of the group of researchers who are investigating the possibilities of AutoML. They note that the creation of successful ML models presently “crucially relies on human machine learning experts to perform” a number of tasks. These tasks include:
- Preprocessing and cleaning training data
- Selecting which kind of machine learning model is best suited to a particular task
- Selecting the so-called “model hyperparameters” which define the basic operating mode of the machine learning model
- Designing the connections between different layers of a potentially “deep” neural network
- Assessing the results obtained
The AutoML researchers comment as follows:
As the complexity of these tasks is often beyond non-ML-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area that targets progressive automation of machine learning AutoML.
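As a minimal, hedged sketch of what automating just one of the tasks listed above might look like, here is a hyperparameter search using off-the-shelf scikit-learn tools; full AutoML systems go much further, also choosing the model type, the preprocessing steps, and the network architecture:

```python
# A minimal sketch of automated hyperparameter selection: a search algorithm,
# rather than a human expert, picks the configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate hyperparameter values that would otherwise be chosen by hand
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), search_space, cv=3)
search.fit(X, y)

print("best hyperparameters found:", search.best_params_)
print("cross-validated accuracy:  ", round(search.best_score_, 3))
```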
In just a few short years, significant progress has been made with AutoML, both in academia and in industry. As early as 2017, Wired technology journalist Tom Simonite wrote a report with the headline “Google’s Learning Software Learns to Write Learning Software”. Here’s an excerpt:
In a project called AutoML, Google’s researchers have taught machine-learning software to build machine-learning software. In some instances, what it comes up with is more powerful and efficient than the best systems the researchers themselves can design. Google says the system recently scored a record 82 percent at categorizing images by their content. On the harder task of marking the location of multiple objects in an image, an important task for augmented reality and autonomous robots, the auto-generated system scored 43 percent. The best human-built system scored 39 percent.
AutoML is an example of a larger trend within AI, namely the application of AI to solve engineering design problems. Examples of this trend include the use of AI:
- To monitor and improve the design of Formula One racing cars
- To improve the design and layout of printed circuit boards (PCBs)
- To accelerate the discovery and design of new pharmaceuticals
- To design new communication and radar systems that are more resilient in congested environments
As time progresses, AI systems will be poised to play larger roles in the design and operation of new AI systems. This could accelerate benefits from more capable AI. But it also increases the risks of outcomes that are unforeseen and dangerous, especially when:
- There are competitive pressures to apply new design techniques as soon as possible
- The design systems appear to produce good results, but their own operation retains elements that are opaque (black box)
- Two or more AI systems, with incompletely understood interference effects, are involved in the design and operation of a next-generation system.
A latent defect in an original AI system, which causes no significant problem in an original implementation, could be magnified by a process of self-enhancement, to the point where the underlying problem has a much larger impact.
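As a deliberately simplified numerical sketch (with invented figures) of how such magnification might unfold, suppose each generation of a self-improving system inherits, and slightly amplifies, a small skew introduced by a latent defect:

```python
# A toy model: a 1% skew (a latent defect) compounds as each generation of a
# system is tuned using the previous generation's output.
# The amplification factor below is an invented assumption, purely illustrative.
bias = 0.01          # latent defect: outputs skewed by 1% in the first generation
estimate = 1.0       # the quantity the system is supposed to report faithfully
for generation in range(1, 11):
    estimate *= (1 + bias)   # each generation inherits and re-applies the skew
    bias *= 1.5              # assumed: self-tuning also amplifies the defect itself
    print(f"generation {generation:2d}: estimate = {estimate:.3f}")
```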
Devious AI
The problems arising from the black box nature of AI systems – when aspects of their internal operations cannot be explained in any simple way – are magnified when an AI has an incentive to operate deviously – that is, to deliberately mislead some observers about aspects of its internal operations.
Deception has been a ubiquitous aspect of human minds. We have many reasons to want to deceive each other – to gain advantages in terms of resources, position, community status, and so on. One reason the human brain grew substantially in power and capability during the long evolution of Homo sapiens from primate ancestors was an arms race:
- There were incentives to become more skilled in deceiving others
- There were also incentives to become more skilled in detecting deceptions (sometimes without it becoming known that we were aware of a deception).
For AI systems, the considerations are more subtle, but there remain reasons why an AI might seek to deceive other intelligent entities. This involves adversarial situations – or situations that could become adversarial. For example, an AI system that is aware that another intelligence might seek to hack it, or otherwise subvert it, might find it advantageous to misrepresent some of its capabilities and inner states.
A different kind of example is the so-called “white lie”, when one intelligent being decides that it is in the best interests of another intelligent being to hear an untruth.
The complication is that the greater the intelligence of an AI system, the greater its ability to deceive various observers.
AIs posing catastrophic risks
The end outcome of the various trends noted above is that an AI system may acquire sufficient influence over human society and our surrounding environment that a mistake in that system could catastrophically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.
Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:
- Defect in implementation
- Defect in design
- Design overridden
- Implementation overridden
In more detail:
- The system contains a defect in its implementation. It takes an action that it calculates will have one outcome, but, disastrously, it has another outcome instead. For example, a geo-engineering intervention could trigger an unforeseen change in the global climate, plunging the earth into a state in which humans cannot survive.
- The system contains a defect in its design. It takes actions to advance the goals it has been given, but does so in a way that catastrophically reduces actual human wellbeing. For example, a goal to preserve the diversity of the earth’s biosystem could be met by eliminating upward of 99% of all humans.
- The system has been given goals that are well aligned with human wellbeing, but as the system evolves, a different set of goals emerges, in which the wellbeing of humans is deprioritised. This is similar to the way in which the emergence of higher thinking capabilities in humans led to many humans taking actions in contradiction to the gene-spreading instincts placed into our biology by evolution.
- The system has been given goals that are well aligned with human wellbeing, but the system is reconfigured by hackers of one sort or another – perhaps from malevolence, or perhaps from a misguided sense that various changes would make the system more powerful (and hence more valuable).
Some critics suggest that it will be relatively easy to avoid these four catastrophic error modes. For arguments that, on the contrary, these error modes are deeply problematic, see the pages on:
- The Control Problem
- The Alignment Problem
- No easy solutions
For the principles which can provide solutions, see the pages that describe the Singularity Principles in depth.
The Singularity Principles in perspective
Note that adherence to the Singularity Principles won’t just reduce the risks of catastrophic error. Importantly, it will also reduce the occurrences of errors that, whilst not catastrophic on a global scale, still result in significant harm to human potential – harming people by denying them opportunity, crippling them, or (in, alas, too many cases) killing them.
Moreover, adherence to these Principles won’t just reduce the chances of harm arising from errors with AI. It will also reduce the chances of harm arising from errors with other types of technology.
Finally, the Principles aren’t just about avoiding significant harm. They’re also about raising the probability of attaining profound benefits.