The Knowledge Illusion & AI

Mental Models & Risks

Jul 06, 2023

Boaz Barak, a Harvard professor of computer science, explained:

Metaphors for AI, and why I don’t like them […]
To be clear, I am by no means “anti metaphor”. Metaphors can be extremely valuable “intuition pumps”. I often use them when teaching, and in research and blogging as well (e.g., my recent post on “intelligence forklift”). Turing himself came up with the “Turing Machine” by modeling a human computer. However, it is always important to remember our metaphors’ limitations. No single metaphor can capture all of AI’s essential elements, and we should be careful of over-interpreting metaphors.

When people think about AI’s impact on society and its risks, they are using metaphors, either implicitly or explicitly. Marc Andreessen, for instance, critiqued the concerns behind the “paperclip maximizer” analogy here:

MA: Yeah, so the paperclip problem is a very interesting one because it contains what I think is sort of a logical fallacy that’s right at the core of this whole argument, which is for the paperclip argument to work — the term that the doomers use — they call it orthogonality.

Whether or not you agree with his argument, the point is that the debate over AI risks requires dealing with metaphors. We do not know what an AGI will be like, so we cannot yet know which metaphors will be relevant. Those with conflicting views of AI risk also often seem to hold different mental models, so they often fail to communicate productively with each other.

A recent debate over AGI existential risk between prominent people on each side of the issue led one AGI alignment researcher to write an essay on:

Munk AI debate: confusions and possible cruxes
Five ways people were talking past each other

The Forecasting Research Institute, led by Philip Tetlock, held an “Existential Risk Persuasion Tournament” (XPT) to explore the views of superforecasters versus domain experts on various risks, including AGI. Their report notes that:

We document large-scale disagreement and minimal convergence of beliefs over the course of the XPT, with the largest disagreement about risks from artificial intelligence. […]
Differing styles of argumentation as a barrier to bridging disputes on AI risk
One might suppose that forecasters who are free to exchange information, incentivized to make true predictions, and skilled at Bayesian reasoning would converge over time. But they did not.[…]
Deep divisions on AI risk in particular persisted to the end of the XPT. We discuss the reasons for the impasse here. Despite our encouragement to engage, forecasters often talked past each other. Their patterns of argumentation were grounded in different priors, different levels of analysis, and different views about how effectively social and political systems will respond to AI risk: […]
Conflicting priors, which were invoked to justify different burdens of proof. […]
Competing claims about the adaptive capacities of complex social systems.

Economist Tyler Cowen critiqued the whole process of debating AGI risk as being done too informally:

So, if you look, say, at COVID [corona virus disease] or climate change fears, in both cases, there are many models you can look at, including--and then models with data. I'm not saying you have to like those models. But the point is: there's something you look at and then you make up your mind whether or not you like those models; and then they're tested against data. So, when it comes to AGI [artificial general intelligence] and existential risk, it turns out as best I can ascertain, in the 20 years or so we've been talking about this seriously, there isn't a single model done. Period. Flat out.
[…]So, I don't think any idea should be dismissed. I've just been inviting those individuals to actually join the discourse of science. 'Show us your models. Let us see their assumptions and let's talk about those.' The practice, instead, is to write these very long pieces online, which just stack arguments vertically and raise the level of anxiety. It's a bad practice in virtually any theory of risk communication.
[…]My mental model is: There's a thing, science. Try to publish this stuff in journals. Try to model it. Put it out there, we'll talk to you. I don't want to dismiss anyone's worries, but when I talk to people, say, who work in governments who are well aware of the very pessimistic arguments, they're just flat out not convinced for the most part. And, I don't think the worriers are taking seriously the fact they haven't really joined the dialogue yet.

Unfortunately, forecasting AGI risk inherently involves thinking about aspects of the technology, and of human society and its processes, that span multiple intellectual niches. Even within computer science, coping with the risks of AGI may involve concepts from computer security and not merely AI. Those who are knowledgeable and practiced in one area may not be knowledgeable in another and, as with all humans, may not fully grasp their lack of knowledge.

Superforecasters, who historically have proven more adept at navigating that issue, gave lower estimates: a median 2.13% risk of an AGI catastrophe by 2100 and a median 0.38% risk of AGI causing the extinction of humanity by 2100. AI experts, who may be more knowledgeable about AI but not necessarily about other aspects of the issue, gave a median forecast of a 12% chance of AGI catastrophe by 2100 and a 3% risk of AGI causing the extinction of humanity by 2100.

People are often practiced at examining scenarios for internal consistency, but not at thinking through how factors they haven’t considered might prevent their model scenario from arising. Many people fall for conspiracy theories that seem superficially plausible if you only consider the data points they provide, but those with more information see that the scenario is far less likely than other scenarios when you look at the bigger picture.

Some would argue AGI doomers are concocting scenarios that may be internally logical if you make certain assumptions about the world, but that those assumptions are not likely to hold. They view some of the proposed scenarios as being like a society in a work of fiction: internally consistent, but unlikely to actually arise.

Others argue it is those unconcerned with AGI risk who have a simplistic view of the world that is ignoring a bigger picture. Alan Kay said “the best way to predict the future is to invent it.” In some cases, it is not merely the “best way” but the only way, and there is the risk that AGI will occur quickly once some breakthrough is achieved that we can’t yet imagine. Unfortunately, we do not know enough about what an AGI will be like to definitively make the case one way or another. Often arguments about the topic seem to come down to differing intuitions about the world that cannot be resolved objectively.

Humans often envision varied, internally consistent potential futures that seem plausible, like imagining a complex system in a state of equilibrium different from the one that currently exists. The question is whether that state is likely, or even possible, to be reached from the current state of the world.

To use an example from economics: fiat currencies like the US dollar tend to be fairly stable over the short term because vast numbers of items are priced in dollars and vast numbers of transactions occur. Many people originally envisioned Bitcoin (or, later, other cryptocurrencies) eventually reaching that type of dynamic price stability, but no one has yet found a way to get from the current world to that imagined world. People differ greatly on how we could go from this world to that one, and on the odds of that happening. Emotions and biases cloud that debate, and the AGI risk debate involves far more potential areas for disagreement.

Scott Alexander explored the data from surveys of AI researchers regarding when AGI might occur here. There was a wide range of estimates for when it might occur, and it seems there is a wide range of guesses as to its characteristics.

Forecasting the future of any aspect of reality requires a model of how relevant entities behave. When we drive a car, we forecast its movement and the movement of other vehicles and pedestrians using a mental model. Meteorologists use models to forecast the weather, but due to the complexity of the system they cannot predict very far into the future.

Many aspects of the world are too uncertain for us to reliably forecast, but we often make guesses anyway if an issue is important. Unfortunately, cognitive scientists have discovered that people suffer from what some call:

The illusion of explanatory depth (IOED) describes our belief that we understand more about the world than we actually do. It is often not until we are asked to actually explain a concept that we come face to face with our limited understanding of it.

While others use a newer phrase:

People overestimate how well they understand how things work. Direct evidence for this comes from the psychological laboratory. The great Yale psychologist Frank Keil and his students first demonstrated the illusion of explanatory depth, what we call the knowledge illusion.

If someone does not truly have a full model for how AGI will work, can they accurately forecast when it will arise, or its characteristics?

People who study forecasting have found that some people, dubbed “superforecasters,” are better at it than others, but unfortunately they are not easy to find, and even they will struggle with some inherently difficult tasks:

Companies and individuals are notoriously inept at judging the likelihood of uncertain events, as studies show all too well.
At the other end of the spectrum, we find issues that are complex, poorly understood, and tough to quantify, such as the patterns of clouds on a given day or when the next game-changing technology will pop out of a garage in Silicon Valley. Here, too, there’s little advantage in investing resources in systematically improving judgment: The problems are just too hard to crack.

Many acknowledge AGI existential risk is non-zero, but it’s viewed as low and not imminent for at least several more years. It’s like the risk of an asteroid striking the earth. It’d be useful for people to plan for how to deal with it, but some of us choose to focus on other nearer-term AI risks and benefits.

Even those nearer-term AI risks are difficult for people to discuss and agree on, since they involve an entirely new technology whose near-term evolution is itself difficult to predict. Developing better ways to debate and evaluate such near-term risks and concerns may provide lessons for how to deal with larger future risks.

People disagree vehemently over the level of existential risk posed by an AGI, let alone how to address it. It seems the first step is to work out, in general, how to create frameworks for discussing and dealing with large societal risks that involve large degrees of uncertainty and disagreement, and where emotions cloud people’s judgement.

It seems many of the relevant existential risk concerns may arise with humans even without an AGI. Humans with more advanced non-AI technology, like the ability to genetically engineer a virus, will pose existential risk to humanity. Society needs to figure out how to address existential risks in general regardless of any risk of AGI, and it is an issue most across the political spectrum will struggle with aside from tiny minorities. Most who value freedom also believe in a right to defend against existential risk, and most who value protection against risks via government grasp that not everyone follows laws even in an authoritarian police state. That doesn’t mean there will be agreement over how to address these risks, merely that there is a need to consider them.

Society and AI Substack
