

Why There Is No “I” in AI

Since its inception in 1956, the field of artificial intelligence has gone through several cycles of enthusiasm followed by pessimism. AI scientists call these “AI summers” and “AI winters.” Each wave was based on a new technology that promised to put us on the path to creating intelligent machines, but ultimately these innovations fell short. AI is currently experiencing another wave of enthusiasm, another AI summer, and, once again, expectations in the industry are high. The technologies driving the current surge are artificial neural networks, often referred to as deep learning. These methods have achieved impressive results on tasks such as labeling pictures, recognizing spoken language, and driving cars. In 2011, a computer beat the top-ranked humans playing the game show Jeopardy!, and in 2016, another computer bested the world’s top-ranked player of the game Go. These last two accomplishments made headlines around the world. These achievements are impressive, but are any of these machines truly intelligent?

Most people, including most AI researchers, don’t think so. There are numerous ways today’s artificial intelligence falls short of human intelligence. For example, humans learn continuously. As I described earlier, we are constantly amending our model of the world. In contrast, deep learning networks have to be fully trained before they can be deployed. And once they are deployed, they can’t learn new things on the go. For example, if we want to teach a vision neural network to recognize an additional object, then the network has to be retrained from the ground up, which can take days. However, the biggest reason that today’s AI systems are not considered intelligent is that they can only do one thing, whereas humans can do many things. In other words, AI systems are not flexible. Any individual human, such as you or me, can learn to play Go, to farm, to write software, to fly a plane, and to play music. We learn thousands of skills in our lifetime, and although we may not be the best at any one of these skills, we are flexible in what we can learn. Deep learning AI systems exhibit almost no flexibility. A Go-playing computer may play the game better than any human, but it can’t do anything else. A self-driving car may be a safer driver than any human, but it can’t play Go or fix a flat tire.

The long-term goal of AI research is to create machines that exhibit human-like intelligence—machines that can rapidly learn new tasks, see analogies between different tasks, and flexibly solve new problems. This goal is called “artificial general intelligence,” or AGI, to distinguish it from today’s limited AI. The essential question today’s AI industry faces is: Are we currently on a path to creating truly intelligent AGI machines, or will we once again get stuck and enter another AI winter? The current wave of AI has attracted thousands of researchers and billions of dollars of investment. Almost all these people and dollars are being applied to improving deep learning technologies. Will this investment lead to human-level machine intelligence, or are deep learning technologies fundamentally limited, leading us once again to reinvent the field of AI? When you are in the middle of a bubble, it is easy to get swept up in the enthusiasm and believe it will go on forever. History suggests we should be cautious.

I don’t know how long the current wave of AI will continue to grow. But I do know that deep learning does not put us on the path to creating truly intelligent machines. We can’t get to artificial general intelligence by doing more of what we are currently doing. We have to take a different approach.

Two Paths to AGI

There are two paths that AI researchers have followed to make intelligent machines. One path, the one we are following today, is focused on getting computers to outperform humans on specific tasks, such as playing Go or detecting cancerous cells in medical images. The hope is that if we can get computers to outperform humans on a few difficult tasks, then eventually we will discover how to make computers better than humans at every task. With this approach to AI, it doesn’t matter how the system works, and it doesn’t matter if the computer is flexible. It only matters that the AI computer performs a specific task better than other AI computers, and ultimately better than the best human. For example, if the best Go-playing computer had been ranked only sixth in the world, it would not have made headlines, and it might even have been viewed as a failure. But beating the world’s top-ranked human was seen as a major advance.

The second path to creating intelligent machines is to focus on flexibility. With this approach, it isn’t necessary that the AI performs better than humans. The goal is to create machines that can do many things and apply what they learn from one task to another. Success along this path could be a machine that has the abilities of a five-year-old child or even a dog. The hope is that if we can first understand how to build flexible AI systems, then, with that foundation, we can eventually make systems that equal or surpass humans.

This second path was favored in some of the earlier waves of AI. However, it proved to be too difficult. Scientists realized that to be as capable as a five-year-old child requires possessing a huge amount of everyday knowledge. Children know thousands of things about the world. They know how liquids spill, balls roll, and dogs bark. They know how to use pencils, markers, paper, and glue. They know how to open books and that paper can rip. They know thousands of words and how to use them to get other people to do things. AI researchers couldn’t figure out how to program this everyday knowledge into a computer, or how to get a computer to learn these things.

The difficult part of knowledge is not stating a fact, but representing that fact in a useful way. For example, take the statement “Balls are round.” A five-year-old child knows what this means. We can easily enter this statement into a computer, but how can a computer understand it? The words “ball” and “round” have multiple meanings. A ball can be a dance, which isn’t round, and a pizza is round, but not like a ball. For a computer to understand “ball,” it has to associate the word with different meanings, and each meaning has different relationships to other words. Objects also have actions. For example, some balls bounce, but footballs bounce differently than baseballs, which bounce differently than tennis balls. You and I quickly learn these differences by observation. No one has to tell us how balls bounce; we just throw a ball to the ground and see what happens. We aren’t aware of how this knowledge is stored in our brain. Learning everyday knowledge such as how balls bounce is effortless.

AI scientists couldn’t figure out how to do this within a computer. They invented software structures called schemas and frames to organize knowledge, but no matter what they tried, they ended up with an unusable mess. The world is complex; the number of things a child knows and the number of links between those things seem impossibly large. I know it sounds like it should be easy, but no one could figure out how a computer could know something as simple as what a ball is.

This problem is called knowledge representation. Some AI scientists concluded that knowledge representation was not only a big problem for AI, it was the only problem. They claimed that we could not make truly intelligent machines until we solved how to represent everyday knowledge in a computer.

Today’s deep learning networks don’t possess knowledge. A Go-playing computer does not know that Go is a game. It doesn’t know the history of the game. It doesn’t know if it is playing against a computer or a human, or what “computer” and “human” mean. Similarly, a deep learning network that labels images may look at an image and say it is a cat. But the computer has limited knowledge of cats. It doesn’t know that cats are animals, or that they have tails, legs, and lungs. It doesn’t know about cat people versus dog people, or that cats purr and shed fur. All the deep learning network does is determine that a new image is similar to previously seen images that were labeled “cat.” There is no knowledge of cats in the deep learning network.

Recently, AI scientists have tried a different approach to encoding knowledge. They create large artificial neural networks and train them on lots of text: every word in tens of thousands of books, all of Wikipedia, and almost the entire internet. They feed the text into the neural networks one word at a time. By training this way, the networks learn the likelihood that certain words follow other words. These language networks can do some surprising things. For example, if you give the network a few words, it can write a short paragraph related to those words. It is difficult to tell whether the paragraph was written by a human or the neural network.
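To make “learning the likelihood that certain words follow other words” concrete, here is a deliberately tiny sketch. It is an assumption-laden stand-in, not how the large language networks described above are built: those are neural networks trained on enormous amounts of text, while this sketch just counts which word follows which in a short made-up sentence (a so-called bigram table). The function names and the sample text are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word is immediately followed by each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def next_word_probs(follows, word):
    """Turn the raw counts into an estimated likelihood for each next word."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Hypothetical toy "training text".
corpus = "the ball is round the ball can bounce the pizza is round"
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))
```

In this toy text, “the” is followed by “ball” twice and by “pizza” once, so the table predicts “ball” about two-thirds of the time. Nothing in the table says what a ball or a pizza is; it only records which words tend to come next.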

AI scientists disagree as to whether these language networks possess true knowledge or are just mimicking humans by remembering the statistics of millions of words. I don’t believe any kind of deep learning network will achieve the goal of AGI if the network doesn’t model the world the way a brain does. Deep learning networks work well, but not because they solved the knowledge representation problem. They work well because they avoided it completely, relying on statistics and lots of data instead. How deep learning networks work is clever, their performance is impressive, and they are commercially valuable. I am only pointing out that they don’t possess knowledge and, therefore, are not on the path to having the ability of a five-year-old child.

Brains as a Model for AI

From the moment I became interested in studying the brain, I felt that we would have to understand how it works before we could create intelligent machines. This seemed obvious to me, as the brain is the only thing that we know of that is intelligent. Over the following decades, nothing changed my opinion. That is one reason I have doggedly pursued brain theory: I feel it is a necessary first step to creating truly intelligent AI. I’ve lived through multiple waves of AI enthusiasm, and each time I resisted jumping on board. It was clear to me that the technologies used were not even remotely like the brain, and therefore AI would get stuck. Figuring out how the brain works is hard, but it is a necessary first step to creating intelligent machines.

In the first half of this book, I described the progress we have made in understanding the brain. I described how the neocortex learns models of the world using maplike reference frames. In the same way that a paper map represents knowledge about a geographic area such as a town or country, the maps in the brain represent knowledge about the objects we interact with (such as bicycles and smartphones), knowledge about our body (such as where our limbs are and how they move), and knowledge about abstract concepts (such as mathematics).

The Thousand Brains Theory solves the problem of knowledge representation. Here is an analogy to help you understand how. Let’s say I want to represent knowledge about a common object, a stapler. Early AI researchers would try to do this by listing the names of the different parts of the stapler and then describing what each part does. They might write a rule about staplers that says, “When the top of the stapler is pressed down, a staple comes out of one end.” But to understand this statement, words such as “top,” “end,” and “staple” would have to be defined, as would the meaning of the different actions such as “pressed down” and “comes out.” And this rule is insufficient on its own. It doesn’t say which way the staple faces when it comes out, what happens next, or what you should do if the staple gets stuck. So, the researchers would write additional rules. This method of representing knowledge led to a never-ending list of definitions and rules. AI researchers didn’t see how to make it work. Critics argued that even if all the rules could be specified, the computer still wouldn’t “know” what a stapler is.

The brain takes a completely different approach to storing knowledge about a stapler: it learns a model. The model is the embodiment of knowledge. Imagine for a moment that there is a tiny stapler in your head. It is exactly like a real stapler—it has the same shape, the same parts, and it moves in the same ways—it’s just smaller. The tiny model represents everything you know about staplers without needing to put a label on any of the parts. If you want to recall what happens when the top of a stapler is pressed down, you press down on the miniature model and see what happens.

Of course, there isn’t a tiny physical stapler in your head. But the cells in the neocortex learn a virtual model that serves the same purpose. As you interact with a real stapler, the brain learns its virtual model, which includes everything you have observed about the real stapler, from its shape to how it behaves when you use it. Your knowledge of staplers is embedded in the model. There isn’t a list of stapler facts and stapler rules stored in your brain.

Let’s say I ask you what happens when the top of a stapler is pushed down. To answer this question, you don’t find the appropriate rule and play it back to me. Instead, your brain imagines pressing down on the stapler, and the model recalls what happens. You can use words to describe it to me, but the knowledge is not stored in words or rules. The knowledge is the model.

I believe the future of AI will be based on brain principles. Truly intelligent machines, AGI, will learn models of the world using maplike reference frames just like the neocortex. I see this as inevitable. I don’t believe there is another way to create truly intelligent machines.

Moving from Dedicated to Universal AI Solutions

The situation we are in today reminds me of the early days of computing. The word “computer” originally referred to people whose job was to perform mathematical calculations. To create numeric tables or to decode encrypted messages, dozens of human computers would do the necessary calculations by hand. The very first electronic computers were designed to replace human computers for a specific task. For example, the best automated solution for message decryption was a machine that only decrypted messages. Computing pioneers such as Alan Turing argued that we should build “universal” computers: electronic machines that could be programmed to do any task. However, at that time, no one knew the best way to build such a computer.

There was a transitional period in which computers were built in many different forms. There were computers designed for specific tasks. There were analog computers, and computers that could only be repurposed by changing the wiring. There were computers that worked with decimal instead of binary numbers. Today, almost all computers take the universal form that Turing envisioned. We even refer to them as “universal Turing machines.” With the right software, today’s computers can be applied to almost any task. Market forces decided that universal, general-purpose computers were the way to go. This is despite the fact that, even today, any particular task can be performed faster or with less power using a custom solution, such as a special chip. Product designers and engineers usually prefer the lower cost and convenience of general-purpose computers anyway.

A similar transition will occur with artificial intelligence. Today we are building dedicated AI systems that are the best at whatever task they are designed to do. But in the future, most intelligent machines will be universal: more like humans, capable of learning practically anything.

Today’s computers come in many shapes and sizes, from the microcomputer in a toaster to room-size computers used for weather simulation. Despite their differences in size and speed, all these computers work on the same principles laid out by Turing and others many years ago. They are all instances of universal Turing machines. Similarly, intelligent machines of the future will come in many shapes and sizes, but almost all of them will work on a common set of principles. Most AI will be universal learning machines, similar to the brain. (Mathematicians have proven that there are some problems that cannot be solved, even in principle. Therefore, to be precise, there are no true “universal” solutions. But this is a highly theoretical idea and we don’t need to consider it for the purposes of this book.)

Some AI researchers argue that today’s artificial neural networks are already universal. A neural network can be trained to play Go or drive a car. However, the same neural network can’t do both. Neural networks also have to be tweaked and modified in other ways to get them to perform a task. When I use the terms “universal” or “general-purpose,” I imagine something like ourselves: a machine that can learn to do many things without erasing its memory and starting over.

There are two reasons AI will transition from the dedicated solutions we see today to more universal solutions that will dominate the future. The first is the same reason that universal computers won out over dedicated computers. Universal computers are ultimately more cost-effective, and this led to more rapid advances in the technology. As more and more people use the same designs, more effort is applied to enhancing the most popular designs and the ecosystems that support them, leading to rapid improvements in cost and performance. This was the underlying driver of the exponential increase in computing power that shaped industry and society in the latter part of the twentieth century. The second reason that AI will transition to universal solutions is that some of the most important future applications of machine intelligence will require the flexibility of universal solutions. These applications will need to handle unanticipated problems and devise novel solutions in a way that today’s dedicated deep learning machines cannot.

Consider two types of robots. The first robot paints cars in a factory. We want car-painting robots to be fast, accurate, and unchanging. We don’t want them trying new spraying techniques each day or questioning why they are painting cars. When it comes to painting cars on an assembly line, single-purpose, unintelligent robots are what we need. Now say we want to send a team of robot construction workers to Mars to build a livable habitat for humans. These robots need to use a variety of tools and assemble buildings in an unstructured environment. They will encounter unforeseen problems and will need to collaboratively improvise fixes and modify designs. Humans can handle these types of issues, but no machine today is close to doing any of this. Mars construction robots will need to possess general-purpose intelligence.

You might think that the need for general-purpose intelligent machines will be limited, that most AI applications will be addressed with dedicated, single-purpose technologies like we have today. People thought the same thing about general-purpose computers. They argued that the commercial demand for general-purpose computers was limited to a few high-value applications. The opposite turned out to be true. Due to dramatic reductions in cost and size, general-purpose computers became one of the largest and most economically important technologies of the last century. I believe that general-purpose AI will similarly dominate machine intelligence in the latter part of the twenty-first century. In the late 1940s and early 1950s, when commercial computers first became available, it was impossible to imagine what their applications would be in 1990 or 2000. Today, our imagination is similarly challenged. No one can know how intelligent machines will be used fifty or sixty years from now.

When Is Something Intelligent?

When should we consider a machine intelligent? Is there a set of criteria we can use? This is analogous to asking, When is a machine a general-purpose computer? To qualify as a general-purpose computer—that is, a universal Turing machine—a machine needs certain components, such as memory, a CPU, and software. You can’t detect these ingredients from the outside. For example, I can’t tell if my toaster oven has a general-purpose computer inside or a custom chip. The more features my toaster oven has, the more likely it contains a general-purpose computer, but the only sure way to tell is by looking inside and seeing how it works.

Similarly, to qualify as intelligent, a machine needs to operate using a set of principles. You can’t detect whether a system uses these principles by observing it from the outside. For example, if I see a car driving down the highway, I can’t tell if it is being driven by an intelligent human who is learning and adapting as they drive or by a simple controller that just keeps the car between two lines. The more complex the behavior exhibited by the car, the more likely it is that an intelligent agent is in control, but the only sure way to tell is by looking inside.

So, is there a set of criteria that machines must meet to be considered intelligent? I think so. My proposal for what qualifies as intelligent is based on the brain. Each of the four attributes in the following list is something we know that the brain does and that I believe an intelligent machine must do too. I will describe what each attribute is, why it is important, and how the brain implements it. Of course, intelligent machines may implement these attributes differently than a brain. For example, intelligent machines don’t have to be made of living cells.

Not everyone will agree with my choice of attributes. One can make a good argument that I have left some important things out. That’s OK. I view my list as a minimum, or baseline, for AGI. Few AI systems have any of these attributes today.

1. Learning Continuously

What is it? Every waking moment of our entire life, we are learning. How long we remember something varies. Some things are forgotten quickly, such as the arrangement of dishes on a table or what clothes we wore yesterday. Other things will stay with us for our entire lives. Learning is not a separate process from sensing and acting. We learn continuously.

Why is it important? The world is constantly changing; therefore, our model of the world must learn continuously to reflect the changing world. Most AI systems today do not learn continuously. They go through a lengthy training process and when it is complete, they are deployed. This is one reason they are not flexible. Flexibility requires continuously adjusting to changing conditions and new knowledge.

How does the brain do it? The most important component of how brains learn continuously is the neuron. When a neuron learns a new pattern, it forms new synapses on one dendrite branch. The new synapses don’t affect previously learned ones on other branches. Thus, learning something new doesn’t force the neuron to forget or modify something it learned earlier. The artificial neurons used in today’s AI systems don’t have this ability. This is one reason they can’t learn continuously.
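The idea that a neuron can store new patterns on separate dendrite branches, without disturbing old ones, can be illustrated with a toy sketch. Everything here is a hypothetical simplification made for illustration: a “synapse” is just the index of an input cell, each branch holds one learned pattern, a branch responds when at least a threshold number of its inputs are active, and the class name and threshold value are assumptions rather than a description of real neurons or of any particular published model.

```python
class BranchNeuron:
    """Toy neuron whose dendrite branches each store one learned pattern.

    Hypothetical simplification: a synapse is the index of an input cell,
    and a branch recognizes its pattern when enough of those inputs are
    active at the same time.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold  # active synapses needed to trigger a branch
        self.branches = []          # each branch is a frozenset of input indices

    def learn(self, active_inputs):
        # Grow synapses on a fresh branch; existing branches are left untouched.
        self.branches.append(frozenset(active_inputs))

    def recognizes(self, active_inputs):
        # The neuron responds if any single branch sees enough of its inputs.
        active = set(active_inputs)
        return any(len(branch & active) >= self.threshold for branch in self.branches)


neuron = BranchNeuron(threshold=3)
neuron.learn({1, 4, 7, 9})    # an earlier pattern
neuron.learn({2, 5, 8, 11})   # a later, unrelated pattern
print(neuron.recognizes({1, 4, 7}))   # True: the old pattern is still recognized
print(neuron.recognizes({2, 5, 11}))  # True: so is the new one
print(neuron.recognizes({1, 5, 8}))   # False: no single branch sees three of its inputs
```

Because learning only appends a branch, the second pattern is stored without changing anything the first branch learned. A single shared set of weights, by contrast, would have to be adjusted to accommodate the new pattern, which can overwrite what was learned before.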

2. Learning via Movement
