Dear friends,
Do large language models understand the world? As a scientist and engineer, I’ve avoided asking whether an AI system “understands” anything. There’s no widely agreed-upon scientific test for whether a system really understands, as opposed to merely appearing to understand, just as no such tests exist for consciousness or sentience, as I discussed in an earlier letter. This makes the question of understanding a matter of philosophy rather than science. With that caveat, I believe LLMs build sufficiently complex models of the world that I’m comfortable saying they do, to some extent, understand it.
To me, the work on Othello-GPT is a compelling demonstration that LLMs build world models; that is, they figure out what the world really is like rather than blindly parrot words. Kenneth Li and colleagues trained a variant of the GPT language model on sequences of moves from Othello, a board game in which two players take turns placing game pieces on an 8x8 grid. For example, one sequence of moves might be d3 c5 f6 f5 e6 e3…, where each pair of characters (such as d3) corresponds to placing a game piece at a board location.
During training, the network saw only sequences of moves. It wasn’t explicitly told that these were moves on a square, 8x8 board, nor was it given the rules of the game. After training on a large dataset of such moves, it did a decent job of predicting what the next move might be.
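For readers who want to see the mechanics, here is a minimal sketch of that training setup, written in PyTorch as an illustration rather than a reproduction of the authors’ code. It assumes game transcripts are available as strings like “d3 c5 f6 f5 e6 e3”; the tokenization (one token per board square), model size, and training loop are all simplifying assumptions of mine.

```python
# Minimal sketch of the Othello-GPT training setup (illustrative, not the authors' code).
import torch
import torch.nn as nn

# One token per board square (a1..h8), plus a padding token.
SQUARES = [f"{col}{row}" for col in "abcdefgh" for row in range(1, 9)]
PAD = 0
VOCAB = {sq: i + 1 for i, sq in enumerate(SQUARES)}  # ids 1..64

def encode(game: str) -> list[int]:
    """Turn a move transcript like 'd3 c5 f6 f5 e6 e3' into a list of token ids."""
    return [VOCAB[move] for move in game.split()]

class TinyMoveGPT(nn.Module):
    """A small causal transformer that predicts the next move token."""
    def __init__(self, vocab_size=65, d_model=128, n_heads=4, n_layers=4, max_len=60):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):  # x: (batch, seq_len) of token ids
        T = x.size(1)
        h = self.tok(x) + self.pos(torch.arange(T, device=x.device))
        # Causal mask: each position may attend only to earlier moves.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        h = self.blocks(h, mask=mask)
        return self.head(h)  # logits over the next move at every position

model = TinyMoveGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)

def train_step(batch):
    """batch: (batch, seq_len) tensor of token ids; ordinary next-token prediction."""
    logits = model(batch[:, :-1])
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The point of the sketch is simply that the objective is ordinary next-token prediction; nothing about the board is handed to the model.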
The key question is: Did the network make these predictions by building a world model? That is, did it discover that an 8x8 board and a specific set of rules for placing pieces on it underpinned these moves? The authors demonstrate convincingly that the answer is yes. Specifically, given a sequence of moves, the network’s hidden-unit activations appeared to capture a representation of the current board position as well as the available legal moves. This shows that, rather than being a “stochastic parrot” that merely mimicked the statistics of its training data, the network did indeed build a world model.
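The probing idea behind that claim can be sketched as follows, continuing the hypothetical PyTorch example above: freeze the trained move model, collect its hidden activation at each position in a game, and train a small classifier to predict the true state (empty, black, or white) of every square, with the true states supplied by an ordinary Othello simulator. The probe architecture below is an illustrative choice of mine, not the paper’s exact setup.

```python
# Sketch of a board-state probe (illustrative, not the paper's exact setup).
# The move model is frozen; only the probe is trained. Activations can be taken
# from model.blocks(...) in the sketch above at the position of a given move,
# and the true board states come from an ordinary Othello simulator.
import torch
import torch.nn as nn

class BoardProbe(nn.Module):
    """Maps one hidden vector to a 3-way prediction (empty/black/white) for each of the 64 squares."""
    def __init__(self, d_model=128, hidden=256, n_squares=64, n_states=3):
        super().__init__()
        self.n_squares, self.n_states = n_squares, n_states
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_squares * n_states),
        )

    def forward(self, h):  # h: (batch, d_model)
        return self.net(h).view(-1, self.n_squares, self.n_states)

probe = BoardProbe()
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def probe_step(activations, board_labels):
    """activations: (batch, d_model) hidden states from the frozen move model;
    board_labels: (batch, 64) integers in {0, 1, 2} giving each square's true state."""
    logits = probe(activations)  # (batch, 64, 3)
    loss = loss_fn(logits.reshape(-1, 3), board_labels.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

If such a probe reaches high accuracy while the move model stays frozen, the board state must already be encoded in the activations, which is the sense in which the network has built a world model.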
While this study used Othello, I have little doubt that LLMs trained on human text also build world models. A lot of “emergent” behaviors of LLMs (for example, the fact that a model fine-tuned to follow English instructions can also follow instructions written in other languages) seem very hard to explain unless the models understand the world.
AI has wrestled with the notion of understanding for a long time. The philosopher John Searle published the Chinese Room Argument in 1980. He proposed a thought experiment: Imagine an English speaker who knows no Chinese, alone in a room with a rulebook for manipulating symbols. Chinese messages are slipped under the door, and by following the rulebook, the person slips back replies in Chinese that look fluent to outside observers, even though the person understands none of them. Searle argued that a computer is like this person: It appears to understand Chinese, but it really doesn’t.
A common counterargument, known as the Systems Reply, is that even if no single part of the Chinese Room scenario understands Chinese, the complete system of the person, rulebook, paper, and so on does. Similarly, no single neuron in my brain understands machine learning, but the system of all the neurons in my brain hopefully does. In my recent conversation with Geoff Hinton, which you can watch here, the notion that LLMs understand the world was a point we both agreed on.
Although philosophy is important, I seldom write about it because such debates can rage on endlessly and I would rather spend my time coding. I’m not sure what the current generation of philosophers thinks about LLMs understanding the world, but I am certain that we live in an age of wonders!
Okay, back to coding.
Keep learning,
Andrew