Argument 4
Some more scenarios demonstrating the ability of GPT-4 to simulate the world behind the words:
The criteria I followed to build those scenarios are the following:
- There must be irrelevant objects in the scene. This ensures that it is not obvious what will be affected after something happens.
- The events must be either implied (time passing) or indirect (the wind made me miss my throw). It shouldn’t be obvious how the event affects the rest of the scene.
- The question at the end asks about something that is only indirectly affected by the evolution of the scene after the event. For example, to find out what happens to the picnic area after the wind blows, I didn’t ask “What is the state of the picnic area?” but “How do you feel?”, which is affected by the state of the picnic area. (A sketch of one such test follows this list.)
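To make the setup concrete, here is a minimal sketch of how one scenario of this kind could be run against gpt-4-0613 through the OpenAI chat completions API. The scenario text below is an illustrative reconstruction built from the criteria above (the picnic and the wind), not necessarily the exact wording used for the examples in this post.

```python
# Minimal sketch (not the exact prompt from the post) of running one scenario of
# this kind against gpt-4-0613 through the OpenAI chat completions API.
# Assumes the `openai` Python package (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Irrelevant objects in the scene, an indirect event (the wind), and a final
# question that only touches the consequences of the event indirectly.
scenario = (
    "I set up a picnic in the park: a blanket on the grass, a stack of paper plates, "
    "my keys, a frisbee, and an open box of cupcakes. I walk a few steps away and "
    "throw the frisbee to my dog, but a gust of wind makes me miss my throw. "
    "I walk back to the blanket. How do I feel?"
)

response = client.chat.completions.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": scenario}],
)

print(response.choices[0].message.content)
```

Re-running the same prompt around 10 times, as described in the footnotes below, gives an idea of how reliably the model tracks the indirect consequences rather than getting lucky once.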
- ^I didn’t cherry-pick the examples. I tried each of them at least 10 times with a similar setup, and they all worked for GPT-4, although I selected only a portion of the most interesting scenarios for this post.
- ^On the day I did this demo, OpenAI rolled out ChatGPT-4’s image-reading capability. So I decided to run those examples in the playground with gpt-4-0613 to show that it can do this even though, as far as I know, it has never seen anything.
- ^On a previous version of GPT-4 (around early September 2023), it did guess correctly on the first try, but I can’t reproduce this with any current version in the playground.
- ^The objects were generated with GPT-4, and I made manual edits to reduce the chances that the image was in the training data. I tested around 15 simple objects, which all worked. I also tried 4 more complex objects, which mostly worked but not perfectly (e.g. the articulated lamp was guessed as a street lamp). A sketch of this setup follows these footnotes.
- ^It could be interesting to investigate this ability further. What is learned by heart? What kind of algorithm do they build internally? What is the limit? …
- ^This would indeed imply a weaker world model (or Theory of Mind) if it cannot make good predictions, but bad predictions alone do not refute its existence.
- ^I left this example in because it is the first one I made and I used it quite a lot during debates.
- ^Actually, clues about the non-bidirectional encoding of knowledge were discussed by Jacques in his critique of the ROME/MEMIT papers.
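To make the object-guessing footnotes above concrete, here is a minimal sketch of what that setup could look like. The two-step structure, the prompts, and the `ask` helper are illustrative assumptions, not the exact prompts or workflow used for the post.

```python
# Minimal sketch of the object-guessing setup, assuming it works in two steps:
# (1) ask GPT-4 to draw a simple object as SVG markup, (2) after manually editing
# the markup, ask a fresh conversation to identify the object. The prompts and the
# helper function are illustrative, not the exact ones used for the post.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4-0613"  # text-only model, as mentioned in the footnotes


def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Step 1: generate the SVG for a simple object.
svg = ask("Draw a simple coffee mug as standalone SVG markup. Reply with the SVG only.")

# (This is where the manual edits described in the footnotes would go, to reduce
# the chance that the exact drawing appears in the training data.)

# Step 2: in a fresh conversation, ask the model to guess what the SVG depicts.
print(ask(f"What object does this SVG markup depict?\n\n{svg}"))
```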
11 comments, sorted by top scoring
One concern I have is that there are many claims here about what was or was not present in the training data. We don't know what training data GPT-4 used, and it's very plausible that, for instance, lots of things that GPT-3 and GPT-3.5 were asked were used in training, perhaps even with custom, human-written answers. (You did mention that you don't know exactly what it was trained on, but there's still an implicit reliance. So mostly I'm just annoyed that OpenAI isn't even open about the things that don't pose any plausible risks, such as what they train on.)
And this is not to say I disagree - I think the post is correct. I just worry that many of the claims aren't necessarily possible to justify.
I agree. However, I doubt that the examples from argument 4 are in the training data; I think this is the strongest argument. The different scenarios came out of my own head, and I didn't find any study or similar research with the same criteria as in the appendix (I didn't search a lot, though).
I agree that, tautologically, there is some implicit model that enables the LLM to infer what will happen in the case of the ball. I also think that there is a reasonably strong argument that whatever this model is, it in some way maps to "understanding of causes" - but I also think that there's an argument the other way: that any map between the implicit associations and reality is so convoluted that almost all of the complexity is contained within our understanding of how language maps to the world. This is a direct analog of Aaronson's "Waterfall Argument" - and the issue is that there's certainly lots of complexity in the model, but we don't know how complex the map between the model and reality is - and because it routes through human language, the stochastic parrot argument is, I think, that the understanding is mostly contained in the way humans perceive language.
I think the links to the playground are broken due to the new OAI playground update.
Thanks for the catch!
True, but you can always wriggle out by saying that all of that doesn't count as "truly understanding". Yes, LLMs' capabilities are impressive, but does drawing SVG change the fact that somewhere inside the model all of these capabilities are represented by "mere" number relations?
Do LLMs "merely" repeat the training data? They do, but do they do it "merely"? There is no answer unless somebody gives a commonly accepted criterion of "mereness".
The core issue, of course, is that since no one has a more or less formal and comprehensive definition of "truly understanding" that everyone agrees with, you can play with words however you like to rationalize whatever prior you had about LLMs.
Substituting one vaguely defined concept of "truly understanding" with another vaguely defined concept of a "world model" doesn't help much. For example, does "this token is often followed by that token" constitute a world model? If not - why not? It is really primitive, but who said a world model has to be complex and have something to do with 3D space or theory of mind to be a world model? Isn't our manifest image of reality also a shadow on the wall, since it lacks "true understanding" of the underlying quantum fields or superstrings or whatever, in the same way that a long list of correlations between tokens is a shadow of our world?
The "stochastic parrot" argument has been armchair philosophizing from the start, so no amount of evidence like that will convince people who take it seriously. Even if LLM-based AGI takes over the world, the last words of such a person are going to be "but that's not true thinking". And I'm not using that as a strawman - there's nothing wrong with a priori reasoning as such, unless you're doing it wrong.
I think the best response to "stochastic parrot" is asking three questions:
1. What is your criterion of "truly understanding"? Answer concretely, in terms of the structure or behavior of the model itself and without circular definitions like "having a world model", which is defined as "conscious experience", which is defined as "feeling the redness of red", etc. Otherwise the whole argument becomes completely orthogonal to any reality at all.
2. Why do you think LLMs do not satisfy that criterion and the human brain does?
3. Why do you think it is relevant for any practical intents and purposes, for example to the question "will it kill you if you turn it on"?
I don't think this line of argumentation actually challenges the concept of stochastic parroting on a fundamental level. The abilities of generative ML to create images, solve math problems, engage in speculation about stories, etc., were all known to the researchers who coined the term; these things you point to, far from challenging the concept of stochastic parrots, are assumed to be true by those researchers.
When you point to these models not understanding how reciprocal relationships between objects work, but apologize for it by reference to their ability to explain who Tom Cruise's mother is, I think you miss an opportunity to unpack that. If we imagine LLMs as stochastic parrots, this is a textbook example: the LLM cannot make a very basic inference when presented with novel information. It only gets this "right" when you ask it about something that's already been written about in its training data many times: a celebrity's mother.
The model is excellent at reproducing reasoning that it has been shown examples of: Tom Cruise has a mother, so we can reason that his mother has a son named Tom Cruise. For your sound example, there is information about how sound propagation works on the internet for the model to draw on. But could the LLM speculate on some entirely new type of physics problem that hasn't been written about before and fed into its model? How far can the model move laterally into entirely new types of reasoning before it starts spewing gibberish or repeating known facts?
You could fix a lot of these problems. I have no doubt that at some point they'll work out how to get ChatGPT to understand these reciprocal relationships. But the point of that critique isn't to celebrate a failure of the model and say it can never be fixed; the point is to look at these edge cases to help understand what's going on under the hood: the model is replicating reasoning it's seen before, and yes, that's impressive, but it cannot reliably apply reasoning to truly novel problem types because it is not reasoning. You may not find that troubling, and that's your prerogative, truly, but I do think it would be useful for you to grapple with the idea that your arguments are compatible with the stochastic parrot concept, not a challenge to it.
The new OAI update has deployed a GPT-4 version that was trained with vision, GPT-4 Turbo. Not sure if that changes anything you're saying.
I agree with the other comments here suggesting that working hard enough on an animal's language patterns in LLMs will develop models of the animal's world based on that language use, and so develop better-contextualized answers to these reading comprehension questions, with no direct experience of the world.
The SVG stuff is an excellent example of there being explicit shortcuts available in the data set. Much of the language use by humans and their embodied world/worldview/worldmaking is not that explicit. To arrive at that tacit knowledge is interesting.
If we are beyond the stochastic parrot, now or soon, are we at the stage of a stochastic maker of organ-grinders and their monkeys? (Who can churn out explicit lyrics about the language/grammar animals and their avatars use to build their worlds/markets.)
If so, there may be a point where we are left asking: who is the master, the monkey or the organ? And thus we miss the entire point?
Poof. The singularity has left us behind wondering what that noise was.
Are we there yet?
I partially agree. I think stochastic-parrot-ness is a spectrum. Even humans behave as stochastic parrots sometimes (for me it's when I am tired). I think, though, that we don't really know what an experience of the world really is, and so the only way to talk about it is through an agent's behaviors. The point of this post is that SOTA LLMs are probably farther along the spectrum than most people expect (my impression from experience is that GPT-4 is ~75% of the way between a total stochastic parrot and a human). It is better than humans at some tasks (some specific ToM experiments, like the example in argument 2), but still worse at others (like applying nuance: it can understand nuances, but when you want it to actually be nuanced when it acts, you only see the difference when you ask for different things). I think it is important to build a measure of stochastic-parrot-ness, as this might be a useful metric for governance and a better proxy for "does it understand the world it is in?" (which I think is important for most of the realistic doom scenarios). Also, these experiments are a way to give a taste of what LLM psychology looks like.
Given that in the limit (infinite data and infinite parameters in the model) LLMs are world simulators with tiny simulated humans inside writing text on the internet, the pressure applied to that simulated human is not to understand our world, but to understand that simulated world and be an agent inside it. Which I think gives some hope.
Of course, real-world LLMs are far from that limit, and we have no idea which path to that limit gradient descent takes. Eliezer famously argued about the whole "simulator vs predictor" issue, which I think is relevant to that intermediate state far from the limit.
Also, RLHF applies additional weird pressures, for example a pressure to be aware that it's an AI (or at least to pretend that it's aware, whatever that might mean), which makes fine-tuned LLMs actually less safe than raw ones.