Why ChatGPT Is Not Reliable

July 5, 2023/138 Comments/in Computer Science Tutorials, Education Articles, Technology Guides/by PeterDonis

I’ll start with the simple fact: ChatGPT is not a reliable answerer of questions.

To try to explain why from scratch would be a heavy lift, but fortunately, Stephen Wolfram has already done the heavy lifting for us in his article, “What is ChatGPT Doing… and Why Does It Work?” [1] In a PF thread discussing this article, I tried to summarize as briefly as I could the key message of Wolfram’s article. Here is what I said in my post there [2]:

ChatGPT does not make use of the meanings of words at all. All it is doing is generating text word by word based on relative word frequencies in its training data. It is using correlations between words, but that is not the same as correlations in the underlying information that the words represent (much less causation). ChatGPT literally has no idea that the words it strings together represent anything.

In other words, ChatGPT is not designed to actually answer questions or provide information. In fact, it is explicitly designed not to do those things, because, as I said in the quote above, it only works with words in themselves; it does not work with, and does not even have any concept of, the information that the words represent. And that makes it unreliable, by design.
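
As a rough illustration of what "generating text word by word based on relative word frequencies" means, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the tiny corpus, the function names `train_bigrams` and `continue_text`, and the choice to look only at the single preceding word. It is not ChatGPT's actual implementation, which, as Wolfram's article describes, uses a large neural network. The sketch only shows that plausible-looking continuations can come from nothing more than counts of which words follow which.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_text(follows, prompt, length=8, seed=0):
    """Extend the prompt one word at a time, sampling by relative frequency."""
    rng = random.Random(seed)
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        choices, counts = zip(*candidates.items())
        # Words that followed more often in training are picked more often.
        words.append(rng.choices(choices, weights=counts, k=1)[0])
    return " ".join(words)


# Hypothetical toy corpus; punctuation is treated as an ordinary "word".
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
)
model = train_bigrams(corpus)
print(continue_text(model, "the cat"))
```

Nothing in this program refers to cats, dogs, or mats as things. All it holds is a table of which word followed which, and how often.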

So, to give some examples of misconceptions that I have encountered: when you ask ChatGPT a question that you might think would be answerable by a Google Search, ChatGPT is not doing that. When you ask ChatGPT a question that you might think would be answerable by looking in a database (as Wolfram Alpha, for example, does when you ask it something like “what is the distance from New York to Los Angeles?”), ChatGPT is not doing that. And so on, for any value of “that you might think would be answerable by…”. And the same is true if you substitute “looking for information in its training data” for any of the above: the fact that, for example, there is a huge body of posts on Instagram in ChatGPT’s training data does not mean that if you ask it a question about Instagram posts, it will look at those posts in its training data and analyze them in order to answer the question. It won’t. While there is, of course, voluminous information in ChatGPT’s training data for a human reader, ChatGPT does not use, or even comprehend, any of that information. Literally all it gets from its training data is relative word frequencies.

So why do ChatGPT responses seem like they are reliable? Why do they seem like they must be coming from a process that “knows” the information involved? Because our cognitive systems are designed to interpret things that way. When we see text that looks syntactically, grammatically correct and seems like it is confidently asserting something, we assume that it must have been produced, if not by an actual human, at least by an “AI” that is generating the text based on some kind of actual knowledge. In other words, ChatGPT fools our cognitive systems into attributing qualities to it that it does not actually have.

This security hole, if you will, in our cognitive systems is not a recent discovery. Human con artists have made use of much the same tricks throughout human history. The only difference with the human con artists is that they were doing it intentionally, whereas ChatGPT has no intentions at all and is doing it as a side effect of its design. But the end result is much the same: let the reader beware.

[1] https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

[2] https://www.physicsforums.com/threads/stephen-wolfram-explains-how-chatgpt-works.1050431/post-6903906

PeterDonis

Completed Educational Background: MIT Master’s

Favorite Area of Science: Relativity

138 replies


PeterDonis says:
July 17, 2023 at 9:45 am
PeroK said
there are worse ways to get answers.
So what? That doesn't make ChatGPT good enough to rely on.
PeroK says:
July 17, 2023 at 9:42 am
PeterDonis said
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions.
I don't personally intend to, no. But, there are worse ways to get answers.
Vanadium 50 says:
July 17, 2023 at 9:38 am
PeroK said
You are too focused, IMO, on how it does things and not what it does.
Could you make the same argument for astrology? Yesterday it told me to talk to a loved one and it worked!
PeterDonis says:
July 17, 2023 at 9:32 am
PeroK said
In a practical sense, you could live according to what answers ChatGPT gives you.
For your sake I sincerely hope you don't try this. Unless, of course, you only ask it questions whose answers you don't really care about anyway and aren't going to use to determine any actions. Particularly any actions that involve risk of harm to you or others.
PeroK said
Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters.
Sure it is. You can ask it questions in natural language about everyday matters and it gives you answers, if the answers are in its databases. Unlike ChatGPT, it "knows" when it doesn't know an answer and tells you so. ChatGPT doesn't even have the concept of "doesn't know", because it doesn't even have the concept of "know". All it has is the relative word frequencies in its training data, and all it does is produce a "continuation" of the text you give it as input, according to those relative word frequencies.
Granted, Wolfram Alpha doesn't communicate its answers in natural language, but the answers are still understandable. Plus, it also includes in its answers the assumptions it made while parsing your natural language input (which ChatGPT doesn't even do at all–not just that it doesn't include any assumptions in its output, but it doesn't even parse its input). For example, if you ask Wolfram Alpha "what is the distance from New York to Los Angeles", it includes in its answer that it assumed that by "New York" you meant the city, not the state.
PeroK said
You are too focused, IMO, on how it does things and not what it does.
Huh? The Insights article under discussion, and the Wolfram article it references, are entirely about what ChatGPT does, and what it doesn't do. Wolfram also goes into some detail about the "how", but the "what" is the key part I focused on.
PeroK says:
July 17, 2023 at 9:22 am
PeterDonis said
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.
Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.
Only for the same reason a rock doesn't.
In a practical sense, you could live according to what answers ChatGPT gives you. Wolfram Alpha is a mathematical engine. It's not able to communicate on practical everyday matters. Nor can a rock.
How any software works is not really the issue if you are an end user. The important thing is what it outputs.
You are too focused, IMO, on how it does things and not what it does.
PeterDonis says:
July 17, 2023 at 9:15 am
PeroK said
Chat GPT, if anything, is more reliable in terms of its objective assessment of the world
ChatGPT does not have any "objective assessment of the world". All it has is the relative word frequencies in its training data.
Wolfram Alpha, ironically, would be a much better thing to describe with the phrase you use here. It actually does contain a database (more precisely multiple databases with different entry and lookup criteria) with validated information about the world, which it uses to answer questions.
PeroK said
Chat GPT doesn't have gross political, religious or philosophical prejudices.
Only for the same reason a rock doesn't.
PeroK says:
July 17, 2023 at 6:41 am
I'm not convinced that human intelligence is so effective. This site in many ways is a gross misrepresentation of human thought and interactions. For all the right reasons! Go anywhere else on the Internet or out in the street, as it were, and there is little or no connection between what people think and believe and objective evidence.
Chat GPT, if anything, is more reliable in terms of its objective assessment of the world than the vast majority of human beings.
Chat GPT doesn't have gross political, religious or philosophical prejudices.
If you talked to an Oil Company Executive, then there was no climate change and the biggest threat to humanity was the environmental movement.
Most human beings deliberately lie if it is in their interests. With Chat GPT at least you know it isn't deliberately lying to you.
I don't know where AI is going, or where we are heading, but I could make a case that Chat GPT is more rational, intelligent and truthful than 99% of the people on this planet.
Vanadium 50 says:
July 16, 2023 at 6:41 pm
What it means "to know" is philosophy.
However, an epistemologist would say that an envelope that contained the phrase "It is after 2:30 and before 2:00" does not possess knowledge, even though it is correct about as often as ChatGPT.
PeterDonis says:
July 14, 2023 at 10:35 pm
bob012345 said
Here are a few.
https://stealthoptional.com/tech/chatgpt-paid-vs-free/
https://www.wepc.com/tips/is-chat-gpt-plus-worth-it/
https://www.wired.com/story/chatgpt-plus-web-browsing-openai/
https://www.businessinsider.com/chatgpt-plus-free-openai-paid-version-chatbot-2023-2
Thanks! It looks like, at the very least, the paid version includes searching the Internet for actual answers to prompts, so it is not the same thing as the free version that my Insights article (and the Wolfram article it references) discuss.
PeterDonis says:
July 14, 2023 at 9:49 pm
bob012345 said
I know someone who has the paid version and says it's a lot more reliable.
Is there any reference online about this paid version and how it differs from the free version?
Vanadium 50 says:
July 9, 2023 at 6:43 pm
I'm not even sure how you could measure uncertainty in the output based on word frequency. "Some people say Aristotle was Belgian" will throw it off.
PeterDonis says:
July 9, 2023 at 12:30 pm
Oscar Benavides said
It also tries to err on the side of providing an answer
It doesn't even "try"–it will always output text in response to a prompt.
Oscar Benavides said
even when it has no idea what the right answer is
It never does, since it has no "idea" of any content at all. All it has any "idea" of is relative word frequencies.
Vanadium 50 says:
July 8, 2023 at 12:09 pm
That's just a statement that you can pre-train your program on a large number of questions. I've already said it was much slower than real time. It doesn't make any difference to what the program does. It does, however, make a difference to the illusion of intelligence.
As discussed, ChatGPT doesn't even try to output what is correct. It tries to output what is written often. There is some hope that there is a correlation between that and correctness, but that's not always true and it was not hard to come up with examples.
ChatGPT is the love child of Clever Hans and the Mechanical Turk.
Vanadium 50 says:
July 7, 2023 at 6:31 pm
If you want to argue that the difference between then and now is that hardware has gotten cheaper, you should argue that. But the ideas themselves are old. As I said, I was there.
Vanadium 50 says:
July 7, 2023 at 5:30 pm
@AndreasC , I was doing it 40 years ago.
Vanadium 50 says:
July 7, 2023 at 4:32 pm
First, I was serious. And stop calling me Shirley.
Second, the problem with discussing "AI", much less its purpose, is that it is such a huge area that lumping it all together is seldom helpful. Personally I feel that the most interesting work has been done in motion, balance and sensors.
Third, we had this technology almost 40 years ago. That was based on letters, not words, and it was much slower than real-time. And nobody got excited.
russ_watters says:
July 7, 2023 at 2:12 pm
I do agree with @Vanadium 50 (if he wasn't kidding) that it has good use cases for low risk, low expectation purposes like customer service bots, but that's a really low performance bar*. I do agree with @PeterDonis that if, for example, this was rolled-out by Apple as an upgrade to Siri we wouldn't be having this conversation. It's way, way less interesting/important than the hype suggests.
…and this Insight addresses an important but not well-discussed problem that, more to the point, is why we frown upon chat-bot questions and answers on PF.
*Edit: Also, this isn't what AI is "for". AI's promise is in being able to solve problems that are currently out of reach of computers but don't even require conscious thought by people. These problems – such as self-driving cars – are often ones where reliability is important.
edit2: Ok, I say that, but I can't be so sure it's true, particularly because of wildcards like Elon Musk who are ~~eager~~ willing to put the public at risk to test experimental software.
PeterDonis says:
July 7, 2023 at 2:06 pm
AndreasC said
Most people care about the result. Of course it has some limitations that are fundamental, and they don't necessarily want people knowing that.
You're contradicting yourself. The "limitations that are fundamental" are crucial effects on the result. They're not just irrelevant side issues.
PeterDonis says:
July 7, 2023 at 1:54 pm
AndreasC said
what happens when some business or government does the math and figures it would rather risk being wrong than pay experts?
If I know that's what your business is doing, you won't get my business.
I suspect that a lot of people feel this way; they just don't know that that's what the business is doing. Certainly OpenAI has not done anything to inform the public of what ChatGPT is actually doing, and not doing. I suspect that is because if they did do so, interest in what OpenAI is doing would evaporate.
russ_watters says:
July 7, 2023 at 1:53 pm
BWV said
But what if you want the answer as if given by Homer Simpson, or a Shakespearian Sonnet? Alpha can't do that ;)
I think they already do it the Max Power way:

PeterDonis says:
July 7, 2023 at 12:13 pm
AndreasC said
If what you want is the answer to an exercise, and it can give you the correct answer, say, 99% of the time, then that's good enough for many people and in many contexts
Is it?
Perhaps if my only purpose is to get a passing grade on the exercise, by hook or by crook, this would be good enough.
But for lots of other purposes, it seems wrong. It's not even a matter of percentage accuracy; it's a matter of what the thing is doing and not doing, as compared with what my purpose is. If my purpose is to actually understand the subject matter, I need to learn from a source that actually understands the subject matter. If my purpose is to learn a particular fact, I need to learn from a source that will respond based on that particular fact. For example, if I ask for the distance from New York to Chicago, I don't want an answer from a source that will generate text based on word frequencies in its input data; I want an answer from a source that will look up that distance in a database of verified distances and output what it finds. (Wolfram Alpha, for example, does this in response to queries of that sort.)
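The contrast drawn above can be sketched in a few lines of Python. This is a hypothetical illustration, not how Wolfram Alpha is implemented: the table `DISTANCES_KM`, the functions `lookup_distance` and `answer`, and the rounded distances are all assumptions chosen for the example. The point is only that a lookup-based source can report that it has no entry, whereas a text generator always produces some continuation.

```python
# Approximate great-circle distances in km. Illustrative, rounded values,
# not an authoritative dataset.
DISTANCES_KM = {
    frozenset({"new york", "los angeles"}): 3940,
    frozenset({"new york", "chicago"}): 1145,
}


def lookup_distance(a, b):
    """Return the stored distance, or None when the pair is not in the table."""
    return DISTANCES_KM.get(frozenset({a.lower(), b.lower()}))


def answer(a, b):
    """Answer only from the table, and say so when there is no entry."""
    km = lookup_distance(a, b)
    if km is None:
        return f"No data for the distance between {a} and {b}."
    return f"{a} to {b}: about {km} km."


print(answer("New York", "Chicago"))
print(answer("New York", "Xuxor"))
```

The second query has no entry, so the lookup says so instead of producing a confident-sounding number.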
Vanadium 50 says:
July 7, 2023 at 12:01 pm
russ_watters said
go back again to wondering what the creators are thinking about this…
I think they are planning to monetize this by first making a name for themselves and then selling a product where "close enough is good enough". For example, customer service chatbots.
russ_watters says:
July 7, 2023 at 11:15 am
Vanadium 50 said
"ChatGPT Airlines – now 96% of our takeoffs have landings at airports!"
"New from OceanGate: now 99% Reliable – Twice as Reliable as our Previous Subs!"
(too soon?)
Vanadium 50 said
It's not just unreliable – we have no reason to believe it should be reliable, or that this approach will ever be reliable.
I go back again to wondering what the creators are thinking about this…
pbuk said
Definitely not [AI], but they believe they are headed in the right direction:
OpenAI's website is really weird. It is exceptionally thin on content and heavy on flash, with most of the front page just being pointless slogans and photos of people doing office things (was it created by ChatGPT?). It even features a video on top that apparently has no sound? All this to sell a predominantly text-based application (ironic)? The first section of the front page, though, contains one actual piece of information, in slogan form:
"Creating safe AGI that benefits all of humanity"
That's quite an ambitious goal/claim. It's not surprising that everyday people believe it's more than it really is, when that's what the company is saying.
The trajectory of the app and the way they've talked about the flaws such as hallucinations does imply they think their approach is viable and that refinements that improve its reliability should result in it becoming "reliable enough". Ironically this may increase the risk/danger of misuse, as people apply it to more and more situations where reliability should matter. I can't see how this approach would ever be acceptable for industrial automation. Maybe for a toy drone it won't matter if it unexpectedly/unpredictably crashes for no apparent reason "only" 0.1% of the time, but that won't ever be acceptable for a self driving car or airplane.
Vanadium 50 says:
July 7, 2023 at 9:54 am
If there were any knowledge base behind ChatGPT you would be able to:
Train it in English
Train it in French
Train it in domain knowledge (like physics)
Have it answer questions about this domain in French.
It can't do this. There is no there there.
PeterDonis says:
July 7, 2023 at 9:47 am
AndreasC said
the only reason it passed tests was because of the "laziness and ignorance of the testers", presumably not because the answers were accurate
Or because the testers didn't bother writing a good test, that actually can distinguish between ChatGPT, an algorithm that generates text based on nothing but relative word frequencies in its training data, and an actual human with actual human understanding of the subject matter. The test is supposed to be testing for the latter, so if the former can pass the test, the test is no good.
AndreasC said
the only reason it passed tests was because graders were "lazy"
See above.
AndreasC said
it only passed SAT tests because they can be "gamed"
Which, as I said, is already well known: that humans can pass SAT tests without having any actual knowledge of the topic areas. For example, they can pass the SAT math test without being able to actually use math to solve real world problems–meaning, by gathering information about the problem, using that information to set up relevant mathematical equations, then solving them. So in this case, ChatGPT is not going beyond human performance in any respect.
PeterDonis says:
July 7, 2023 at 9:26 am
AndreasC said
previous computer programs weren't good at, say, parsing natural language and taking into account context, subjective meaning etc. That is why it is impressive
ChatGPT is not parsing natural language. It might well give the appearance of doing so, but that's only an appearance. The text it outputs is just a continuation of the text you input, based on relative word frequencies in its training data. It does not break up the input into sentence structures or anything like that, which is what "parsing natural language" would mean. All it does is output continuations of text based on word frequencies.
Vanadium 50 says:
July 7, 2023 at 9:17 am
russ_watters said
"New 'Spreadsheet' Program 'VisiCalc' Boasts 96% Accuracy – Might it be the New Killer App?"
"ChatGPT Airlines – now 96% of our takeoffs have landings at airports!"
Let's go back to "knowledge". Yes, it's philosophical, but some of the elements can be addressed scientifically. An old-fashioned definition of knowledge was "justified true belief". Let's dispense with "belief" as too fuzzy. Is what ChatGPT says true? Sometimes. As stated, 96% of the time is not very impressive. Is it justified? Absolutely not – it "knows" only what words others used, and in what order. That's it.
In no sense is there "knowledge" there.
It's not just unreliable – we have no reason to believe it should be reliable, or that this approach will ever be reliable.
PeterDonis says:
July 7, 2023 at 9:14 am
AndreasC said
I have very explicitly said I do NOT believe it is reliable multiple times.
But in post #13 you also said it can "repeatably" give accurate answers to questions. That seems to contradict "unreliable". I asked you about this apparent contradiction in post #15 and you haven't responded.
Demystifier says:
July 7, 2023 at 9:12 am
PeterDonis said
What you show here is nothing like what AndreasC described.
Exactly!
PeterDonis says:
July 7, 2023 at 9:02 am
Demystifier said
I've just tried it
What you show here is nothing like what AndreasC described.
russ_watters says:
July 7, 2023 at 8:11 am
AndreasC said
Sure, but the thing is, that it is able to do tasks that previous computer programs couldn't do.
You could write that on the box of any new piece of software. Otherwise there's no reason to use it. But you're seeing the point now:
AndreasC said
…previous computer programs weren't good at, say, parsing natural language and taking into account context, subjective meaning etc. That is why it is impressive, because it accurately and quickly performs tasks that computers couldn't previously do, and were solely the domain of humans.
Right. What's impressive about it is that it can converse with a human and sound pretty human. But now please reread the title of the thread. "Sounds human" is a totally different accomplishment from "reliable".
russ_watters says:
July 7, 2023 at 7:40 am
AndreasC said
People often post more when it gets something wrong. For instance, people have given it SAT tests:
https://study.com/test-prep/sat-exam/chatgpt-sat-score-promps-discussion-on-responsible-ai-use.html
Your take is weird to me, but it seems common, especially in the media. Consider this potential headline from 1979:
"New 'Spreadsheet' Program 'VisiCalc' Boasts 96% Accuracy – Might it be the New Killer App?"
[ChatGPT was 96th percentile on the SAT, not accuracy, but close enough.]
That's not impressive, it's a disaster. It's orders of magnitude worse than acceptable accuracy from a computer. It seems that because ChatGPT sounds confidently human people have lowered the bar from "computer" to "human" in judging its intelligence – and don't even realize they've done it. That's a dangerous mistake.
Demystifier says:
July 7, 2023 at 2:27 am
PeterDonis said
Please give a reference: where has this been done?
I've just tried it:
Me: What are the customs of the aliens at the planet Xuxor?
ChatGPT:
As an AI language model, I don't have real-time information on specific planets or extraterrestrial civilizations outside of what is available up until my knowledge cutoff in September 2021. Therefore, I'm not aware of any specific customs of aliens on a planet called Xuxor or any other extraterrestrial civilization.
It's important to note that the existence of aliens or their customs on specific planets is purely speculative at this point, as we have not made contact with extraterrestrial life. If and when we do encounter extraterrestrial civilizations, their customs and behaviors would likely be vastly different from our own, given the vastness of the universe and the potential for diverse forms of life.
Until further information becomes available, any details about the customs of aliens on specific planets would be purely fictional or based on imagination, as depicted in science fiction literature, movies, or other creative works.
Demystifier says:
July 7, 2023 at 2:23 am
PeterDonis said
Is it? How would one even determine that?
By asking experts to give their educated guess estimates, say two years ago.
PeterDonis says:
July 6, 2023 at 11:25 pm
AndreasC said
you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues.
Please give a reference: where has this been done?
PeterDonis says:
July 6, 2023 at 11:24 pm
russ_watters said
It seems that through statistical analysis it is able to build a model that approximates or simulates real information.
Yes, because while the information that is contained in the relative word frequencies in the training data is extremely sparse compared to the information that a human reader could extract from the same data, it is still not zero information. There is information contained in those word frequencies. For example, "Thomas Jefferson" is going to appear correlated with "July 4, 1776" in the training data to a much greater degree than "Will Smith" does.
russ_watters said
I can't imagine this is an oversight
It's not; it was an intentional feature of the design that only the relative word frequencies in the training data would be used. The designers, from what I can tell, actually believe that piling up enough training data with such word frequencies can lead to actual "knowledge" of subject matter.
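The kind of correlation described above can be made concrete with a small Python sketch. The sentences below are invented for illustration and are not real training data, and `cooccurrence_counts` is a hypothetical helper. It only counts how often two phrases appear in the same sentence, which is far cruder than what a real language model does.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, terms):
    """Count how many sentences mention each pair of terms together."""
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        # Sort so each pair is always stored in the same order.
        present = sorted(t for t in terms if t in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Made-up sentences for illustration only.
sentences = [
    "Thomas Jefferson drafted the Declaration adopted on July 4, 1776.",
    "On July 4, 1776 the Declaration written by Thomas Jefferson was adopted.",
    "Will Smith starred in the film Independence Day.",
    "Thomas Jefferson later became president.",
]
terms = ["thomas jefferson", "will smith", "july 4, 1776"]
counts = cooccurrence_counts(sentences, terms)

print(counts[("july 4, 1776", "thomas jefferson")])
print(counts[("july 4, 1776", "will smith")])
```

In this toy corpus the date co-occurs with "Thomas Jefferson" and never with "Will Smith", so the counts carry a little information about the world even though the program has no notion of what either name refers to.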
PeterDonis says:
July 6, 2023 at 11:19 pm
AndreasC said
people have given it SAT tests
This just shows that SAT tests can be gamed. Which we already knew anyway.
russ_watters says:
July 6, 2023 at 4:15 pm
AndreasC said
This isn't even what surprises me that much. You could say that it has learned that the correct date follows these prompts. But the thing is, you can make up an alien planet, tell gpt about it and their customs, and it will answer understanding questions on your text, plus it may even manage to infer when their alien independence day is, given enough clues. It's really impressive.
Impressive how? Doesn't this just tell you that it doesn't know the difference between fiction and reality, and more to the point, there's no way for you to know if it is providing you fictional or real answers*?
*Hint: always fictional.
kith says:
July 6, 2023 at 3:33 pm
FWIW, here are the results of ChatGPT taking two university-level exams in labor economics and quantum computing (graded and commented on by the lecturers who created the exams):
https://betonit.substack.com/p/chatgpt-takes-my-midterm-and-gets (GPT-3, D)
https://betonit.substack.com/p/gpt-retakes-my-midterm-and-gets-an (GPT-4, A)
https://scottaaronson.blog/?p=7209 (GPT-4, B)
russ_watters says:
July 6, 2023 at 1:35 pm
Demystifier said
In my view, the right question is not why ChatGPT is not reliable. Given the general principles of how it works, the right question is: Why is it more reliable than one would expect?
PeterDonis said
Is it? How would one even determine that?
I think it's just a qualitative feeling, but I feel the same way. When first learning about it, it never occurred to me that it didn't access stored information (either its own or 3rd party) to form its replies*. Now that I know it doesn't, it surprises me that it gets so much right. If it's just doing word association and statistical analysis, I'm surprised that asking about Independence Day doesn't return "On July 4, 1776 Will Smith fought a group of alien invaders before signing the Declaration of Independence in Philadelphia…" It seems that through statistical analysis it is able to build a model that approximates or simulates real information. To me, surprisingly well.
*I don't know the intent of the designers, but I can't imagine this is an oversight. Maybe the intent was always to profit from 3rd parties using it as an interface for their data sources (some of which they are doing, it appears)?
But whatever the real goals of the company, I think it is wrong and risky that it's been hyped (whether by the media or the company) to make people think that it is a general purpose AI with real knowledge. As a result, people have their guard down and are likely to mis/over-use it.
I wonder if the developers really believe it qualifies for the title "AI" or that complexity = intelligence?
russ_watters says:
July 6, 2023 at 1:05 pm
I suspect ChatGPT has infiltrated this thread…
Edit: Btw, while I'm not certain of this, here's how I can often tell: it's the lack of focus in the responses. When the content is dumped into the middle of an ongoing conversation, it doesn't acknowledge or respond to the ongoing conversation, it just provides generic information that is often not useful for/connected to the discussion.
phinds says:
July 6, 2023 at 12:20 pm
pintudaso said
Limited context understanding
That is incorrect. "Limited understanding" implies that there is at least SOME understanding but ChatGPT has zero understanding of anything.
PeterDonis says:
July 6, 2023 at 11:58 am
AndreasC said
Try it. Feed it questions which have unambiguous answers. You'll see that even though sometimes it generates nonsense, very, VERY frequently it gives right answers.
This does not seem consistent with many posts here at PF by people who have tried ChatGPT and posted the output. The general sense I get from those posts is that ChatGPT was less reliable than they expected–because they did not realize what it is actually doing and not doing. For example, apparently many people expected that when you asked it a factual question about something in its training data, it would go look in its training data to find the answer. But it doesn't, even if the right answer is in its training data. Wolfram's article, referenced in my Insights article, makes all this clear.
PeterDonis says:
July 6, 2023 at 11:54 am
AndreasC said
The semantic connections you are talking about are connections between sensory inputs and pre-existing structure inside our brains.
Not necessarily pre-existing. We build structures in our brains to represent things in the world as a result of our interactions with them. ChatGPT does not. (Nor does ChatGPT have any "pre-existing" structures that are relevant for this.)
AndreasC said
Imagine a student passing one test after another, would you take someone telling you they only "got lucky" seriously
If the reason they passed was that their graders were lazy and didn't actually check the accuracy of the answers, yes. And that is exactly what has happened in cases where ChatGPT supposedly "passed" tests. If you think graders would never be so lazy, you have led a very sheltered life. It's just a more extreme version of students getting a passing grade on a book report without ever having read the book, and I can vouch for that happening from my own personal experience.
AndreasC said
It can give you correct answers to unambiguous questions that are non-trivial and that it has not specifically encountered before in its training. And it can do that a lot, repeatably.
Please produce your evidence for this claim. It is contrary to both the analysis of how ChatGPT actually works, which I discuss in the Insights article, and the statements of many, many people who have used it. Including many posts here at PF where people have given ChatGPT output that is confident-sounding but wrong.
AndreasC said
ChatGPT is unreliable because we observe it to be unreliable.
Doesn't this contradict your claim quoted above?
AndreasC said
That requires no explanation.
The fact that it is observed to be unreliable is just a fact, yes. But in previous discussions of ChatGPT here at PF, it became clear to me that many people do not understand how ChatGPT works, and so understand neither that it is unreliable nor why it is unreliable. That is why I wrote this article.
AndreasC said
What does require explanation is why, as @Demystifier said, it is so much more reliable (especially at non trivial, "reasoning" type problems) than you would naively expect.
And I have already responded to @Demystifier that such a claim is meaningless unless you can actually quantify what "you would naively expect" and then compare ChatGPT's actual accuracy to that. Just saying that subjectively it seems more accurate than you would expect is meaningless.
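One way such a comparison could be made concrete is sketched below: fix a baseline accuracy that stands in for the "naive expectation", count correct answers on a carefully graded question set, and ask how probable that count would be if the baseline were true. Every number here is hypothetical and chosen only for illustration (the question count, the number correct, and the 25% baseline, which assumes blind guessing on four-option multiple-choice questions); none of it is a measurement of ChatGPT.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical figures, for illustration only (not real ChatGPT results).
n_questions = 100   # size of a carefully graded question set
n_correct = 60      # answers verified as correct by the graders
baseline = 0.25     # assumed "naive expectation": guessing among 4 options

observed_accuracy = n_correct / n_questions

# Probability of doing at least this well if the baseline were the true rate.
p_value = binomial_tail(n_questions, n_correct, baseline)

print(f"observed accuracy: {observed_accuracy:.2f}")
print(f"chance of >= {n_correct} correct at baseline {baseline}: {p_value:.3g}")
```

The hard part is not the arithmetic but the inputs: the result means nothing unless the baseline is stated in advance and the answers are actually checked for accuracy.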
PeterDonis says:
July 6, 2023 at 10:59 am
Rive said
how many people might live off on the same principles? Confidence tricking through most communication?
Yes, I think one way of describing ChatGPT is that it is crudely simulating a human con artist: it produces statements that seem to come from an entity that is knowledgeable, but actually don't.
PeterDonis says:
July 6, 2023 at 10:58 am
Demystifier said
Why is it more reliable than one would expect?
Is it? How would one even determine that?
PeterDonis says:
July 6, 2023 at 10:57 am
AndreasC said
We definitely learn about lots of things by just reading about them…
That's because our minds have semantic connections between words and things in the world. When we read words, we make use of those connections–in other words, we know that the words have meanings, and what those meanings are. If we get the meanings of words wrong, we "learn" things that are wrong.
ChatGPT has none of this. It has no connections between words and anything else. It doesn't even have the concept of there being connections between words and anything else. The only information it uses is relative word frequencies in its training data.
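As a toy illustration of what generating text purely from relative word frequencies looks like, here is a minimal bigram sketch. It is an assumption-laden simplification and not ChatGPT's actual mechanism, which is a large neural network; the corpus, the function names, and the fixed seed are all made up for this example. The point is only that the program picks each next word from counts of which words followed which, with nothing connecting any word to a thing in the world.

```python
import random
from collections import Counter, defaultdict


def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length, seed=0):
    """Emit up to `length` words, sampling each next word by its frequency."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last word was never followed by anything in the corpus
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        output.append(rng.choices(candidates, weights=weights, k=1)[0])
    return output


# Hypothetical miniature "training data".
corpus = "the cat sat on the mat the cat ate the fish"
counts = build_bigram_counts(corpus)
sample = generate(counts, "the", 8)
print(" ".join(sample))
```

The output can look locally fluent, yet nothing in the program checks whether any sentence it emits is true.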
AndreasC said
It can already give accurate answers about a wide range of questions
No, it can't. It can get lucky sometimes and happen to give an "answer" that happens to be accurate, but, as you will quickly find out if you start looking, it also happily gives inaccurate answers with the same level of confidence. That's because it's not designed to give accurate answers to questions; that's not what it's for.
AndreasC said
pass tests
Only because the "tests" are graded so poorly that even the inaccurate but confident-sounding responses that ChatGPT gives "pass" the tests. That is a reflection of the laziness and ignorance of the test graders, not of the knowledge of ChatGPT.
AndreasC said
answer new problems it has not been specifically trained on
Sure, because it can generate text in response to any prompt whatever. But the responses it gives will have no reliable relationship to reality. Sometimes they might happen to be right, other times they will be wrong, often egregiously wrong. But all of the responses seem just as confident.
AndreasC said
I always thought somebody knows something if they can not only recall the facts, but also apply them in new contexts.
ChatGPT does not and cannot do these things. What it does do is, as a side effect of its design, produce text that seems, to a naive observer, to be produced by something that does these things. But the illusion is quickly shattered when you start actually checking up on its responses.
Demystifier says:
July 6, 2023 at 3:41 am
In my view, the right question is not why ChatGPT is not reliable. Given the general principles of how it works, the right question is: Why is it more reliable than one would expect? I think even its creators were surprised by how good it was.
PeterDonis says:
July 5, 2023 at 3:21 pm
AndreasC said
How do we know at what point it "knows" something? There are non-trivial philosophical questions here
Perhaps, but they are irrelevant to this article. The article is not about an abstract philosophical concept of "knowledge". It is about what ChatGPT is and is not actually doing when it emits text in response to a prompt.
AndreasC said
I can see someone eventually arguing they have somehow formed a decent representation of what things "are" inside them
Not as long as there are no semantic connections between the network and the world. No entity forms "representations" of actual things just by looking at relative word frequencies in texts. There has to be two-way interaction with the actual world. That's how, for example, we humans form our mental representations of things. We interact with them and learn how they work.
Bystander says:
July 5, 2023 at 8:08 am
Call it what it is, "Artificial 'Williams Syndrome.'" https://www.google.com/search?q=wil…99i465i512.19081j1j7&sourceid=chrome&ie=UTF-8
…, pre-politically correct characteristics included "often precocious vocabulary with no apparent 'real understanding/ability' for use/application/reasoning." That is my recollection from Googling ten-fifteen years ago; ymmv.
This is
https://www.physicsforums.com/threa…an-appropriate-source-for-discussion.1053525/
another/one more case; some wiki/google sources lack "shelf life."
