I like how we took something computers were masters at doing, and somehow fucked it up.
in reply to Jesus Margar

somewhere. And sometimes it’s possible to start it from a known and fixed constant, and get the same results for the same prompt every time. (You can do this with some of the image generation models and InvokeAI, iirc.) But in larger systems and longer interactions, even with a fixed PRNG seed, the path taken through the PRNG space matters, and small perturbations in it can create large changes in outcome.
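
As a concrete illustration of that fixed-seed determinism, here is a minimal sketch assuming a diffusers-style image pipeline; the model ID and exact calls are my assumptions for the example, not something from this thread:

```python
import torch
from diffusers import StableDiffusionPipeline

# Model ID is illustrative; any diffusers checkpoint behaves the same way here.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

def generate(prompt: str, seed: int):
    # Seeding the generator pins the PRNG state, so the same prompt + seed
    # reproduces the same image on the same hardware/software stack.
    generator = torch.Generator("cpu").manual_seed(seed)
    return pipe(prompt, generator=generator).images[0]

img_a = generate("a lighthouse at dusk", seed=42)
img_b = generate("a lighthouse at dusk", seed=42)  # pixel-identical to img_a on the same stack
```
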
in reply to Miss Gayle

the purpose of LLMs is to mimic human speech, not to give correct answers. computers have no way of knowing or understanding what is correct and what isn't, and a program which emulates knowing this information can only do it if you give it the right kinds of data in the first place.

a pocket calculator and a CPU both have mathematical functions built in via specific arrangements of boolean logic gates. when you use those functions in a specific, targeted way, such as adding up a sum on a calculator, or in a calculator program, they work correctly because they are built for that purpose. if you throw a layer of something completely abstract to the computer on top of that, you're going to get weird results.
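
to make that concrete, here's a tiny sketch (mine, purely illustrative, not the poster's) of how addition falls out of a specific arrangement of boolean gates: a one-bit full adder chained into a ripple-carry adder.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder built from boolean gates: XOR for the sum, AND/OR for the carry."""
    s = a ^ b ^ carry_in                         # sum bit
    carry_out = (a & b) | (carry_in & (a ^ b))   # carry bit
    return s, carry_out

def add(x: int, y: int, bits: int = 8) -> int:
    """Ripple-carry addition: chain full adders bit by bit, which is what the hardware does."""
    result, carry = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

assert add(9, 33) == 42  # deterministic, because it's built for exactly this purpose
```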

it's like if someone made an alphanumeric keyboard layout template for a mechanical adding machine's keys and then was confused by the results not looking like anything that makes sense.

this is why the meaning of Life, the Universe, and Everything is 42.

in reply to Mastokarl 🇺🇦

I got a similar result, but could get it back to the wrong results by "pressing" ChatGPT that its answer was wrong.
infosec.exchange/@realn2s/1146…

Actually, I find the differing results even more worrying. A consistent error could be "fixed", but random errors are much harder or impossible to fix (especially if they are an inherent property of the system/LLMs)


Just for fun I asked ChatGPT the same question and now the answer is "correct" (it was wrong but it "corrected" itself)

Funnily enough, when I pressed it that it was wrong and the right answer was 0.21, I got this

in reply to Corinna (versiffte Göre)

It has to be said, though, that these text generators are basically the "grubby corner" of AI. They're the current hype, but alongside them there are also perfectly serious fields... which just don't burn billions. The classic example is "expert systems", which proceed strictly logically and don't learn anything.
in reply to Christian Berger DECT 2763

You're telling me... We have a collection of ideas for using AI at our company. My suggestions were text recognition, so that the always-identical tally sheets would still be checked by hand, but afterwards the stack just gets thrown into the scanner instead of also being typed up manually, plus an SVM for one specific task. What gets tested instead? An AI that records meetings and spits out a summary. Which, of course, is poorly structured and jumps back and forth between topics...
in reply to D. Olifant

The defenses to this are like, "Yo, it's natural language, it's not a calculator."

It's running on a computer. It's not like it can't hand off a calculation. And if it can't, why isn't that built in? Or else, why isn't it going, "Sorry, that's a math problem and I can't do math. I am, alas, only a poor computer who can't do math."

You literally don't have to invent or hallucinate math at all. It doesn't have to engage higher pattern-matching functions other than to deduce, "Oh shit, you're asking a math question, let's do 10.12 < 10.6 and see if that's true or false."
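
Something like this rough sketch is all it would take. To be clear, this is a hypothetical detect-and-delegate step I'm making up for illustration, not how any actual chatbot is wired:

```python
import re
from decimal import Decimal

# Hypothetical pre-processing step: if the prompt contains a plain numeric comparison,
# hand it off to exact decimal arithmetic instead of letting the language model guess.
COMPARISON = re.compile(r"(\d+(?:\.\d+)?)\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)")

def try_math_handoff(prompt: str) -> str | None:
    m = COMPARISON.search(prompt)
    if m is None:
        return None  # not a numeric comparison; let the LLM handle it
    left, op, right = Decimal(m.group(1)), m.group(2), Decimal(m.group(3))
    result = {"<": left < right, ">": left > right,
              "<=": left <= right, ">=": left >= right}[op]
    return f"{left} {op} {right} is {result}"

print(try_math_handoff("is 10.12 < 10.6 true or false?"))  # 10.12 < 10.6 is True
print(try_math_handoff("which is bigger, 9.9 or 9.11?"))   # no operator found -> None, fall back to the LLM
```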

This is the answer machine that is supposed to replace us and take our jobs, so we can absolutely criticize it when it confidently declares utterly wrong answers to stuff its "unintelligent" predecessors did just fine in calculator form.

in reply to D. Olifant

I'm guessing we should also just assume it understands, or can say something comprehensible about, physics or chemistry questions, even though those should be public-domain kinds of shit that don't even have ethical concerns and which, much like math, shouldn't have variable answers. Like whether or not it's safe for a human to inhale pure argon gas.
in reply to D. Olifant

sigh Yes, I know you got a different result when you tried it. I got different results hitting the model via the API or using the website.

That's a whole other issue. You're expecting consistent results. Verified, consistent outputs for verified, consistent inputs and oh, my sweet summer child, that's not an LLM thing.


in reply to D. Olifant

@pjakobs Well …
As a NUMBER, 9.9 (read as 9.90) is bigger than 9.11.

If they're version numbers? Sure, 9.11 is the higher version compared to 9.9.

If it's a date?
In the USA it would be September 9th vs September 11th, so 9.11 is later.
In Germany it's the 9th of September vs the 9th of November, so 9.11 is later.

LLMs are good at some things but not the answer to everything. Don't expect them to do math right without context. 🤷‍♂️
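
To make the context point concrete, here is a small sketch (my own, purely illustrative) comparing "9.9" and "9.11" under the three readings above:

```python
from decimal import Decimal
from datetime import date

a, b = "9.9", "9.11"

# As plain numbers: 9.9 == 9.90, which is bigger than 9.11
print(Decimal(a) > Decimal(b))  # True

# As version numbers: compare component-wise, so 9.11 is the higher version
def version(s: str) -> tuple[int, ...]:
    return tuple(int(part) for part in s.split("."))

print(version(a) > version(b))  # False -- (9, 9) < (9, 11)

# As dates, 9.11 comes later in the year under both readings
print(date(2024, 9, 11) > date(2024, 9, 9))  # True  (US reading: Sep 11 vs Sep 9)
print(date(2024, 11, 9) > date(2024, 9, 9))  # True  (German reading: 9 Nov vs 9 Sep)
```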

Unknown parent


Mastokarl 🇺🇦

I assume the guy who came up with the stochastic parrot metaphor is very embarrassed by it by now. I would be.

(Completely ignoring the deep concept building that those multi-layered networks do when learning from vast datasets, so they stochastically work on complex concepts that we may not even understand, but yes, parrot.)

in reply to Mastokarl 🇺🇦

But you're evidently gullible enough to have fallen for the grifter's proposition that the text strings emerging from a stochastic parrot relate to anything other than the text strings that went into it in the first place: we've successfully implemented Searle's Chinese Room, not an embodied intelligence.

en.wikipedia.org/wiki/Chinese_…

(To clarify: I think that a general artificial intelligence might be possible in principle: but this ain't it.)



in reply to Charlie Stross

Agree. I'm more and more convinced that today's chatbots are just an advanced version of ELIZA, fooling the users and just appearing intelligent

en.wikipedia.org/wiki/ELIZA

I wrote a thread about it infosec.exchange/@realn2s/1117…

where @dentangle fooled me using ELIZA techniques

in reply to Charlie Stross

I'm not sure about the "difference".
Different in sheer scale, for sure (molehill vs mountain).

On a higher level:

ELIZA used ranked keywords whose relations to the output sequences were hardcoded in the source.

LLMs use tokens with probabilities whose relations to the output token sequences are determined through training data.
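
For illustration, a toy sketch of the ELIZA side of that comparison (my own drastic simplification of the keyword-and-rank idea, not Weizenbaum's original script):

```python
import random
import re

# Each rule: (rank, keyword pattern, canned response templates). The highest-ranked match wins.
RULES = [
    (10, re.compile(r"\bI am (.+)", re.I), ["Why do you say you are {0}?", "How long have you been {0}?"]),
    (5,  re.compile(r"\bmother\b", re.I),  ["Tell me more about your family."]),
    (0,  re.compile(r".*"),                ["Please go on.", "I see."]),
]

def eliza_reply(utterance: str) -> str:
    # Pick the matching rule with the highest rank and fill its template from the match.
    # No pronoun swapping, no understanding: just pattern matching and canned templates.
    rank, match, templates = max(
        ((r, p.search(utterance), t) for r, p, t in RULES if p.search(utterance)),
        key=lambda x: x[0],
    )
    return random.choice(templates).format(*match.groups())

print(eliza_reply("I am worried about my mother"))
```
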

Closing with an anecdote from the wiki page:

Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."

Unknown parent


Claudius Link

It absolutely does!

Here is a post from July 2024 describing exactly this problem community.openai.com/t/why-9-1…

I fail to be astonished by, or to call intelligent, something that fails to do correct math in the numerical range up to 10 (even after one year, many training cycles, ...)

Unknown parent


Claudius Link

I agree, the techniques behind the training and the construction of the response are vastly different.

Nevertheless, the concept, creating a plausible-sounding response and fooling the human, is in my view very similar.

In a way that is very disappointing. 60 years later, and with an increase in computing power by a factor of billions, the solution concept of "AI" is still pretending 😞

Unknown parent


Alys

"maybe it's ok to polish a text that isn't too important" - My feeling is that if the text isn't too important, it doesn't need much polishing, and a human should do any polishing necessary anyway. Then later when the human has to polish text that is absolutely critical to get right, the human has had practice at polishing and does it well.

@airshipper @kevinriggle @oli

in reply to Charlie Stross

Thought about this some more. No, I do not think that, using the Chinese room thought experiment, you will ever accept a computer program's behavior as intelligent, no matter how much evidence of intelligent behavior you get, because by definition there's an algorithm executing it.

I don't agree, because I don't buy into the human exceptionalism that says we meat machines have some magic inside us that gives us intent that machines can't have.

Unknown parent


Anko Brand Ambassador 🎇

people have been mistaking "statistics" and "algorithms" and "procedural generation" and "fuzzy logic" for intelligence for a long while, I guess!

For me a working definition is: humans are intelligent, humans can learn from each other. Even animals have a level of intelligence; they learn from each other.

Large language models? They don't learn from each other. If you use the output of one model to train another, the new model gets *worse*, not smarter: you get model collapse. Not intelligence, still just statistics.

Unknown parent


Mastokarl 🇺🇦

that these intelligence tests have not been part of the LLMs' training sets. The machines do well on tests that I find intellectually challenging. B) Okay, personal anecdotal experience is always a great proof, but still: I played with ELIZA back then. It got boring after a short time. OTOH, I recently had an LLM (Google Gemini 2.5 Pro, in case you're interested) develop a puzzle based on, but very different from, Tetris. To the best of my Google skills, nobody else has written this…
Unknown parent


Mastokarl 🇺🇦

count the n-character tuples (e.g. all 3-character sequences), and then produce text not entirely unlike English by choosing the next character that completes the most probable tuple. This is what I think about when talking about stochastic parrots. My program clearly had no clue what it was generating; it would complete „Quee“ to „Queen“ or „Queer“ without having any clue what these words mean…
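
A minimal sketch of the kind of program described here (my reconstruction for illustration, not the original code): count character trigrams in a source text, then extend a seed by always picking the most probable next character.

```python
from collections import Counter, defaultdict

def train(text: str, n: int = 3) -> dict[str, Counter]:
    """Count, for every (n-1)-character context, which character follows it."""
    model: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        model[context][nxt] += 1
    return model

def generate(model: dict[str, Counter], seed: str, length: int = 60, n: int = 3) -> str:
    """Greedily complete the seed with the most probable next character at each step."""
    out = seed
    for _ in range(length):
        context = out[-(n - 1):]
        if context not in model:
            break
        out += model[context].most_common(1)[0][0]
    return out

corpus = "the queen met the queer engineer near the green river then there "
model = train(corpus)
print(generate(model, "the q"))  # text "not entirely unlike English", produced with zero understanding
```
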
Unknown parent


Mastokarl 🇺🇦

Of the maybe 1,500 lines of code, fewer than 10 were mine. Understanding a spec it has never come across and turning it into good, working code is something I fail to attribute to anything but intelligence.

Knowledge representation: Okay, another personal story, sorry. Long ago, when PC didn't mean „x86 architecture“, I read about statistical text generation and wrote a program that would take a longer text, …

in reply to Mastokarl 🇺🇦

flavor of Tetris puzzle before. I gave the LLM a 3-page specification of the game and asked it to ask questions if something was not clear. And after some very, well, intelligent questions, the questions a skilled developer would ask to clarify bits I had not specified well, it generated the game for me, and after a few refinements the game was working beautifully, in the technology I wanted, with the scope I wanted...
Unknown parent


Mastokarl 🇺🇦

In my book this is far beyond what I would call stochastic parroting (although all the weights in the NN are, in the end, used in a stochastic process). Would you not agree that a system that clearly has a sophisticated semantic representation of a huge number of concepts is, in representing knowledge, intelligent? …
in reply to Mastokarl 🇺🇦

LLMs use a deep neural net to learn the meaning of tokens, words, utterances. Meaning = they can represent concepts and their relationships to other concepts. Playing around with embeddings can show this nicely. The vectors for „Queen“ and „Mercury“ should be closer than the vectors for „Queen“ and „Hydrogen“ (I haven't actually tried this 😀 ). So an LLM has a sophisticated representation of complex concepts that it uses to generate text. …
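
If you want to actually try that experiment, a sketch along these lines would do. The library and model choice here are my assumptions (the post names none), and with single ambiguous words like „Mercury“ the exact similarities will depend heavily on the model:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative model choice; any sentence/word embedding model would do for this experiment.
model = SentenceTransformer("all-MiniLM-L6-v2")

queen, mercury, hydrogen = model.encode(["Queen", "Mercury", "Hydrogen"])

# Cosine similarity: if the embeddings really capture concept relationships, the band
# should sit closer to its singer than to a chemical element (exact numbers are model-dependent).
print("Queen vs Mercury:  ", util.cos_sim(queen, mercury).item())
print("Queen vs Hydrogen: ", util.cos_sim(queen, hydrogen).item())
```
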
in reply to Mastokarl 🇺🇦

Production of content: yes, very clearly, there's not much that is intelligent about the algorithm that completes a token sequence by selecting (based on temperature) a plausible next token. But you should not ignore that all the learned concepts are part of the (stochastic) process of doing this. I'm getting very speculative now, but isn't this also how we work? Say a sentence and try to skip 4 words. Remember how you need to mentally fast-forward a song…
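
For reference, a minimal sketch of that temperature-based selection step (generic softmax sampling over toy logits, not any particular model's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def sample_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Pick a next-token id: softmax over logits scaled by temperature, then sample.

    Low temperature behaves almost like argmax (nearly deterministic);
    high temperature flattens the distribution and makes the choice more random.
    """
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy vocabulary and logits, purely illustrative.
vocab = ["Queen", "Queer", "Quest", "Quell"]
logits = np.array([2.5, 2.3, 0.4, 0.1])
print(vocab[sample_next_token(logits)])  # usually "Queen" or "Queer", occasionally the others
```
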
in reply to Mastokarl 🇺🇦