I like how we took something computers were masters at doing, and somehow fucked it up.
in reply to Jesus Margar

somewhere. And sometimes it’s possible to start it from a known and fixed constant, and get the same results for the same prompt every time. (You can do this with some of the image generation models and InvokeAI, iirc.) But in larger systems and longer interactions, even with a fixed PRNG seed, the path taken through the PRNG space matters, and small perturbations in it can create large changes in outcome.
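
As a concrete illustration of that fixed-seed determinism, here is a minimal sketch assuming a diffusers-style image pipeline; the model ID and exact calls are my assumptions for the example, not something from this thread:

```python
import torch
from diffusers import StableDiffusionPipeline

# Model ID is illustrative; any diffusers checkpoint behaves the same way here.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

def generate(prompt: str, seed: int):
    # Seeding the generator pins the PRNG state, so the same prompt + seed
    # reproduces the same image on the same hardware/software stack.
    generator = torch.Generator("cpu").manual_seed(seed)
    return pipe(prompt, generator=generator).images[0]

img_a = generate("a lighthouse at dusk", seed=42)
img_b = generate("a lighthouse at dusk", seed=42)  # pixel-identical to img_a on the same stack
```
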
in reply to Miss Gayle

the purpose of LLMs is to mimic human speech, not to give correct answers. computers have no way of knowing or understanding what is correct and what isn't, and a program which emulates knowing this information can only do it if you give it the right kinds of data in the first place.

a pocket calculator and a CPU both have mathematical functions built in via specific arrangements of boolean logic gates. when you use those functions in a specific, targeted way, such as adding up a sum on a calculator, or in a calculator program, they work correctly because they are built for that purpose. if you throw a layer of something completely abstract to the computer on top of that, you're going to get weird results.
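
to make that concrete, here's a tiny sketch (mine, purely illustrative, not the poster's) of how addition falls out of a specific arrangement of boolean gates: a one-bit full adder chained into a ripple-carry adder.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder built from boolean gates: XOR for the sum, AND/OR for the carry."""
    s = a ^ b ^ carry_in                         # sum bit
    carry_out = (a & b) | (carry_in & (a ^ b))   # carry bit
    return s, carry_out

def add(x: int, y: int, bits: int = 8) -> int:
    """Ripple-carry addition: chain full adders bit by bit, which is what the hardware does."""
    result, carry = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

assert add(9, 33) == 42  # deterministic, because it's built for exactly this purpose
```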

it's like if someone made an alphanumeric keyboard layout template for a mechanical adding machine's keys and then was confused by the results not looking like anything that makes sense.

this is why the meaning of Life, the Universe, and Everything is 42.

in reply to Mastokarl 🇺🇦

I got a similar result, but could get it back to the wrong results by "pressing" ChatGPT that its answer was wrong.
infosec.exchange/@realn2s/1146…

Actually, I find the differing results even more worrying. A consistent error could be "fixed", but random errors are much harder or impossible to fix (especially if they are an inherent property of the system/LLMs)


Just for fun I asked ChatGPT the same question and now the answer is "correct" (it was wrong but it "corrected" itself)

Funnily enough, when I pressed it that it was wrong and the right answer was 0.21, I got this

in reply to Corinna (versiffte Göre)

It has to be said, though, that these text generators are basically the "grubby corner" of AI. They're the current hype, but alongside them there are also perfectly serious fields... which just don't burn billions. The classic example is "expert systems", which proceed strictly logically and don't learn anything.
in reply to Christian Berger DECT 2763

You're telling me... We have a collection of ideas for using AI at our company. My suggestions were text recognition, so that the always-identical tally sheets would still be checked by hand, but afterwards the stack just gets thrown into the scanner instead of also being typed up manually, plus an SVM for one specific task. What gets tested instead? An AI that records meetings and spits out a summary. Which, of course, is poorly structured and jumps back and forth between topics...
in reply to D. Olifant

The defenses to this are like, "Yo, it's natural language, it's not a calculator."

It's running on a computer. It's not like it can't hand off a calculation. And if it can't, why isn't that built in? Or else, why isn't it going, "Sorry, that's a math problem and I can't do math. I am, alas, only a poor computer who can't do math."

You literally don't have to invent or hallucinate math at all. It doesn't have to engage higher pattern-matching functions other than to deduce, "Oh shit, you're asking a math question, let's do 10.12 < 10.6 and see if that's true or false."
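
Something like this rough sketch is all it would take. To be clear, this is a hypothetical detect-and-delegate step I'm making up for illustration, not how any actual chatbot is wired:

```python
import re
from decimal import Decimal

# Hypothetical pre-processing step: if the prompt contains a plain numeric comparison,
# hand it off to exact decimal arithmetic instead of letting the language model guess.
COMPARISON = re.compile(r"(\d+(?:\.\d+)?)\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)")

def try_math_handoff(prompt: str) -> str | None:
    m = COMPARISON.search(prompt)
    if m is None:
        return None  # not a numeric comparison; let the LLM handle it
    left, op, right = Decimal(m.group(1)), m.group(2), Decimal(m.group(3))
    result = {"<": left < right, ">": left > right,
              "<=": left <= right, ">=": left >= right}[op]
    return f"{left} {op} {right} is {result}"

print(try_math_handoff("is 10.12 < 10.6 true or false?"))  # 10.12 < 10.6 is True
print(try_math_handoff("which is bigger, 9.9 or 9.11?"))   # no operator found -> None, fall back to the LLM
```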

This is the answer machine that is supposed to replace us and take our jobs, so we can absolutely criticize it when it confidently declares utterly wrong answers to stuff its "unintelligent" predecessors did just fine in calculator form.

in reply to D. Olifant

I'm guessing we should also just assume it understands, or can say something comprehensible about, physics or chemistry questions, even though those should be public-domain kinds of shit that don't even have ethical concerns and which, much like math, shouldn't have variable answers. Like whether or not it's safe for a human to inhale pure argon gas.
in reply to D. Olifant

sigh Yes, I know you got a different result when you tried it. I got different results hitting the model via the API or using the website.

That's a whole other issue. You're expecting consistent results. Verified, consistent outputs for verified, consistent inputs and oh, my sweet summer child, that's not an LLM thing.


in reply to D. Olifant

@pjakobs Well …
As a NUMBER, 9.9 (read as 9.90) is bigger than 9.11.

If they're version numbers? Sure, 9.11 is the higher version compared to 9.9.

If it's a date?
In the USA it would be September 9th vs September 11th, so 9.11 is later.
In Germany it's the 9th of September vs the 9th of November, so 9.11 is later.

LLMs are good at some things but not the answer to everything. Don't expect them to do math right without context. 🤷‍♂️
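
To make the context point concrete, here is a small sketch (my own, purely illustrative) comparing "9.9" and "9.11" under the three readings above:

```python
from decimal import Decimal
from datetime import date

a, b = "9.9", "9.11"

# As plain numbers: 9.9 == 9.90, which is bigger than 9.11
print(Decimal(a) > Decimal(b))  # True

# As version numbers: compare component-wise, so 9.11 is the higher version
def version(s: str) -> tuple[int, ...]:
    return tuple(int(part) for part in s.split("."))

print(version(a) > version(b))  # False -- (9, 9) < (9, 11)

# As dates, 9.11 comes later in the year under both readings
print(date(2024, 9, 11) > date(2024, 9, 9))  # True  (US reading: Sep 11 vs Sep 9)
print(date(2024, 11, 9) > date(2024, 9, 9))  # True  (German reading: 9 Nov vs 9 Sep)
```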

Unknown parent


Mastokarl 🇺🇦

I assume the guy who came up with the stochastic parrot metaphor is very embarrassed by it by now. I would be.

(Completely ignoring the deep concept building that those multi-layered networks do when learning from vast datasets, so they stochastically work on complex concepts that we may not even understand, but yes, parrot.)

in reply to Mastokarl 🇺🇦

But you're evidently gullible enough to have fallen for the grifter's proposition that the text strings emerging from a stochastic parrot relate to anything other than the text strings that went into it in the first place: we've successfully implemented Searle's Chinese Room, not an embodied intelligence.

en.wikipedia.org/wiki/Chinese_…

(To clarify: I think that a general artificial intelligence might be possible in principle: but this ain't it.)



in reply to Charlie Stross

Agree. I'm more and more convinced that today's chatbots are just an advanced version of ELIZA, fooling the users and just appearing intelligent

en.wikipedia.org/wiki/ELIZA

I wrote a thread about it infosec.exchange/@realn2s/1117…

where @dentangle fooled me using ELIZA techniques

in reply to Charlie Stross

I'm not sure about the "difference".
Different in sheer scale, for sure (molehill vs mountain).

On a higher level:

ELIZA used ranked keywords whose relations to the output sequences were hardcoded in the source.

LLMs use tokens with probabilities whose relations to the output token sequences are determined through training data.
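
For illustration, a toy sketch of the ELIZA side of that comparison (my own drastic simplification of the keyword-and-rank idea, not Weizenbaum's original script):

```python
import random
import re

# Each rule: (rank, keyword pattern, canned response templates). The highest-ranked match wins.
RULES = [
    (10, re.compile(r"\bI am (.+)", re.I), ["Why do you say you are {0}?", "How long have you been {0}?"]),
    (5,  re.compile(r"\bmother\b", re.I),  ["Tell me more about your family."]),
    (0,  re.compile(r".*"),                ["Please go on.", "I see."]),
]

def eliza_reply(utterance: str) -> str:
    # Pick the matching rule with the highest rank and fill its template from the match.
    # No pronoun swapping, no understanding: just pattern matching and canned templates.
    rank, match, templates = max(
        ((r, p.search(utterance), t) for r, p, t in RULES if p.search(utterance)),
        key=lambda x: x[0],
    )
    return random.choice(templates).format(*match.groups())

print(eliza_reply("I am worried about my mother"))
```
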

Closing with an anecdote from the wiki page:

Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."

Unknown parent


Claudius Link

It absolutely does!

Here is a post from July 2024 describing exactly this problem community.openai.com/t/why-9-1…

I fail to be astonished by, or to call intelligent, something that fails to do correct math in the numerical range up to 10 (even after one year, many training cycles, ...)

Unknown parent


Claudius Link

I agree, the techniques behind the training and the construction of the response are vastly different.

Nevertheless, the concept, creating a plausible-sounding response and fooling the human, is in my view very similar.

In a way that is very disappointing. 60 years later, and with an increase in computing power by a factor of billions, the solution concept of "AI" is still pretending 😞

Unknown parent


Alys

"maybe it's ok to polish a text that isn't too important" - My feeling is that if the text isn't too important, it doesn't need much polishing, and a human should do any polishing necessary anyway. Then later when the human has to polish text that is absolutely critical to get right, the human has had practice at polishing and does it well.

@airshipper @kevinriggle @oli

in reply to Charlie Stross

Thought about this some more. No, I do not think that, using the Chinese room thought experiment, you will ever accept a computer program's behavior as intelligent, no matter how much evidence of intelligent behavior you get, because by definition there's an algorithm executing it.

I don't agree, because I don't buy into the human exceptionalism that says we meat machines have some magic inside us that gives us intent that machines can't have.

Unknown parent


Anko Brand Ambassador 🎇

people have been mistaking "statistics" and "algorithms" and "procedural generation" and "fuzzy logic" for intelligence for a long while, I guess!

For me a working definition is: humans are intelligent, humans can learn from each other. Even animals have a level of intelligence; they learn from each other.

Large language models? They don't learn from each other. If you use the output of one model to train another, the new model gets *worse*, not smarter: you get model collapse. Not intelligence, still just statistics.

Unknown parent


Mastokarl 🇺🇦

that these intelligence tests have not been part of the LLMs' training sets. The machines do well on tests that I find intellectually challenging. B) Okay, personal anecdotal experience is always a great proof, but still: I played with ELIZA back then. It got boring after a short time. OTOH, I recently had an LLM (Google Gemini 2.5 Pro, in case you're interested) develop a puzzle based on, but very different from, Tetris. To the best of my Google skills, nobody else has written this…
Unknown parent


Mastokarl 🇺🇦

count the n-character tuples (e.g. all 3-character sequences), and then produce text not entirely unlike English by choosing the next character that completes the most probable tuple. This is what I think about when talking about stochastic parrots. My program clearly had no clue what it was generating; it would complete „Quee“ to „Queen“ or „Queer“ without having any clue what these words mean…
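
A minimal sketch of the kind of program described here (my reconstruction for illustration, not the original code): count character trigrams in a source text, then extend a seed by always picking the most probable next character.

```python
from collections import Counter, defaultdict

def train(text: str, n: int = 3) -> dict[str, Counter]:
    """Count, for every (n-1)-character context, which character follows it."""
    model: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        model[context][nxt] += 1
    return model

def generate(model: dict[str, Counter], seed: str, length: int = 60, n: int = 3) -> str:
    """Greedily complete the seed with the most probable next character at each step."""
    out = seed
    for _ in range(length):
        context = out[-(n - 1):]
        if context not in model:
            break
        out += model[context].most_common(1)[0][0]
    return out

corpus = "the queen met the queer engineer near the green river then there "
model = train(corpus)
print(generate(model, "the q"))  # text "not entirely unlike English", produced with zero understanding
```
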
Unknown parent


Mastokarl 🇺🇦

Of the maybe 1,500 lines of code, fewer than 10 were mine. Understanding a spec it has never come across and turning it into good, working code is something I fail to attribute to anything but intelligence.

Knowledge representation: Okay, another personal story, sorry. Long ago, when PC didn't mean „x86 architecture“, I read about statistical text generation and wrote a program that would take a longer text, …

in reply to Mastokarl 🇺🇦

flavor of Tetris puzzle before. I gave the LLM a 3-page specification of the game and asked it to ask questions if something was not clear. And after some very, well, intelligent questions, the questions a skilled developer would ask to clarify bits I had not specified well, it generated the game for me, and after a few refinements the game was working beautifully, in the technology I wanted, with the scope I wanted...
Unknown parent


Mastokarl 🇺🇦

In my book this is far beyond what I would call stochastic parroting (although all the weights in the NN are, in the end, used in a stochastic process). Would you not agree that a system that clearly has a sophisticated semantic representation of a huge number of concepts is, in representing knowledge, intelligent? …
in reply to Mastokarl 🇺🇦

LLMs use a deep neural net to learn the meaning of tokens, words, utterances. Meaning = they can represent concepts and their relationships to other concepts. Playing around with embeddings can show this nicely. The vectors for „Queen“ and „Mercury“ should be closer than the vectors for „Queen“ and „Hydrogen“ (I haven't actually tried this 😀 ). So an LLM has a sophisticated representation of complex concepts that it uses to generate text. …
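
If you want to actually try that experiment, a sketch along these lines would do. The library and model choice here are my assumptions (the post names none), and with single ambiguous words like „Mercury“ the exact similarities will depend heavily on the model:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative model choice; any sentence/word embedding model would do for this experiment.
model = SentenceTransformer("all-MiniLM-L6-v2")

queen, mercury, hydrogen = model.encode(["Queen", "Mercury", "Hydrogen"])

# Cosine similarity: if the embeddings really capture concept relationships, the band
# should sit closer to its singer than to a chemical element (exact numbers are model-dependent).
print("Queen vs Mercury:  ", util.cos_sim(queen, mercury).item())
print("Queen vs Hydrogen: ", util.cos_sim(queen, hydrogen).item())
```
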
in reply to Mastokarl 🇺🇦

Production of content: yes, very clearly, there's not much that is intelligent about the algorithm that completes a token sequence by selecting (based on temperature) a plausible next token. But you should not ignore that all the learned concepts are part of the (stochastic) process of doing this. I'm getting very speculative now, but isn't this also how we work? Say a sentence and try to skip 4 words. Remember how you need to mentally fast-forward a song…
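
For reference, a minimal sketch of that temperature-based selection step (generic softmax sampling over toy logits, not any particular model's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def sample_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Pick a next-token id: softmax over logits scaled by temperature, then sample.

    Low temperature behaves almost like argmax (nearly deterministic);
    high temperature flattens the distribution and makes the choice more random.
    """
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy vocabulary and logits, purely illustrative.
vocab = ["Queen", "Queer", "Quest", "Quell"]
logits = np.array([2.5, 2.3, 0.4, 0.1])
print(vocab[sample_next_token(logits)])  # usually "Queen" or "Queer", occasionally the others
```
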
in reply to Mastokarl 🇺🇦