

Nvidia loses $500 bn in value as Chinese AI firm jolts tech shares


in reply to xiao

This is almost too perfect, but they'll learn nothing. Rather than make concessions, get rid of the ridiculous capitalistic & oligarchic system, and promote rapid innovation through community & open source, they'll just demand that the government give them even more trillions in taxpayer money so they can compete with an open source model.
in reply to John Richard

They can dig a deeper hole, but that doesn't help with DeepSeek being 30x cheaper.
in reply to xiao

The number of people repeating "I bet it won't tell you about Tiananmen Square" jokes around this news has - imho - neatly explained why the US tech sector is absolutely fucked going into the next generation.
in reply to UnderpantsWeevil

???
You don't use training data when running models; that's what's used to train them.
in reply to Womble

DeepSeek open-sourced their model. Go ahead and train it on different data and try again.
in reply to UnderpantsWeevil

Wow, OK, you really don't know what you're talking about, huh?

No, I don't have thousands of almost top-of-the-line graphics cards to retrain an LLM from scratch, nor the millions of dollars to pay for electricity.

I'm sure someone will, and I'm glad this has been open sourced; it's a great boon. But that's still no excuse to sweep under the rug blatant censorship of topics the CCP doesn't want talked about.

in reply to Womble

No, I don't have thousands of almost top-of-the-line graphics cards to retrain an LLM from scratch


Fortunately, you don't need thousands of top of the line cards to train the DeepSeek model. That's the innovation people are excited about. The model improves on the original LLM design to reduce time to train and time to retrieve information.

Contrary to common belief, an LLM isn't just a fancy Wikipedia. It's a schema for building out a graph of individual pieces of data, attached to a translation tool that turns human-language inputs into graph-search parameters. If you put facts about Tiananmen Square in 1989 into the model, you'll get them back as results through the front-end.

You don't need to be scared of technology just because the team that introduced the original training data didn't configure this piece of open-source software the way you like it.

that's still no excuse to sweep under the rug blatant censorship of topics the CCP doesn't want talked about.
Wow, OK, you really don't know what you're talking about, huh?


in reply to UnderpantsWeevil

analyticsvidhya.com/blog/2024/…

Huh, I guess $6 million isn't millions, eh? The innovation is that it's comparatively cheap to train compared to the billions OpenAI et al. are spending (and that's with the cost of acquiring thousands of H800s not included).

Edit: just realised that was for the wrong model! But R1 was trained on the same budget:
https://x.com/GavinSBaker/status/1883891311473782995?mx=2

in reply to Womble

The innovation is it’s comparatively cheap to train, compared to the billions


Smaller builds with less comprehensive datasets take less time and money. Again, this doesn't have to be encyclopedic. You can train your model entirely on a small sample of material detailing historical events in and around Beijing in 1989 if you are exclusively fixated on getting results back about Tiananmen Square.

in reply to UnderpantsWeevil

OK, sure, as I said before, I am grateful that they have done this and open sourced it. But it is still deliberately politically censored, and no, "just train your own, bro" is not a reasonable reply to that.
in reply to Womble

They know less than I do about LLMs if that's something they think you can just DO… and that's saying a lot.
in reply to UnderpantsWeevil

Oh, by the way, as to your theory of "maybe it just doesn't know about Tiananmen, it's not an encyclopedia"…
in reply to Womble

I don't think I've seen that internal dialogue before with LLMs. Do you get that with most models when running them with ollama?
in reply to Dhs92

No, it's not a feature of ollama; that's the innovation of the "chain of thought" models like OpenAI's o1 and now this DeepSeek model: it narrates an internal dialogue first in order to try to produce more consistent answers. It isn't perfect, but it helps with things like logical reasoning, at the cost of taking a lot longer to get to the answer.
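For anyone wiring this up themselves: R1-style models emit that internal dialogue between `<think>` tags before the final answer, which is why you see the narration when running the raw model, and why front-ends usually strip it. A minimal sketch in Python (the helper name is mine; the tag convention is how DeepSeek R1 formats its output):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate an R1-style model's chain-of-thought from its final answer.

    The model emits its internal monologue between <think> tags,
    followed by the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()  # no reasoning block present
    thoughts = match.group(1).strip()
    answer = raw[match.end():].strip()
    return thoughts, answer

raw = "<think>The user asks 2+2. Basic arithmetic.</think>2 + 2 = 4."
thoughts, answer = split_reasoning(raw)
```

A chat UI would show only `answer` (or hide `thoughts` behind an expandable panel, as most hosted front-ends do).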
in reply to MrTolkinghoen

Because the parent comment by Womble is about using the Chinese hosted DeepSeek app, not hosting the model themselves. The user above who responded either didn't read the original comment carefully enough, or provided a very snarky response. Neither is particularly endearing.
in reply to JasSmith

No, that was me running the model on my own machine, not using DeepSeek's hosted one. What they were doing was justifying blatant political censorship by saying anyone could spend millions of dollars to follow their method and make their own model.

You'll notice how they stopped replying to my posts and started replying to others once it became untenable to pretend it wasn't censorship.

in reply to JasSmith

But yeah. Anyone who thinks the app / stock model isn't going to be heavily censored...

Why else would it be free? It's absolutely state-sponsored media. Or it's a singularity and they're just trying to get people to run it from within their networks - the former being far more plausible.

And I know, I know, the latter isn't how LLMs work. But would we really know when it isn't?

in reply to UnderpantsWeevil

Lol, well. When I saw this I knew the model would be censored to hell, and then the CCP abliteration training data repo made a lot more sense. That being said, the open source effort to reproduce it is far more appealing.
in reply to Bronzebeard

Maybe, if they wanted to make a point, they should have been clearer instead of saying people were joking about it doing something that it actually does.
in reply to Womble

People caring more about "China bad" instead of looking at what the tech they made can actually do is the issue.

You needing this explicitly spelled out for you does not help the case.

in reply to Bronzebeard

ngl I'm still confused

what the tech they made can actually do


It's AI, it does AI things. Is the problem that China can now do the things we do (coding/development/search queries, etc.) just as well as America?

in reply to ikt

It has nothing to do with it being China. They just figured out how to do it more efficiently and with lower-powered chips, meaning Nvidia's market dominance in high-end chips - which they could charge whatever they wanted for - has its legs cut out from under it. If you don't need as many chips to run AI, Nvidia won't sell as many.
in reply to Bronzebeard

So the idea with this comment:

The number of people repeating “I bet it won’t tell you about Tiananmen Square” jokes around this news has - imho - neatly explained why the US tech sector is absolutely fucked going into the next generation.


is that people have misplaced their concern, not at the fact that it's censored but that the US has lost the technology high ground and won't get it back for at least a generation?

in reply to Bronzebeard

I'm slow - what's the point? How does people joking about the fact that China is censoring output explain

why the US tech sector is absolutely fucked going into the next generation
in reply to ikt

Because they care more about the model not parroting US state dept narratives than the engineering behind it.
in reply to Womble

Try an abliterated version of the Qwen 14b or 32b R1 distills. I just tried them out; they will give you a real overview.

Still, even when abliterated it's just not very knowledgeable about "harmful information". If you want a truly uncensored model, hit up Mistral Small 22b and its even more uncensored fine-tune, Beepo 22b.

in reply to SmokeyDope

Oh, I hadn't realised uncensored versions had started coming out yet; I'll definitely look into it once quantised versions drop.
in reply to Womble

Mradermacher has you covered with quants: huggingface.co/mradermacher/De…
in reply to Womble

That's just dumb. At least it doesn't suppress that when provided with search results, or refuse to search (at least when integrated in Kagi).
in reply to Womble

It's even worse / funnier in the app: it will generate the response, then once it realizes it's about Taiwan it will delete the whole response and say sorry, I can't do that.

If you ask it "what is the Republic of China" it will generate a couple of paragraphs on the history of China, then get a couple of sentences in about the retreat to Taiwan and then stop and delete the response.

in reply to Not_mikey

In fairness, that is also exactly what ChatGPT, Claude and the rest do for their online versions when you hit their limits (usually around sex). IIRC they work by having a second LLM monitor the output and send a cancel signal if it thinks the output has gone over the line.
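A toy sketch of that monitor-and-cancel loop (all names here are mine, and the "moderator" is just a callback; real providers' moderation stacks are obviously more involved):

```python
from typing import Callable, Iterator

def stream_with_moderation(
    tokens: Iterator[str],
    is_over_the_line: Callable[[str], bool],
) -> tuple[str, bool]:
    """Stream tokens while a watchdog inspects the running output.

    Sketch of the described setup: a second model (here a callback)
    reads the partial response and can send a cancel signal mid-stream,
    replacing everything shown so far with a refusal.
    """
    shown: list[str] = []
    for token in tokens:
        shown.append(token)  # in a real UI this token is already on screen
        if is_over_the_line("".join(shown)):
            return "Sorry, that's beyond my current scope.", True
    return "".join(shown), False

# toy watchdog: cancel as soon as a flagged word appears in the partial text
flagged = lambda text: "Taiwan" in text
reply, cancelled = stream_with_moderation(
    iter(["The ", "history ", "of ", "Taiwan ", "begins..."]), flagged
)
```

This is also why you can watch the app generate paragraphs and then yank them: the watchdog only trips once the flagged content actually appears in the stream.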
in reply to Womble

Okay but one is about puritanical Western cultural standards about sex, and one is about government censorship to maintain totalitarian power. One of these things is not like the other.
in reply to JasSmith

Yes I'm aware, I was saying that the method is the same.
in reply to JasSmith

Sorry that a Chinese-made model doesn't parrot US State Dept narratives 😞
in reply to UnderpantsWeevil

deepseek bad because it doesn't parrot my US State Dept narrative 😞
in reply to xiao

Reposting from a similar post, but... I went over to huggingface and took a look at this.

DeepSeek is huge. Like Llama 3.3 huge. I haven't done any benchmarking, which I'm guessing is out there, but surely it would take as much Nvidia muscle to run this at scale as ChatGPT, even if it was much, much cheaper to train, right?

So is the rout based on the idea that the need for training hardware is much smaller than suspected even if the operation cost is the same... or is the stock market just clueless and dumb and they're all running on vibes at all times anyway?

in reply to MudMan

Clueless dumb vibes, yeah. But exaggerated by the media for clicks, too - Nvidia's price is currently the same as it was in Sept 2024. Not really a huge deal.

Anyway, the more efficient it is the more usage there will be and in the longer run more GPUs will be needed - greenchoices.org/news/blog-pos…

in reply to Rimu

Sure, 15% isn't the worst adjustment we've seen in a tech company by a long shot, even if the absolute magnitude of that loss is absolutely ridiculous because Nvidia is worth all the money, apparently.

But everybody is acting like this is a seismic shock, which is fascinatingly bizarre to me. It seems the normie-investor axis really believed that forcing Nvidia to sell China marginally slower hardware was going to permanently cripple their ability to make chatbots, which I feel everybody had called out as extremely not the case even before these guys came up with a workaround for some of the technical limitations.

in reply to MudMan

I think it has to do with how much cheaper the Chinese company is offering tokens for. It is severely undercutting the American companies. Going forward they won't have the unlimited cash they're used to.
in reply to radiohead37

But the cost per token would target Microsoft, Meta and Google way more than Nvidia. They still control the infrastructure; the software guys are the ones being undercut.

Not that I expect the token revenue was generating "unlimited money" anyway, but still.

in reply to MudMan

It comes in different versions, some of which are enormous and some that are pretty modest. The thing is, they are not competing with 4o but with o1, which has really massive resource requirements. The big news this week was supposed to be o3 opening up to the public, which involved another huge resource jump and set a clear trajectory for the future of AI being on an exponential curve of computing power. This was great news for companies that made the parts and that could afford the massive buildout to support future developments. DeepSeek isn't so much disruptive for its own capabilities; it's disruptive because it challenges the necessity of this massive buildout.
in reply to jrs100000

I suppose the real change is to the assumption that we were going to have to go all paperclip-maximizer on Nvidia GPUs forever, and on that front, yeah, Nvidia would have become marginally less the owner of a planet made from GPUs all the way through.

They're still the owner of a planet made of rock where people run AIs on GPUs, though. Which I guess is worth like 15% less or whatever.

in reply to xiao

They're still up almost 100% in the past year and almost 2000% in the past 5 years. The stock price will be fine.
in reply to DannyBoy

Check the P/E on a lot of these firms. So much of the valuation is predicated on enormous rates of future growth. Their revenue isn't keeping up with their valuation. A big chunk of that 2000% is just people trading on the Greater Fool who will buy the shares later.

Microsoft will be fine, sure. Meta will be fine, sure. The guy leveraged to the tits to go long on ARK Innovation ETF? Far more dubious.
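To put numbers on "valuation predicated on enormous rates of future growth", a back-of-the-envelope sketch - the figures are hypothetical, not any specific firm's:

```python
def years_to_justify(pe: float, target_pe: float, growth: float) -> int:
    """Years of earnings growth (price held flat) until the trailing P/E
    falls back to a 'normal' target. Purely illustrative."""
    years = 0
    while pe > target_pe:
        pe /= 1 + growth  # earnings grow, price stays put -> P/E shrinks
        years += 1
    return years

# hypothetical: a stock trading at 55x earnings vs. a ~20x market norm
print(years_to_justify(55, 20, 0.25))  # -> 5 (five straight years of 25% growth)
```

The point being: a high multiple is a bet on growth compounding for years without interruption, which is exactly the bet DeepSeek just cast doubt on.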

in reply to xiao

What was made more efficiently? The chip? The energy needs of the AI model?
in reply to reallykindasorta

Software. I know only surface-level stuff about the current AI environment, and I have friends saying to buy Nvidia, but I was wondering when there would be improvements to the software.

Example: we used to need a pretty top-notch PC to play Doom, but now we can emulate the hardware and run it on a pregnancy test.

in reply to Pistcow

This assumes some kind of eureka innovation, right? A 96% reduction in compute demands per "token" would be revolutionary. I haven't seen anyone yet explain what that innovation is, exactly. There is also mixed reporting on how "open source" DeepSeek is, with many claiming it's only "open weight", meaning people are having difficulty reproducing the creation of the model. It wouldn't be the first time that a claim out of China was false, and I think it wise to reproduce any such claims before running around with our arms in the air.
in reply to JasSmith

Agreed. China has a long history of absurd claims, but at some point "buy the company that sells shovels" will leave the shovel maker the bag holder when a software improvement comes along. Just a matter of when.
in reply to xiao

The censorship goes way beyond obvious stuff like Taiwan.

Ask DeepSeek "What is a tankie?" and see what happens.

in reply to Rimu

So it not knowing a niche English-based internet slang term is proof of what, exactly?

It's open source. I'm sure there's already a fork patching in the big omissions.

in reply to Bronzebeard

Word definitions are exactly the sorts of things one would expect an LLM to pick up on.

ChatGPT:

A "tankie" is a term often used to describe a person who is an ardent supporter of authoritarian regimes, particularly those that claim to be socialist or communist. The term originally referred to members of communist parties who supported the Soviet Union's use of tanks to suppress uprisings, like the 1956 Hungarian Revolution and the 1968 Prague Spring. Over time, it's been used more broadly to refer to people who justify or defend the actions of regimes like the Soviet Union, China under Mao, or North Korea, often in the face of human rights abuses or authoritarian policies.

The label can sometimes be used pejoratively to imply that someone is uncritical of authoritarianism or blindly supportive of these regimes due to ideological alignment, even when those regimes engage in actions that contradict the values they claim to uphold.

in reply to Bronzebeard

It's definitely censorship; you can see it in their app. It's still buggy and will generate a response, then halfway through it will delete it and say "sorry, that's beyond my current scope".

It did actually give a good multi-paragraph response to "what is a tankie" before it realized that it was a no-no topic.

in reply to xiao

It only cost $5 million to wipe $500 billion off the stock market.

All hail open source.

in reply to xiao

It's the yen carry trade unwinding. Nvidia is the over-leveraged hedge funds' liquidity piggy bank.
in reply to xiao

fell more than 15 percent in midday deals on Wall Street, erasing more than $500 billion of its market value


If I'm reading this correctly, that would mean that NV was previously valued at ~$3.4T??

Yeah, they might be a bit overvalued. Just a hint.
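The arithmetic checks out, using the figures quoted above:

```python
# Back out the pre-drop market cap from the reported numbers.
loss = 500e9   # dollars erased ("more than $500 billion")
drop = 0.15    # fractional fall ("more than 15 percent")
pre_drop_cap = loss / drop
print(f"${pre_drop_cap / 1e12:.2f}T")  # -> $3.33T; the "more than" on both
                                       # figures nudges this toward ~$3.4T
```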

in reply to IHeartBadCode

Come on, one company being worth 3% of the whole market is completely normal...
in reply to xiao

I've been watching the capex boys react to DeepSeek and laughing hysterically tbh
in reply to xiao

There is no problem with deflating the bubble. There's still a lot of hot air left to lose.
in reply to xiao

I'm not interested in reading propaganda. Give me something that benefits me as a reader.

What technical changes have made these Chinese models more cost-effective? Less reliance on parallelism? Less reliance on memory? Custom hardware? Availability of training data?

There are no details in the article. It doesn't even benefit investors.