
DeepSeek article exposes the cost... etc.

Posted by BCreed1
Alabama
Member since Jan 2024
6445 posts
Posted on 1/27/25 at 3:27 pm
LINK


quote:

The most proximate announcement to this weekend’s meltdown was R1, a reasoning model that is similar to OpenAI’s o1. However, many of the revelations that contributed to the meltdown — including DeepSeek’s training costs — actually accompanied the V3 announcement over Christmas. Moreover, many of the breakthroughs that undergirded V3 were actually revealed with the release of the V2 model last January.



Now we get to the truth:

The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.
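For reference, here is that arithmetic as a quick sketch (Python); the $2/GPU-hour figure is the article's assumed rental rate, not a known contract price:

# Back-of-envelope check of the claimed V3 final-run cost.
# Both numbers come from the quoted article; the $2/GPU-hour rate is its pricing assumption.
gpu_hours = 2_788_000        # "2,788 thousand" H800 GPU hours
cost_per_gpu_hour = 2.00     # assumed rental rate, $/GPU-hour

final_run_cost = gpu_hours * cost_per_gpu_hour
print(f"Final training run: ${final_run_cost:,.0f}")  # -> Final training run: $5,576,000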

DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:



once more


DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:


We do not know anything other than that the final training run cost them $5.5 million.

A training run does not, I repeat, DOES NOT encompass all the costs.





Posted by Naked Bootleg
Premium Plus® Member
Member since Jul 2021
3290 posts
Posted on 1/27/25 at 3:33 pm to
Seems like a critical point.

Plus, there's this:

CBS article from today
DeepSeek is a private Chinese company founded in July 2023 by Liang Wenfeng, a graduate of Zhejiang University, one of China's top universities, who funded the startup via his hedge fund, according to the MIT Technology Review. Liang has about $8 billion in assets, Ives wrote in a Jan. 27 research note.

Liang, who had previously focused on applying AI to investing, had bought a "stockpile of Nvidia A100 chips," a type of tech that is now banned from export to China. Those chips became the basis of DeepSeek, the MIT publication reported.
Posted by BCreed1
Alabama
Member since Jan 2024
6445 posts
Posted on 1/27/25 at 3:37 pm to
Yes.

Which is why we will never get the details. This was a massive knee-jerk reaction to DeepSeek.
Posted by bayoubengals88
LA
Member since Sep 2007
23505 posts
Posted on 1/27/25 at 3:38 pm to
Those who had the courage to buy today will be rewarded. Probably sooner than later.
Posted by BCreed1
Alabama
Member since Jan 2024
6445 posts
Posted on 1/27/25 at 3:54 pm to
100% believe that.

Each chip they claim to have used costs tens of thousands of dollars.
Posted by cgrand
HAMMOND
Member since Oct 2009
46734 posts
Posted on 1/27/25 at 5:31 pm to
1) it does perform as well/better (this is verified, in the US)
2) it is cheaper and more flexible
3) it is not clear what the two things above will mean for hardware and energy sectors
4) it is clear that there will be beneficiaries no matter what
Posted by go ta hell ole miss
Member since Jan 2007
14568 posts
Posted on 1/27/25 at 6:07 pm to
quote:

Those who had the courage to buy today will be rewarded. Probably sooner than later.


I hope you are right. I added shares today. If it falls a similar percentage in the coming days I’ll add even more. I have not added to my NVDA in a long time. This is too good to pass up IMO.
This post was edited on 1/27/25 at 6:09 pm
Posted by bayoubengals88
LA
Member since Sep 2007
23505 posts
Posted on 1/27/25 at 6:58 pm to
The 100 shares of HOOD that got called away may be a blessing in disguise.
Great timing.
Posted by lsuconnman
Baton rouge
Member since Feb 2007
4481 posts
Posted on 1/27/25 at 7:21 pm to
At what stage of grief do people start sky-screaming that the shorts haven’t covered?
Posted by hob
Member since Dec 2017
2344 posts
Posted on 1/28/25 at 9:14 am to
quote:

1) it does perform as well/better (this is verified, in the US)
2) it is cheaper and more flexible
3) it is not clear what the two things above will mean for hardware and energy sectors


Addressing #3, it appears the stock market assumes that since this model appears to be faster, there's less need for more hardware. That's not how researchers work.

Rather than thinking "My model will run in half the time," the thinking is "I can now run a larger model that was previously impossible."

So, the companies with the latest whizbang hardware pick up the algorithmic advances made by DeepSeek and run larger models with more accuracy and resolution.
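A toy sketch of that trade-off (Python), with made-up numbers just to illustrate the point, assuming compute scales roughly linearly with model size:

# Illustrative only: a fixed GPU-hour budget plus an assumed 2x algorithmic efficiency gain.
# The lab can either finish the same model in half the compute, or spend the whole
# budget on a roughly 2x larger model. None of these figures are DeepSeek's.
budget_gpu_hours = 1_000_000    # hypothetical compute budget
baseline_params_b = 70          # hypothetical 70B-parameter baseline model
efficiency_gain = 2.0           # assumed speedup from the algorithmic advances

hours_for_same_model = budget_gpu_hours / efficiency_gain
larger_model_params_b = baseline_params_b * efficiency_gain

print(f"Same model: ~{hours_for_same_model:,.0f} GPU hours instead of {budget_gpu_hours:,}")
print(f"Or a ~{larger_model_params_b:.0f}B-parameter model on the full budget")

In practice that is the argument for hardware demand holding up: efficiency gains tend to get spent on bigger runs rather than banked as savings.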
Posted by skewbs
Member since Apr 2008
2195 posts
Posted on 1/28/25 at 9:26 am to
quote:

So, the companies with the latest whizbang hardware pick up the algorithmic advances made by DeepSeek and run larger models with more accuracy and resolution.


Exactly. Which is a tailwind for NVDA. I'm not shocked people don't understand this.
Posted by theballguy
Member since Oct 2011
31430 posts
Posted on 1/28/25 at 9:46 am to
quote:

Liang, who had previously focused on applying AI to investing, had bought a "stockpile of Nvidia A100 chips,"


:wellthereitis:
Posted by theballguy
Member since Oct 2011
31430 posts
Posted on 1/28/25 at 9:50 am to
1) it does perform as well/better (this is verified, in the US)

Always a possibility (though I've heard wildly mixed results)

2) it is cheaper and more flexible

Nope

3) it is not clear what the two things above will mean for hardware and energy sectors

More AI, more usage, more energy expended, more need for hardware. Clear enough.

4) it is clear that there will be beneficiaries no matter what

There will always be beneficiaries in everything.
This post was edited on 1/28/25 at 9:52 am
Posted by SlowFlowPro
With populists, expect populism
Member since Jan 2004
466895 posts
Posted on 1/28/25 at 11:45 am to
quote:

I'm not shocked people don't understand this

This event is exposing lots of emotional thinkers and NPCs
Posted by cgrand
HAMMOND
Member since Oct 2009
46734 posts
Posted on 1/28/25 at 12:06 pm to
quote:

Rather than thinking "My model will run in half the time" the thinking is "I can run a larger model that was previously impossible now"
what that means for hardware stocks I have no way of knowing. But what I do know is that the proliferation of AI models that are faster and easier and cheaper to build and train means… more apps. And more apps means companies that support apps will have potential to grow revenues. And this puts us right back where we were in 20/21… software
Posted by cgrand
HAMMOND
Member since Oct 2009
46734 posts
Posted on 2/12/25 at 7:14 am to
quote:

more apps means companies that support apps will have potential to grow revenues. And this puts us right back where we were in 20/21…software
Posted by DarthRebel
Tier Five is Alive
Member since Feb 2013
24998 posts
Posted on 2/12/25 at 7:30 am to
China lied and the market is full of morons. Just another day in life.
Posted by KWL85
Member since Mar 2023
3189 posts
Posted on 2/12/25 at 8:50 am to
This event is exposing lots of emotional thinkers and NPCs
___________

Agree. I added a small number of shares, and never considered selling.
Posted by cgrand
HAMMOND
Member since Oct 2009
46734 posts
Posted on 2/12/25 at 10:24 am to
Posted by cgrand
HAMMOND
Member since Oct 2009
46734 posts
Posted on 2/12/25 at 10:54 am to
watch the software names that ENABLE or SUPPORT or PROTECT or IMPROVE AI models. CFLT is a good example (up 20% on good earnings and a partnership with Databricks)

others include CRWD, NET, ZS, NOW, SNOW, ESTC, PLTR etc