DeepSeek article exposes the cost... etc.
Posted on 1/27/25 at 3:27 pm
LINK
quote:
The most proximate announcement to this weekend’s meltdown was R1, a reasoning model that is similar to OpenAI’s o1. However, many of the revelations that contributed to the meltdown — including DeepSeek’s training costs — actually accompanied the V3 announcement over Christmas. Moreover, many of the breakthroughs that undergirded V3 were actually revealed with the release of the V2 model last January.
Now we get to the truth:
The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.
DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:
once more
DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:
We do not know anything other than that the final training run cost them $5.5 million.
A training run does not, I repeat, DOES NOT encompass all the costs.
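Since the cost figure gets quoted several times above, here is a quick sanity check of the article's arithmetic. The $2/GPU-hour rate is the article's assumption, not a verified rental price:

```python
# Sanity check of the training-cost arithmetic quoted from the V3 paper:
# 2,788 thousand H800 GPU hours at an assumed $2 per GPU hour.
gpu_hours = 2_788 * 1_000        # 2,788 thousand GPU hours
rate_usd = 2.00                  # assumed cost per H800 GPU hour
total_usd = gpu_hours * rate_usd
print(f"${total_usd:,.0f}")      # prints $5,576,000
```

The numbers do check out, but as noted, this covers only the final training run: no prior experiments, no R&D salaries, and no hardware purchases.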
Posted on 1/27/25 at 3:33 pm to BCreed1
Seems like a critical point.
Plus, there's this:
CBS article from today
DeepSeek is a private Chinese company founded in July 2023 by Liang Wenfeng, a graduate of Zhejiang University, one of China's top universities, who funded the startup via his hedge fund, according to the MIT Technology Review. Liang has about $8 billion in assets, Ives wrote in a Jan. 27 research note.
Liang, who had previously focused on applying AI to investing, had bought a "stockpile of Nvidia A100 chips," a type of tech that is now banned from export to China. Those chips became the basis of DeepSeek, the MIT publication reported.
Posted on 1/27/25 at 3:37 pm to Naked Bootleg
Yes.
Which is why we will never get the details. This was a massive knee-jerk reaction to DeepSeek.
Posted on 1/27/25 at 3:38 pm to Naked Bootleg
Those who had the courage to buy today will be rewarded. Probably sooner than later.
Posted on 1/27/25 at 3:54 pm to bayoubengals88
100% believe that.
Each chip that they claim to have used costs tens of thousands of dollars... each.
Posted on 1/27/25 at 5:31 pm to BCreed1
1) it does perform as well/better (this is verified, in the US)
2) it is cheaper and more flexible
3) it is not clear what the two things above will mean for the hardware and energy sectors
4) it is clear that there will be beneficiaries no matter what
Posted on 1/27/25 at 6:07 pm to bayoubengals88
quote:
Those who had the courage to buy today will be rewarded. Probably sooner than later.
I hope you are right. I added shares today. If it falls a similar percentage in the coming days I'll add even more. I have not added to my NVDA in a long time. This is too good to pass up IMO.
This post was edited on 1/27/25 at 6:09 pm
Posted on 1/27/25 at 6:58 pm to go ta hell ole miss
The 100 shares of HOOD that got called away may be a blessing in disguise.
Great timing.
Posted on 1/27/25 at 7:21 pm to BCreed1
What stage of grief do people start sky-screaming that the shorts haven’t covered?
Posted on 1/28/25 at 9:14 am to cgrand
quote:
1) it does perform as well/better (this is verified, in the US)
2) it is cheaper and more flexible
3) it is not clear what the two things above will mean for the hardware and energy sectors
Addressing #3, it appears the stock market assumes that since this model is faster, there's less need for more hardware. That's not how researchers work.
Rather than thinking "My model will run in half the time" the thinking is "I can run a larger model that was previously impossible now"
So, the companies with the latest whizbang hardware pick up the algorithmic advances made by DeepSeek and run larger models with more accuracy and resolution.
Posted on 1/28/25 at 9:26 am to hob
quote:
So, the companies with the latest whizbang hardware pick up the algorithmic advances made by DeepSeek and run larger models with more accuracy and resolution.
Exactly. Which is a tailwind for NVDA. I'm not shocked people don't understand this.
Posted on 1/28/25 at 9:46 am to Naked Bootleg
quote:
Liang, who had previously focused on applying AI to investing, had bought a "stockpile of Nvidia A100 chips,"
:wellthereitis:
Posted on 1/28/25 at 9:50 am to cgrand
1) it does perform as well/better (this is verified, in the US)
Always a possibility (though I've heard wildly mixed results)
2) it is cheaper and more flexible
Nope
3) it is not clear what the two things above will mean for the hardware and energy sectors
More AI, more usage, more energy expended. More need for hardware. Clear enough.
4) it is clear that there will be beneficiaries no matter what
There will always be beneficiaries in everything.
This post was edited on 1/28/25 at 9:52 am
Posted on 1/28/25 at 11:45 am to skewbs
quote:
I'm not shocked people don't understand this
This event is exposing lots of emotional thinkers and NPCs
Posted on 1/28/25 at 12:06 pm to hob
quote:
Rather than thinking "My model will run in half the time" the thinking is "I can run a larger model that was previously impossible now"
What that means for hardware stocks I have no way of knowing. But what I do know is that the proliferation of AI models that are faster, easier, and cheaper to build and train means... more apps. And more apps means companies that support apps will have the potential to grow revenues. And that puts us right back where we were in 20/21... software.
Posted on 2/12/25 at 7:14 am to cgrand
quote:
more apps means companies that support apps will have potential to grow revenues. And this puts us right back where we were in 20/21…software

Posted on 2/12/25 at 7:30 am to BCreed1
China lied and the market is full of morons. Just another day in life.
Posted on 2/12/25 at 8:50 am to SlowFlowPro
quote:
This event is exposing lots of emotional thinkers and NPCs
Agree. I added a small number of shares, and never considered selling.
Posted on 2/12/25 at 10:24 am to KWL85
Posted on 2/12/25 at 10:54 am to cgrand
Watch the software names that ENABLE or SUPPORT or PROTECT or IMPROVE AI models. CFLT is a good example (up 20% on good earnings and a partnership with Databricks).
others include CRWD, NET, ZS, NOW, SNOW, ESTC, PLTR etc