Could Nvidia's (NASDAQ: NVDA) magical two-year run be coming to an end? Up until now, there has been insatiable demand for Nvidia's latest and greatest graphics processing units (GPUs). As the artificial intelligence races heated up, big tech companies and start-ups alike rushed to buy or rent as many of Nvidia's high-performance GPUs as they could in a bid to create better and better models.
But last week, Chinese AI start-up DeepSeek released its R1 model, which stunned the technology world. R1 is a "reasoning" model that has matched or exceeded OpenAI's o1 reasoning model, released only at the beginning of December, at a fraction of the cost.
Being able to generate leading-edge large language models (LLMs) with limited computing resources could mean that AI companies might not need to buy or rent as many high-cost computing resources in the future. The consequences could be devastating for Nvidia and last year's other AI winners alike.
But as always, the truth is more complicated.
DeepSeek is an AI lab spun out of a quantitative hedge fund called High-Flyer. CEO Liang Wenfeng founded High-Flyer in 2015 and began the DeepSeek venture in 2023 after the earth-shaking debut of ChatGPT.
DeepSeek has been building AI models ever since, reportedly purchasing 10,000 Nvidia A100 GPUs, two generations prior to the current Blackwell chip, before they were restricted. DeepSeek also reportedly has a cluster of Nvidia H800s, a capped, or slowed, version of the Nvidia H100 designed for the Chinese market. Of note, the H100 was Nvidia's latest GPU generation prior to the recent launch of Blackwell.
On Jan. 20, DeepSeek released R1, its first "reasoning" model, based on its V3 LLM. Reasoning models are relatively new and use a technique called reinforcement learning, which essentially pushes an LLM to go down a chain of thought, reverse course if it runs into a "wall," and explore alternative approaches before arriving at a final answer. Reasoning models can therefore answer complex questions with more precision than straight question-and-answer models can.
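To make the "chain of thought with backtracking" idea concrete, here is a toy sketch, not DeepSeek's actual code or training method. A real reasoning model samples candidate text steps from an LLM and is tuned with reinforcement learning; this example only illustrates the explore, hit-a-wall, backtrack pattern using simple arithmetic steps.

```python
# Toy illustration (not DeepSeek's code) of the reasoning pattern
# described above: extend a chain of "thought" steps, back up when a
# chain hits a wall, and try alternatives until an answer checks out.
# Here a "step" is just +3 or *2 applied to a number; in a real
# reasoning model, steps would be text sampled from the LLM.

def solve(value, target, chain, max_steps=8):
    if value == target:
        return chain                       # final answer found
    if value > target or len(chain) >= max_steps:
        return None                        # hit a "wall": backtrack
    for op, nxt in (("+3", value + 3), ("*2", value * 2)):
        result = solve(nxt, target, chain + [op], max_steps)
        if result is not None:             # an alternative panned out
            return result
    return None                            # nothing works from here

print(solve(1, 14, []))  # ['+3', '+3', '*2']: 1 -> 4 -> 7 -> 14
```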
Incredibly, R1 has been able to meet or even exceed OpenAI's o1 on several benchmarks, despite reportedly being trained at a small fraction of the cost.
Just how cheap are we talking about? The R1 paper claims the model was trained on the equivalent of just $5.6 million worth of rented GPU hours, a small fraction of the hundreds of millions reportedly spent by OpenAI and other U.S.-based leaders. DeepSeek also charges about one-thirtieth of what OpenAI charges to run o1, while Liang maintains DeepSeek charges a "small profit" above its costs. Experts have estimated that Meta Platforms' (NASDAQ: META) Llama 3.1 405B model cost about $60 million in rented GPU hours to train, compared with roughly $6 million for V3, even as V3 outperformed Llama's latest model on a variety of benchmarks.
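For context on where the headline number comes from: DeepSeek's V3 technical report cites roughly 2.788 million H800 GPU-hours and an assumed rental rate of $2 per GPU-hour, and multiplying the two yields the roughly $5.6 million figure. The quick check below simply reproduces that arithmetic, treating both inputs as the paper's own claims rather than audited costs.

```python
# Reproducing the headline training-cost arithmetic. Both inputs are
# figures reported/assumed in DeepSeek's V3 paper, not verified costs.

gpu_hours = 2_788_000      # H800 GPU-hours reported for training V3
rate_per_hour = 2.00       # assumed rental rate, $ per GPU-hour

training_cost = gpu_hours * rate_per_hour
print(f"${training_cost:,.0f}")  # $5,576,000, i.e. the ~$5.6M figure
```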
According to an informative blog post by Kevin Xu, DeepSeek was able to pull off this minor miracle thanks to three unique advantages.
First, Liang built DeepSeek as something of an idealistic AI research lab without a clear business model. Currently, DeepSeek charges a small fee for others seeking to build products on top of it, but otherwise makes its open-source model available for free. Liang also recruited largely young people who had just graduated from school or who were in Ph.D. programs at China's top universities. This led to a culture of free experimentation and trial and error without big expectations, and set DeepSeek apart from China's tech giants.
Second, DeepSeek runs its own data center, which allowed it to optimize its hardware racks for its own purposes.
Finally, DeepSeek optimized its learning algorithms in a number of ways that, taken together, allowed it to maximize the performance of its hardware.
For instance, DeepSeek built its own parallel processing framework from the ground up, called HAI-LLM, which optimized computing workloads across its limited number of chips. DeepSeek also uses FP8, an 8-bit number format that is less precise than the standard FP32 format. While FP8 sacrifices precision, it saves a great deal of memory, and R1's other processes were able to make up for the lost precision with a greater number of efficient calculations. DeepSeek also optimized its load-balancing networking kernel, maximizing the work done by each H800 GPU so that no hardware was ever left "waiting" for data.
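As a rough illustration of the memory argument, and nothing more: an FP8 value occupies 1 byte versus 4 bytes for FP32, so storing the same number of values in FP8 takes a quarter of the memory. The parameter count below is a made-up round number, not DeepSeek's actual model size.

```python
# Illustrative arithmetic only: FP8 stores each value in 1 byte,
# FP32 in 4 bytes. The parameter count is hypothetical.

params = 100e9                             # 100B parameters (made up)
fp32_gb = params * 4 / 1e9                 # 4 bytes per FP32 value
fp8_gb = params * 1 / 1e9                  # 1 byte per FP8 value

print(f"FP32 weights: {fp32_gb:,.0f} GB")  # 400 GB
print(f"FP8 weights:  {fp8_gb:,.0f} GB")   # 100 GB, a 4x saving
```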
These are just a few of the innovations that allowed DeepSeek to do more with less. Cobbled together, these "hacks" led to a remarkable increase in performance.
The negative implication for Nvidia is that by innovating at the software level as DeepSeek has done, AI companies may become less dependent on hardware, which could affect Nvidia's sales growth and margins.
As dire as R1 may seem for Nvidia, there are several counterpoints to the thesis that Nvidia is "doomed."
First, some are skeptical that the Chinese start-up is being totally forthright in its cost estimates. According to machine learning researcher Nathan Lambert, the $5.6 million figure for rented GPU hours probably doesn't account for a number of extra costs. These include significant pre-training experiments run before training the large model, the capital expenditures to buy GPUs and build data centers (if DeepSeek truly built its own data center and didn't rent from a cloud provider), and high energy costs. There is also the matter of DeepSeek's engineering salaries: R1 had 139 technical authors. Since DeepSeek's model is open-source, not all of these authors necessarily work at the company, but many probably do, and they likely command substantial salaries.
Lambert estimates DeepSeek's annual operating costs are probably closer to between $500 million and $1 billion. That's still far below the costs at its U.S. rivals, but obviously much more than the $6 million or so suggested by the R1 paper.
There are also some who simply doubt DeepSeek is being forthright about its access to chips. In a recent interview, Scale AI CEO Alexandr Wang told CNBC he believes DeepSeek has access to a 50,000-GPU cluster of H100s that it isn't disclosing, because those chips cannot legally be exported to China under the 2022 U.S. export restrictions.
However, given that DeepSeek has openly published its techniques for the R1 model, researchers should be able to emulate its success with limited resources. As of now, it appears the R1 efficiency breakthrough is more real than not.
While DeepSeek is no doubt impressive, ex-OpenAI executive Miles Brundage also cautioned against reading too much into R1's debut. Brundage notes that OpenAI is already out with its o3 model and will soon follow with its o5 model. While DeepSeek has been able to hack its way to R1 with novel techniques, its limited computing power is likely to slow the pace at which it can scale up and advance from its first reasoning model.
Brundage also notes that limited computing resources will affect how these models can perform simultaneously in the real world:
Even if that's the smallest possible version while maintaining its intelligence -- the already-distilled version -- you'll still want to use it in multiple real-world applications simultaneously. You wouldn't want to choose between using it for improving cyber capabilities, helping with homework, or solving cancer. You'd want to do all of these things. This requires running many copies in parallel, generating hundreds or thousands of attempts at solving difficult problems before selecting the best solution. ... To make a human-AI analogy, consider Einstein or John von Neumann as the smartest possible person you could fit in a human brain. You would still want more of them. You'd want more copies. That's basically what inference compute or test-time compute is -- copying the smart thing. It's better to have an hour of Einstein's time than a minute, and I don't see why that wouldn't be true for AI.
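The "many copies, pick the best" idea Brundage describes is often called best-of-n sampling. Here is a minimal, self-contained sketch of that pattern; generate_attempt and score are hypothetical stand-ins for a model call and an answer-quality evaluator, not any real API.

```python
import random

# Minimal best-of-n sketch of the test-time compute pattern Brundage
# describes: run many attempts in parallel (here, sequentially for
# simplicity) and keep only the best one.

def generate_attempt(problem: str) -> float:
    return random.random()              # pretend this is an answer

def score(attempt: float) -> float:
    return attempt                      # pretend higher is better

def best_of_n(problem: str, n: int = 1000) -> float:
    attempts = (generate_attempt(problem) for _ in range(n))
    return max(attempts, key=score)     # keep only the best attempt

print(best_of_n("hard problem"))
```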
Finally, investors should keep in mind the Jevons paradox. Coined by English economist William Stanley Jevons in 1865 to describe coal usage, it is the phenomenon that occurs when a technological process is made more efficient: rather than falling, consumption of the resource rises, because the efficiency gain stimulates enough new demand to more than offset the savings.
For AI, if the cost of training advanced models falls, look for AI to be used more and more in our daily lives. That should, according to the paradox, actually increase demand for computing power, although probably more for inference than for training. Strangely, then, cheaper training could actually benefit Nvidia. On the other hand, the inference market is thought to be more competitive for Nvidia than the training market, which could be a negative. But that negative would arise from more competition, not from decreased computing demand.
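A quick worked example of the paradox's arithmetic, with invented numbers: suppose efficiency gains make each AI task 10 times cheaper, and the lower price causes usage to grow 30-fold. Total spending on compute still triples.

```python
# Jevons paradox toy arithmetic; all numbers are invented.

cost_per_task = 1.00            # $ per AI task before the efficiency gain
tasks = 1_000_000               # baseline usage

new_cost_per_task = cost_per_task / 10   # tasks become 10x cheaper
new_tasks = tasks * 30                   # usage grows 30x at the new price

print(f"Old spend: ${tasks * cost_per_task:,.0f}")          # $1,000,000
print(f"New spend: ${new_tasks * new_cost_per_task:,.0f}")  # $3,000,000
```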
The bottom line is that demand for AI computing should continue to grow a lot for years to come. After all, on Jan. 24, Meta Platforms CEO Mark Zuckerberg announced that Meta would be building an AI data center almost as big as Manhattan and will ramp up its capital spending to a range of $60 billion to $65 billion this year, up from a range of $38 billion to $40 billion in 2024.
This announcement came four days after DeepSeek's release, so Zuckerberg was surely aware of it. Yet he still thinks a huge 50%-plus increase in AI infrastructure spending is warranted.
No doubt, the advent of DeepSeek will have an effect on the AI races. But rather than being "game over" for Nvidia and other "Magnificent Seven" companies, the reality will be more nuanced.
As the AI races progress, investors will have to assess which companies have a true AI "moat," because AI business models are evolving rapidly and in surprising ways, as DeepSeek's R1 just showed.
Randi Zuckerberg, a former director of market development and spokeswoman for Facebook and sister to Meta Platforms CEO Mark Zuckerberg, is a member of The Motley Fool's board of directors. Billy Duberstein and/or his clients have positions in Meta Platforms. The Motley Fool has positions in and recommends Meta Platforms and Nvidia. The Motley Fool has a disclosure policy.