
DeepSeek’s new architecture and why it matters

admin, Database Expert
January 29, 2026
3 min read
#Artificial Intelligence #IT infrastructure

Nearly a year after DeepSeek’s low-cost, high-performance R1 model rocked both Silicon Valley and Wall Street, the Chinese AI lab is poised to shake up the AI industry once more. This time, DeepSeek has released a new framework that could make the training of large language models (LLMs) far more efficient, stable and scalable. Perhaps most importantly, it lowers the cost of pretraining, putting the power of LLMs within reach of smaller companies and individual developers.

“With this innovation, DeepSeek is saying ‘how do I get more bang for my buck during pretraining?’” said IBM Distinguished Engineer Chris Hay in an interview with IBM Think. “Model training is the expensive part.”

DeepSeek researchers tested the new architecture, called Manifold-Constrained Hyper-Connections (mHC), on models with three billion, nine billion and 27 billion parameters. They found the models scaled without adding significant computational overhead or training instability, both of which usually grow in tandem with scale.
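The article doesn’t spell out how mHC works internally. Purely as an illustrative sketch of the general idea behind constraining how parallel residual streams mix (not DeepSeek’s actual method — the stream count, the row-normalization used as a stand-in “manifold constraint,” and the norm comparison are all assumptions for illustration), the toy NumPy example below shows why restricting a mixing matrix can keep activations stable across many layers, whereas unconstrained mixing tends to blow up or vanish with depth:

```python
import numpy as np

def mix(streams, M):
    # Mix n parallel residual streams with matrix M: out[i] = sum_j M[i, j] * streams[j]
    return M @ streams

rng = np.random.default_rng(0)
n, d, depth = 4, 8, 100  # n streams, hidden size d, number of layers

# Unconstrained mixing: freely drawn weights; repeated application compounds
# whatever the dominant eigenvalue is, so signals drift exponentially.
M_free = rng.normal(1.0, 0.5, size=(n, n))

# Constrained mixing (illustrative stand-in for a manifold constraint):
# force each row to be non-negative and sum to 1, so mixing is an average
# and cannot inflate the signal no matter how many layers are stacked.
M_con = np.abs(M_free)
M_con = M_con / M_con.sum(axis=1, keepdims=True)

x_free = np.ones((n, d))
x_con = np.ones((n, d))
for _ in range(depth):
    x_free = mix(x_free, M_free)
    x_con = mix(x_con, M_con)

print(np.linalg.norm(x_con))   # stays bounded across all 100 layers
print(np.linalg.norm(x_free))  # typically explodes or vanishes with depth
```

The point of the sketch is only the contrast: keeping the mixing operation on a restricted set of matrices is one way to get depth-independent stability without adding compute, which is the kind of property the paper’s scaling results at 3B–27B parameters suggest.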

Typically, frontier AI labs rely on “brute force” to improve AI, said Kaoutar El Maghraoui, a Principal Research Scientist at IBM, on the latest episode of the Mixture of Experts podcast. That means “adding more data, more compute power, more parameters,” she said. But that approach is “increasingly inefficient and only affordable by a few large companies.”

El Maghraoui stressed that DeepSeek’s mHC architecture could revolutionize model pretraining. “It’s scaling AI more intelligently rather than just making it bigger,” she said. “It’s a smarter way of designing these models that would also work better for the hardware.” mHC can also integrate easily with a company’s custom hardware, said El Maghraoui, making it a potentially appealing option for enterprises looking for cost-efficient AI. As an example, she pointed to IBM’s specialized hardware accelerators, designed to speed up AI, machine learning and deep learning workloads for enterprise clients on premises.

In a LinkedIn post, Pierre-Carl Langlais, cofounder of French AI startup Pleias, suggested that the paper’s true significance goes beyond proving the scalability of mHC. The “actual flex” is DeepSeek’s ability to re-engineer every dimension of the training environment, he wrote. “That’s what makes [DeepSeek] a frontier lab.”

For Hay, the fact that DeepSeek keeps open-sourcing its new work is notable because it makes AI more accessible to a broader audience. “I appreciate that they come up with innovations, open them up to the world, let people try [them] out, and then they bring the whole field along with them,” he said.

As AI leaders in smaller organizations navigate the complexities of implementing cost-efficient AI solutions, innovations like DeepSeek’s mHC framework make it easier for them to access powerful foundation models that were historically only available to companies with much bigger wallets. By significantly reducing the cost of pretraining LLMs and making AI more accessible, DeepSeek’s breakthroughs are poised to revolutionize the AI landscape for smaller and midsize companies.
