Sci-Tech

The underlying logic and industry impact of DeepSeek's success

2025-02-19   

Seemingly overnight, DeepSeek has attracted a tremendous amount of traffic. It has not only sparked a new wave of AI applications globally but also shaken the global capital market for computing power. On closer inspection, DeepSeek has delivered significant engineering innovations in training and usage costs and in model training and optimization methods, while also breaking many traditional narratives in the AI field. In short, DeepSeek is changing the rules of the game. The launch of DeepSeek's latest reasoning model, R1, came as a great surprise, reportedly reaching 100 million users in just six days. Its stated vision, "unraveling the mystery of AGI with curiosity," adds a touch of intrigue. So what are DeepSeek's technological innovations, and what are the underlying reasons for its success? What impact will all of this have on the technological competition between China and the United States over the next decade?

Cost is the biggest highlight. The overall training cost of DeepSeek R1 is more than an order of magnitude lower than OpenAI's. The R1 training process is full of engineering optimizations and innovations, including Multi-Head Latent Attention (MLA), Multi-Token Prediction (MTP), and the selective use of 8-bit floating-point precision (FP8) in place of FP16 or even FP32. None of these optimizations is easy to implement, and each seemingly minor one compounds into astonishing results when stacked layer upon layer.
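The core idea of the Multi-Head Latent Attention mentioned above is to cache a compressed low-rank latent instead of the full key/value tensors, shrinking the KV cache. The toy sketch below illustrates only that low-rank caching idea; the dimensions and weight names are invented for illustration and this is not DeepSeek's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq = 64, 8, 16

# Standard attention caches full K and V: seq x d_model floats each.
# The MLA-style trick: cache only a low-rank latent c = x @ W_down,
# then reconstruct K and V from c at compute time.
x = rng.standard_normal((seq, d_model))
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

c = x @ W_down        # the only tensor that needs to live in the cache
K = c @ W_uk          # reconstructed keys,   shape (seq, d_model)
V = c @ W_uv          # reconstructed values, shape (seq, d_model)

full_cache = 2 * seq * d_model   # floats cached by vanilla attention (K and V)
mla_cache = seq * d_latent       # floats cached by the latent scheme
print(full_cache, mla_cache)     # 2048 128
```

With these toy dimensions, the cached state shrinks 16x; the trade-off is extra matrix multiplies to reconstruct K and V during decoding.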
DeepSeek actually released two models, R1 and R1-Zero. Focusing on reinforcement learning, DeepSeek trained R1-Zero on top of the V3 base model. However, R1-Zero is prone to problems such as mixing multiple languages when answering certain questions, so DeepSeek applied further supervised fine-tuning (SFT) to the model, producing R1. Because R1's reinforcement learning can be automated, it is relatively easy to scale, which gives the model nearly unlimited room to grow. The disruptive attention DeepSeek has drawn comes, in essence, from its breaking of many traditional narratives in the AI field: OpenAI's logic of stacking compute for reasoning models, the oligopoly logic of the AI application ecosystem, the US logic of blocking China's access to advanced-process chips, and the open-source versus closed-source logic of large AI models. First, the AI community recognizes that reasoning models are extremely difficult to build. Previously, the only strong reasoning model was OpenAI's o1; Anthropic had not produced one, and it took Google a long time to release the middling Gemini 2.0. DeepSeek-R1 is at least a substitute for o1, and in some respects even stronger. Moreover, R1 is not only free but also open source, with exponential reductions in both training and usage costs. Previously, o1 could 'harvest' value for a considerable period on the strength of its lead, but the arrival of DeepSeek R1 and its open-source release let the vast majority of developers and application-focused startups build at a much lower threshold. R1, cheaper and easier to deploy privately, shattered Wall Street's valuation logic for all large-model companies. Second, the emergence of DeepSeek has broken the competitive shackles of the AI application ecosystem.
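The reinforcement learning behind R1-Zero is widely described as using simple rule-based rewards for answer correctness and output format, which is what makes it automatable and easy to scale. The sketch below shows such a reward function; the tag format and reward weights are illustrative assumptions, not DeepSeek's exact specification:

```python
import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Toy rule-based RL reward: a small bonus for following the expected
    <think>...</think><answer>...</answer> format, plus a larger bonus for a
    correct answer. (Format and weights are illustrative, not DeepSeek's.)"""
    fmt_ok = bool(re.search(r"<think>.*</think>\s*<answer>.*</answer>",
                            completion, re.S))
    m = re.search(r"<answer>(.*?)</answer>", completion, re.S)
    correct = m is not None and m.group(1).strip() == gold_answer
    return (0.5 if fmt_ok else 0.0) + (1.0 if correct else 0.0)

good = "<think>2 + 2 = 4</think><answer>4</answer>"
bad = "the answer is 4"
print(rule_based_reward(good, "4"))  # 1.5
print(rule_based_reward(bad, "4"))   # 0.0
```

Because the reward is computed mechanically rather than by human labelers, the training loop can run unattended at scale, which is the scaling property the article highlights.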
Even when top American app companies must choose between DeepSeek and ChatGPT, the answer is clear. Superstar AI applications such as Cursor and Perplexity deployed the DeepSeek model immediately and set it as the top recommendation. Platforms such as Google, Amazon, and NVIDIA have also deployed the DeepSeek model. These changes have simultaneously reshaped the cloud services market: before DeepSeek R1 appeared, a large number of Chinese application developers had to use Microsoft's cloud (for convenient access to the GPT-4 API); now, using DeepSeek deployed on Alibaba Cloud has become an option. Furthermore, DeepSeek has collapsed the logic of the chip blockade. By fine-tuning low-level code in the lower layers of NVIDIA's CUDA ecosystem, such as the PTX layer, DeepSeek works around the deliberately weakened interconnect, communication, and scheduling capabilities of export-restricted chips. Experts generally believe the current gap between China and the United States in large AI models is about four months, and judging from the trajectory of technological development, this gap is more likely to narrow than to widen. Finally, open source allows DeepSeek to at least 'not fall behind' in the battle for public opinion. Building the most powerful models on the path to AGI and open-sourcing them was supposed to be OpenAI's original intention and mission. The market never lies: whoever has the strongest models has the final say. When DeepSeek V3 was released in December 2024, mainstream international media focused mainly on its "low cost"; when DeepSeek-R1 was released, the situation was completely different because of the massive traffic it drew. As an open-source model, DeepSeek gives every user a "top expert" in any field, free and available 24/7. (New Society)

Editor: He Chuanning   Responsible editor: Su Suiyue

Source: People's Posts and Telecommunications News

