AI inference chips inspire a new round of application innovation
2025-01-15
With the emergence of ChatGPT, competition in artificial intelligence (AI) has entered a white-hot phase. Demand for NVIDIA's high-end graphics processing unit (GPU) chips has skyrocketed, and they are highly sought after by major technology companies. At the same time, some startups have taken a different approach, focusing on another type of chip, the AI inference chip, and injecting new impetus into the development and application of AI products. According to a recent report on Phys.org, these AI inference chips aim to reduce the high computational cost of generative AI and better meet the day-to-day operational requirements of AI tools. As their cost keeps falling and their performance keeps improving, such chips are expected to trigger a new wave of AI application innovation and bring more complex and powerful AI applications into millions of households.

Demand for inference computing is surging

Training and inference are the two core capabilities of AI language models and the cornerstones on which those models rest. In everyday use, a trained generative AI tool such as ChatGPT absorbs new information, draws inferences from it, and generates responses, for example writing documents or producing images. Such tools can be applied in fields like medical diagnosis, autonomous driving, and natural language understanding. As AI models spread, demand for inference hardware is rising, and demand for inference chips will surge with it. According to a report by International Data Corporation (IDC), the share of AI servers devoted to inference will keep climbing in the coming years, and inference workloads are expected to account for over 70% by 2027.

Technology companies race to launch new products

Startups such as Cerebras, Groq, and d-Matrix, along with traditional giants such as AMD and Intel, have all launched AI inference chips, keenly seizing the opportunity for these chips to showcase their capabilities.

According to Cerebras's official website, on August 28, 2024, the company launched an AI inference chip of the same name. The chip achieved an inference speed of 1,800 tokens per second on the Llama 3.1-8B model and 450 tokens per second on Llama 3.1-70B, approximately 20 times faster than NVIDIA GPU inference. A token is the smallest unit of text an AI system processes, such as a word or a character.

Cerebras attributes this performance to its innovative chip design. Its Wafer Scale Engine (WSE) is like a massive "computing factory", remarkable above all for its size: a single chip occupies almost the entire area of a wafer. On this ultra-large chip, compute units and memory units are tightly integrated into a dense grid, so data travels only very short distances between them. The design fundamentally reduces the cost of moving data and sidesteps the memory-bandwidth bottleneck that GPU inference cannot avoid, letting the chip process information, and return answers, in less time. (The rough calculation below illustrates how memory bandwidth caps inference speed.)

As early as February last year, Groq released GroqCloud, an inference service built on its own AI inference chips. It delivered 250 tokens per second on the Llama 3.1-70B model, almost an order of magnitude faster than GPUs.
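Why shortening the path between memory and compute matters so much can be seen from a back-of-envelope calculation: when a model generates a token, essentially all of its weights must stream through the compute units once, so memory bandwidth, rather than raw compute, typically caps single-stream speed. The sketch below works through that bound; the model size, weight precision, and bandwidth figures are illustrative assumptions, not vendor specifications.

```python
# Back-of-envelope bound on single-stream decoding speed.
# Generating one token requires streaming (roughly) all model
# weights through the compute units once, so memory bandwidth,
# not raw compute, caps tokens per second.
# All figures below are illustrative assumptions, not vendor specs.

MODEL_PARAMS = 70e9      # assumed: a 70B-parameter model
BYTES_PER_PARAM = 2      # assumed: 16-bit (fp16) weights

weight_bytes = MODEL_PARAMS * BYTES_PER_PARAM  # ~140 GB of weights

scenarios = [
    ("HBM-based GPU (assumed ~3 TB/s)", 3e12),
    ("on-wafer SRAM design (assumed ~100x that)", 3e14),
]

for name, bandwidth_bytes_per_s in scenarios:
    tokens_per_s = bandwidth_bytes_per_s / weight_bytes
    print(f"{name}: ~{tokens_per_s:,.0f} tokens/s per stream")
```

On these assumed numbers, a bandwidth-bound GPU stream lands near 20 tokens per second, consistent in magnitude with the roughly 20-fold gap the article reports for the 70B model.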
On November 19 last year, Silicon Valley startup d-Matrix announced that its first AI inference chip, Corsair, had started shipping, aimed at services such as chatbots and video generation. Corsair runs the Llama3 8B model at 60,000 tokens per second in a single-server environment, with a latency of only 1 millisecond per token, demonstrating its strength in high-speed processing of large-scale data. Notably, d-Matrix says that compared with GPUs and other solutions, Corsair delivers the same performance at significantly lower energy consumption and cost.

Application development enters a new track

Tech companies such as Amazon, Google, Meta, and Microsoft have been investing heavily in expensive GPUs in their race to lead AI development. AI inference chip makers, by contrast, have set their sights on a broader customer base, hoping to shine in this new blue ocean. Their potential customers include Fortune 500 companies that are eager to adopt emerging generative AI technologies but do not want to spend heavily on building their own AI infrastructure, especially since AI inference chips are cheaper to buy than GPUs from companies such as NVIDIA.

AI inference chips are designed to optimize the speed and efficiency of inference computation, particularly in fields such as intelligent recommendation, speech recognition, and natural language processing. Industry experts say that once inference speed reaches thousands of tokens per second, AI models will be able to think through and answer complex problems in the blink of an eye. This will not only bring a qualitative leap in the interaction efficiency of existing applications, but also enable a series of refreshing human-computer interaction scenarios. In voice dialogue, for example, latency will be compressed to the millisecond level, enabling an almost natural conversational experience; in virtual and augmented reality, AI will be able to generate and adjust virtual environments, character dialogue, and interaction logic in real time, giving users personalized, immersive experiences. (New Society)
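The quoted numbers fit together arithmetically. A quick sketch, assuming an illustrative answer length, makes the "blink of an eye" claim concrete and also shows how Corsair's reported 1-millisecond per-token latency squares with its 60,000 tokens-per-second aggregate throughput: the two reconcile through concurrently served streams.

```python
# How token rate translates into response time, and how per-token
# latency squares with aggregate throughput via batching.
# The 500-token answer length is an illustrative assumption;
# the rates echo figures quoted in the article.

ANSWER_TOKENS = 500  # assumed length of a substantial answer

for tokens_per_s in (25, 450, 2_000, 60_000):
    ms = ANSWER_TOKENS / tokens_per_s * 1_000
    print(f"{tokens_per_s:>6,} tokens/s -> {ms:>8.1f} ms per answer")

# Reported Corsair figures: 1 ms per token means one stream runs at
# 1,000 tokens/s, so 60,000 tokens/s per server implies the chip is
# serving roughly 60 streams concurrently.
per_stream = 1 / 1e-3                 # 1,000 tokens/s per stream
print(f"implied concurrent streams: {60_000 / per_stream:.0f}")
```

A human blink lasts roughly 100 to 400 milliseconds, so at a few thousand tokens per second a full answer really does fit inside one.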
Editor: He Chuanning  Responsible editor: Su Suiyue
Source: Sci-Tech Daily