With the rapid development of large models, is there an indicator that measures the "intelligence level" of AI models? Recently, a research team from Tsinghua University proposed a density law for large models, and the related paper was published in Nature Machine Intelligence. The density law reveals that the maximum capability density of large language models grows exponentially over time, doubling approximately every 3.5 months between February 2023 and April 2025.

Moore's Law is well known in computer science: the number of transistors that fit on a chip doubles at regular intervals. Computers are powerful not because chips grew to the size of houses, but because they pack an astronomical number of computing units onto an area the size of a fingernail. Xiao Chaojun, an assistant researcher at the Department of Computer Science and Technology at Tsinghua University, told Science and Technology Daily that the intelligence level of large models should likewise have an indicator: "capability density." The core assumption of the study is that models of different sizes, built with the same production process and sufficiently trained, share the same capability density. Just as the chip industry miniaturized and popularized computing devices by increasing circuit density, large models are achieving efficient development by increasing capability density.

Xiao Chaojun said that in the past, guided by the scaling law, people cared about a large model's size (parameter count): the bigger, the smarter, much like judging a weightlifter by body weight, the heavier, the stronger. Now, the density law reveals the "efficient development" of large models from another perspective: what deserves more attention is "capability density," that is, how much "intelligence" is contained in each unit of "brain cells" (parameters). "It is like evaluating a martial arts master: what matters is not how muscular they are, but how much power is packed into each move," Xiao Chaojun said.

The research team systematically analyzed 51 open-source large models released in recent years and found an important pattern: the maximum capability density of large models grows exponentially over time, doubling on average every 3.5 months since 2023. This means that, with the coordinated progress of data, computing power, and algorithms, the same level of intelligence can be achieved with fewer parameters.

The team also offered several corollaries. For example, the inference cost of models with a given capability decreases exponentially over time, and the growth of capability density is accelerating: before the release of ChatGPT, capability density doubled every 4.8 months; after its release, it doubled every 3.2 months, a 50% increase in the rate of improvement. This indicates that as large-model technology matures and the open-source ecosystem flourishes, the improvement of capability density is speeding up.

Xiao Chaojun said that, intuitively, the higher the capability density, the smarter a large model is, the less computing power it needs to run, and the lower the cost. Guided by this finding, both academia and industry can pursue technological innovation along multiple dimensions, making large models increasingly accessible.
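The doubling figures above translate into simple exponential arithmetic. The sketch below is only an illustration of that arithmetic, not the paper's actual estimation procedure; the 70-billion-parameter reference model and the assumption that matching capability requires inversely fewer parameters as density grows are hypothetical.

```python
# Illustrative sketch of the exponential trend described in the article.
# Assumption (not from the paper): a model's capability scales with
# capability density times parameter count, so matching a reference model
# later requires proportionally fewer parameters.

DOUBLING_MONTHS = 3.5  # capability density doubles roughly every 3.5 months


def density_growth(months_elapsed: float) -> float:
    """Factor by which maximum capability density grows after the given months."""
    return 2.0 ** (months_elapsed / DOUBLING_MONTHS)


def params_needed(reference_params_b: float, months_later: float) -> float:
    """Parameters (in billions) needed months later to match the reference model."""
    return reference_params_b / density_growth(months_later)


if __name__ == "__main__":
    # Hypothetical example: matching a 70B-parameter model one year later.
    print(f"Density growth after 12 months: {density_growth(12):.1f}x")
    print(f"Params to match a 70B model a year later: {params_needed(70, 12):.1f}B")
    # The article's 50% speed-up claim: 4.8-month vs. 3.2-month doubling times.
    print(f"Rate increase after ChatGPT's release: {4.8 / 3.2 - 1:.0%}")
```

Under these assumptions, a year of density growth (about 10x) would let a model roughly one tenth the size match the reference model's capability, which is the sense in which the article says the same intelligence can be reached with fewer parameters.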
From the application perspective, the density law also means that AI is becoming increasingly accessible. Xiao Chaojun explained that as chip circuit density (Moore's Law) and model capability density (the density law) continue to rise, large models that could previously be deployed only in the cloud will be able to run on terminal chips. Large models running on terminal devices have inherent advantages in response speed and user privacy, and can do more for users. Xiao Chaojun gave an example: previously, large models in smart cars offered passive services such as "open the car window for me" or "find nearby restaurants." Once an on-device model is "in the car," its rich perception of the cabin and surroundings and its intent-understanding capabilities enable multimodal perception fusion and a closed loop of active decision-making inside and outside the cabin, driving the smart cockpit from "passive response" to "active service" and letting intelligence permeate every driving experience. (New Society)
Editor: Momo  Responsible editor: Chen Zhaozhao
Source: Science and Technology Daily