
Chinese-language data now makes up more than 60% of the training data for most AI models in China

2025-08-22   

Chinese-language data plays an important role in improving the training performance of domestic AI models. According to figures recently released by the National Data Administration, Chinese-language data now accounts for more than 60% of the training data used by most AI models in China, and for some models the share reaches 80%. The capacity to develop and supply high-quality Chinese-language data continues to grow, driving rapid improvement in the performance of China's AI models.

Liu Liehong, Director of the National Data Administration, said that the rapid development of artificial intelligence in China is inseparable from the great importance the country attaches to data work. As one of the core elements of AI development, data plays a crucial role in advancing "AI+", and building high-quality datasets is of the utmost importance.

"In the era of artificial intelligence, the token is the smallest unit of data for processing text, much like what people called 'traffic' in the Internet era," Liu said. At the beginning of 2024, average daily token consumption in China was 100 billion tokens. By the end of June this year, it had exceeded 30 trillion tokens, a more than 300-fold increase in a year and a half, reflecting the rapid growth in the scale of China's AI applications.

As of the end of June this year, China had built more than 35,000 high-quality datasets with a total volume of over 400 PB (1 PB can store about 500 million 2 MB high-definition photos), equivalent to about 140 times the total digital resources of the National Library of China.

The training of AI models has also driven growing demand for data trading. As of the end of June this year, the cumulative transaction volume of high-quality datasets across regions had reached nearly 4 billion yuan, and the total volume of high-quality datasets listed by data trading institutions had reached 246 PB.

Next, the National Data Administration will continue to promote the construction of high-quality datasets through systematic planning, accelerate the creation of data hubs in key areas such as embodied intelligence, the low-altitude economy, and biomanufacturing, strengthen society-wide recognition of the value of data elements, accelerate the co-creation of that value, and foster a market consensus on "paying for high-quality data". (Xinhua News Agency)

Editor: Momo  Responsible editor: Chen Zhaozhao

Source: Xinhua News Agency

