Physical artificial intelligence accelerates its entry into the real world
2026-06-16
From “visual comprehension” of images, to “expression” in written text, to “imagination” in generating video, AI keeps evolving. What else can AI do once it has mastered reading, writing, and generation? In recent years, the tech community has been exploring technical pathways to bridge the gap between AI in the virtual world and its application in the real one. A new concept, physical AI, is drawing growing attention.
What are the differences between physical AI and generative AI, and which scenarios does each suit? How should we understand the relationship between physical AI and embodied intelligence, and what challenges stand in the way of putting the technology into practice? The reporter interviewed industry experts.
Upgrade: AI evolves into an intelligent agent capable of interacting with the physical world
What is physical AI? Put simply, it is AI that has moved beyond the screen and into the physical world, where it can perceive its environment and interact with it much as humans do.
Ma Xiaojian, head of the Beijing General Artificial Intelligence Research Institute-Deta Intelligence Joint Laboratory, believes that physical AI has three key characteristics: its capabilities are based on real physical interaction data, it incorporates an understanding of the physical world, and it can be deployed in real-world entities. This means that physical AI understands the movement, contact, and deformation of objects, as well as friction, gravity, spatial relationships, and causal changes. It can use this knowledge to predict the future, plan actions, and complete tasks in open environments.
From the perspective of technological evolution, physical AI is a natural progression once AI reaches a certain stage of development. “In the first phase, AI learned ‘to see’ through computer vision. In the second phase, AI learned ‘to write’ through natural language processing. Now, through physical AI, it is learning ‘to act’.” According to Ying Ru, Director of Architecture at Baidu’s Intelligent Cloud, the emergence of physical AI has enabled AI to evolve from an information-processing tool into an intelligent agent capable of interacting with the physical world.
In the past, large models mainly reproduced human language, knowledge, and reasoning, which amounted to entering the human mental world. But human intelligence is reflected not only in the brain but also in interaction with the physical world. “Once AI achieves breakthroughs in language and multimodal understanding, the next step will undoubtedly be to externalize this intelligence into the real world, enabling machines to perceive, act, make mistakes, and complete tasks,” Ma Xiaojian said.
The main differences between physical AI and generative AI lie in their technical principles and the tasks they are designed to perform. A key capability of physical AI is handling tasks such as motion control and environmental interaction in the physical world. A key capability of generative AI is generating text, images, and video, which supports tasks such as content creation, coding, and data analysis.
“Physical AI and generative AI are two different dimensions of classification within AI,” Ma Xiaojian said, adding that the two are becoming deeply integrated. For example, generative AI’s capabilities in language understanding, scenario generation, planning, and code generation can help physical AI better understand tasks and build simulation environments.
Challenges: From models and data to underlying infrastructure, numerous obstacles stand in the way of practical implementation
Over the past few years, the tech community has worked on multiple fronts, from core algorithms to engineering infrastructure, to bring physical AI into practical use. For example, world models, which simulate environmental dynamics and predict future states, are referred to as the “inner brain” of physical AI. Academic experts have identified three key capabilities of a world model: generation, multimodality, and interactivity. These capabilities provide a framework for building physical AI that can understand its environment, make causal inferences, and plan tasks. Meanwhile, the iterative evolution of large models that integrate vision, language, and action is laying a solid foundation for physical AI.
Ma Xiaojian explained that current approaches to physical AI fall broadly into three technical routes. The first is the “pre-training-post-training” paradigm, which first uses internet video, first-person video, cross-robot operation data, and similar sources for large-scale pre-training, and then relies on teleoperation data, reinforcement learning, or fine-tuning on real robots for post-training. The second is the “reality-simulation-reality” paradigm, which reconstructs the geometry, materials, dynamics, and other properties of the real world in a high-fidelity simulation environment and lets the robot carry out extensive trial and error in a “digital twin” scene before transferring to physical equipment. The third is the large-model programming route, in which a language model generates a robot control program for the task at hand, chaining together functional modules such as perception, planning, and execution.
Each route has advantages and disadvantages. For example, the “pre-training-post-training” paradigm offers a clear path, but it places extremely high demands on data quality, robot body consistency, and the volume of real interaction data.
Because physical AI has not yet been widely deployed in everyday life and production, it is difficult to collect massive amounts of training data cheaply and efficiently, which has become one of the bottlenecks holding back the technology. The advantage of the “reality-simulation-reality” paradigm, by comparison, is that simulation compute can substitute for the costly and time-consuming collection of real data. However, physical processes such as complex contact, flexible deformation, fluid motion, and uneven ground remain difficult to simulate in real time with high precision.
“Because real-world operating conditions are complex and multiple physical factors interact, simulation systems cannot fully reproduce physical details and are sometimes only a supplementary solution when real data is missing,” said Zhang Yu, general manager of Beijing Microlink Daai Technology Co., Ltd.
“Overall, the three routes are unlikely to replace one another; instead, they will probably converge gradually at the level of data, simulation, and large-model inference,” Ma Xiaojian said.
What is the relationship between physical AI and embodied intelligence? Put simply, embodied intelligence is a crucial vehicle for physical AI, and physical AI is the core technical approach for bringing embodied intelligence to life. However, putting physical AI into practice through embodied intelligence still faces numerous challenges, particularly in hardware system engineering. For example, when an embodied agent performs a task, it must work with complex motion control algorithms. If hardware precision is not up to standard, the deep coupling of hardware and software can easily suffer.
Industry experts said that in recent years the level of domestic production of core robot components in China has risen significantly, but the machining precision of key components such as harmonic reducers still lags behind the advanced international level.
The future outlook: Leveraging the advantages of diverse application scenarios, physical AI continues to evolve
Despite the challenges, industry experts are generally optimistic about the commercialization and deployment of physical AI. On the one hand, physical AI follows a similar underlying logic to large-model development: with larger-scale data collection, more powerful models, systematic evaluation, and continuous iteration, product capabilities will steadily improve. On the other hand, physical AI does not need to wait for general-purpose robots to be fully developed before it can be commercialized. In vertical scenarios, a model that demonstrates excellent generalization across a class of similar tasks already counts as an important interim achievement.
In the future, fields that require complex scenario simulation and optimization, such as the low-altitude economy, new energy batteries, embodied intelligence, high-end chips, and aerospace, will be the areas where physical AI is applied. Ma Xiaojian believes that physical AI is likely to be deployed first in scenarios that are unsuitable for prolonged human work and that traditional automation cannot fully handle. Electrical inspection is one such scenario.
In remote areas of southwest China, workers used to trek over mountains and ridges to inspect equipment. Now the Tiangong robots developed by the Beijing Humanoid Robot Innovation Center can carry out complex tasks such as outdoor patrols, circuit breaker operations, and wiring and ground-wire mounting.
“Physical AI is not intended to replace all forms of automation,” said Ma Xiaojian, noting that when tasks are highly rule-bound and processes are fixed, traditional industrial automation is often cheaper and more stable. The real strength of physical AI lies in tasks that involve changing environments, require real-time perception and flexible decision-making, and are also repetitive or high-risk.
Within the industry, the efficiency of training physical AI models has also been steadily improving. “Thanks to our extensive experience in AI infrastructure, we have increased the training speed of our ‘vision-language-action’ large models by 70% and reduced the latency of model inference worldwide by 50%. Training cycles that used to take weeks can now be compressed to hours,” said Shen Zhu, executive vice president of Baidu Group and president of Baidu’s intelligent cloud business group.
How can the implementation of physical AI be further advanced? Physical AI is still at a stage where technical approaches have not yet converged. “We should encourage differentiated, multi-route, parallel exploration.” Ma Xiaojian believes that industrial policy and research support should not crowd around a single technology hot spot, and should instead guide enterprises, universities, and research institutes to pursue diversified research on models, control, simulation, sensors, dexterous hands, and robot body structures.
This will both avoid the risk of betting on a single line of R&D and help address some of the shortcomings across China’s full industry chain in algorithms, hardware, manufacturing, and systems integration.
Whether physical AI truly takes root depends not on laboratory demonstrations but on data feedback from real-world scenarios and continuous iteration. Industry insiders believe that the abundance of application scenarios is a unique advantage for China in developing physical AI. “By bringing the technology to frontline settings such as mines, factories, warehouses, and inspection sites, physical AI can better form a virtuous cycle of ‘scenario-data-model-product’,” Ma Xiaojian said.
(Looking ahead to a new era)