
In 2025, “embodied intelligence” and “intelligent robots” appeared in the Government Work Report for the first time, and terms such as embodied intelligence, intelligent robots, and humanoid robots have since become increasingly visible to the public. These seemingly obscure technical terms may soon enter commercial settings and even our homes. How are they defined? How do they relate to one another? What are their prospects? The “Expert Talk” column of the China Academy of Information and Communications Technology (CAICT) has launched a special series on “Robots”, inviting senior industry experts to discuss the development of China’s artificial intelligence + robotics from multiple perspectives. In this issue, Zhang Weimin, Deputy Director of the Security and Embodied Intelligence Department of CAICT’s Artificial Intelligence Research Institute, discusses how to understand embodied intelligence, its core value in enabling new-type industrialization, the current level of its intelligence, and the technical bottlenecks and solutions on the path to large-scale application.
How to Understand Embodied Intelligence?
Embodied intelligence is a theory and paradigm of intelligence which holds that intelligent behavior depends on the interaction between the body and the environment, and is therefore inherently physical and situated. Robots serve as the carrier and testbed for this theory in engineering practice. Both intelligent robots and humanoid robots are branches of embodied intelligence.
How Does Embodied Intelligence Reshape the Core Links of the Manufacturing Industry?
With the help of artificial intelligence, robots had already begun to break free of fixed mechanical operations, and their automation capabilities improved. However, they remained suited mainly to structured, deterministic scenarios. In complex and changeable production environments, insufficient flexibility and poor adaptability are still prominent problems. Embodied intelligence enables physical entities such as robots to perceive, plan, and execute tasks independently, and gives them flexible operating capabilities across multiple functions and tasks. A typical example comes from TV manufacturing: the 4-Axis Robotic TV Panel Screw-Fixing System, which integrates embodied intelligence, has become a benchmark for flexible production upgrades. The system has a high-resolution visual guidance module that analyzes real-time environmental data to identify the position, spacing, and depth of screw holes on TV panels ranging from 32-inch to 85-inch models. This overcomes the limitation of traditional fixed robots, which need pre-programmed coordinates and can handle only a single model. Its 4-axis robotic arm adjusts screwing torque (0.5-5 N·m) and movement trajectory within milliseconds based on feedback from force sensors, avoiding panel deformation from excessive force while ensuring that each screw meets fastening standards. Manual operation has a defect rate of 8-10%, and traditional mechanical systems take 1.5 hours to switch between models; this system reduces the defect rate to less than 0.3% and shortens model changeover time to 10 minutes.
Embodied intelligence thus offers a new solution to two major problems in industrial production, insufficient flexibility and frequent production line changes, and it meets the needs of China’s manufacturing industry as it moves toward small-batch, multi-variety production.
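The force-feedback torque adjustment described for the screw-fixing system can be illustrated with a minimal sketch. This is not the system's actual controller, which the article does not describe. It assumes a simple proportional correction, and the function names, the gain value, and the target-force parameter are all hypothetical. Only the 0.5-5 N·m torque range comes from the text above.

```python
# Illustrative sketch only: a proportional force-feedback step that keeps the
# commanded screwing torque inside the 0.5-5 N·m range quoted in the article.
# The control law, gain, and all names are assumptions, not the real system.

TORQUE_MIN_NM = 0.5  # lower bound of the torque range given in the article
TORQUE_MAX_NM = 5.0  # upper bound of the torque range given in the article


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def next_torque(current_nm: float, measured_force_n: float,
                target_force_n: float, gain_nm_per_n: float = 0.05) -> float:
    """Return the next torque command after one feedback step.

    If the force sensor reads more than the target, the torque is lowered
    (protecting the panel); if it reads less, the torque is raised.
    """
    error_n = target_force_n - measured_force_n
    return clamp(current_nm + gain_nm_per_n * error_n,
                 TORQUE_MIN_NM, TORQUE_MAX_NM)


def drive_screw(force_readings_n, target_force_n: float,
                start_nm: float = 1.0) -> list:
    """Apply next_torque over a sequence of force readings.

    Returns the torque commanded after each reading.
    """
    torque = clamp(start_nm, TORQUE_MIN_NM, TORQUE_MAX_NM)
    history = []
    for reading in force_readings_n:
        torque = next_torque(torque, reading, target_force_n)
        history.append(torque)
    return history


if __name__ == "__main__":
    # Hypothetical readings: below target, on target, then well above target.
    print(drive_screw([10.0, 20.0, 40.0], target_force_n=20.0))
```

In this sketch the clamp plays the role the article attributes to force feedback: however large the sensed error, the command never leaves the permitted torque window, so an unexpected force spike lowers the torque rather than deforming the panel.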
What Stage Has the Intelligence of Embodied Intelligence Reached?
The level of intelligence is determined mainly by model capability. The end-to-end Vision-Language-Action (VLA) model, a key direction of exploration, is still at the “kindergarten” stage. Consider its three core capabilities in turn. First, perceiving the world through vision: current models can identify which objects are present, but cannot work out where to apply force or how much. Second, communicating with humans through language: models handle clear, specific, structured instructions best; they can understand “pick up an apple” but struggle with “give me a fruit”. Third, executing tasks through action: models can only chain simple skills such as moving, grasping, and placing, and the objects they manipulate are mainly rigid.
The ceiling on intelligent performance in real tasks is set by factors such as the performance of the robot body and network communication. On the one hand, robot body technology still needs refinement. Humanoid robots, for example, are at an early stage of technical maturity. A humanoid robot marathon with a completion rate of 30% exposed hardware limitations such as motor overheating, joint reliability, and the structural stability of the body. On the other hand, moving from executing fixed programs to adapting task execution to the scenario requires weighing the robot’s computing power, network, cost, and energy consumption together. Building distributed, generalizable embodied intelligence means balancing the supply costs of three elements (hardware, network, and computing power) so that it has a realistic chance of entering industrial production lines and ordinary households.

What Are the Technical Bottlenecks Faced by Embodied Intelligent Robots in Moving Towards Large-Scale Application?
First, model scalability is uncertain. There is not yet enough evidence that adding more data reliably improves model generalization. Moreover, making a model both generalizable and sufficiently reliable is a systems-engineering problem that spans data, training methods, model architecture, and other aspects.
Second, practically usable action data is in severely short supply. Existing embodied intelligence datasets, whether open-source, synthetic, or collected from real robots, are difficult to use at scale. Data is tightly bound to the robot body and closely tied to the production environment, which creates serious data silos. When a model trained on one dataset is deployed on a different robot body, its performance drops sharply; even on the same robot body, performance varies widely between laboratories or built environments.
Third, whole-body motion control remains unsolved. Coordinating all of the body’s joints requires completing complex calculations in a high-dimensional action space within tens of milliseconds, which places high demands on the control performance of the joint motors and on the accuracy and real-time performance of the motion control model. In addition, contact-rich manipulation tasks that require precise force control, such as organizing wire harnesses and packing plastic bags, still pose significant challenges.
How to Solve the Industrialization Problems of Embodied Intelligence?
The development of embodied intelligence, especially humanoid robots, requires patient long-term investment and focused planning. Three suggestions follow.
First, focus on interdisciplinary research across computer science, control science, cognitive science, robotics, and other fields. Only by strengthening communication and cooperation among disciplines and pooling their respective strengths can the key scientific and technical problems of embodied intelligence be overcome. The China Artificial Intelligence Industry Development Alliance (AIIA) has established an Embodied Intelligence Working Group to provide a platform for cooperation and exchange among all parties. Its work centers on evaluation standards, industrial ecosystem building, supply-demand matching for application promotion, and promoting applications through competitions, with the aim of fostering collaboration across the embodied intelligence ecosystem and accelerating industrialization in China.
Second, attach importance to standards development and to evaluation and feedback mechanisms, so that the direction for upgrading intelligent capabilities is clear. Scientific, efficient, and accurate standardization research can effectively promote industrial technology upgrades and large-scale application. The Embodied Intelligence Working Group (WG6) of MIIT TC1 (the National Technical Committee for Intelligent Standardization of the Ministry of Industry and Information Technology) is advancing China’s embodied intelligence industry standard system in four areas: system R&D support, system intelligent technology, system integration, and system application. It is systematically promoting standardization work on embodied intelligence grading, dataset quality, interfaces, training grounds, benchmark testing, and key products such as humanoid robots.
Third, strengthen collaboration between the upstream and downstream segments of the industrial chain. (1) Realize a “data-model-body” closed loop: accumulate data through the robot body, use it to drive iterative model upgrades, and in turn push the body’s performance to a new level. (2) Form a “demand-driven, application verification, feedback iteration” closed loop: focus on semi-structured scenarios in key sectors such as manufacturing, logistics, medical care, and households, and promote pilot demonstrations of embodied intelligence. (3) Deepen ecosystem collaboration: build a closed loop of “infrastructure, technical services, product services, industry applications”, and use industry alliances, standardization organizations, and other platforms to establish an industrial chain collaboration mechanism that lets technology, standards, and applications evolve in step.