Exploring the Progressive Development Path of Embodied Intelligence from L0 to L4
Guest Speaker: Jin Ge, Founder and CEO of Lingyu Intelligence. He holds a Bachelor's degree in Automation from Tsinghua University and an MBA from Tsinghua University's School of Economics and Management. He was previously a managing partner at Yuanjing Venture Capital and Vice President of Aoliang Photonics. Drawing on years of successful investment and entrepreneurial experience in high-tech fields, he has invested in and incubated multiple early-stage hard-tech companies. Lingyu Intelligence is a cutting-edge embodied intelligence company focused on human-machine hybrid intelligence. It was founded by leading motion control experts from Tsinghua University's Department of Automation, with the mission of "creating a benchmark for practical embodied intelligence and liberating humans from 'dangerous, heavy, and boring' work".
Keywords: #RemoteOperation #Human-MachineHybridIntelligence #MAAS #ArmHandIntegratedControl
Jin Ge observed that embodied intelligence currently faces a trilemma, analogous to the blockchain trilemma: generality, performance, and autonomy are difficult to achieve at the same time with today's technology, and likely with that of the next three to five years as well. Generality means a robot is not tied to a specific scenario but can perform a variety of tasks; performance covers reliability (the task success rate) and efficiency (how the robot's performance compares with a human's); autonomy indicates whether the robot can complete tasks independently or requires human intervention.
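To make the three dimensions more concrete, here is a minimal sketch of how one might score a deployment log on reliability, efficiency, and autonomy. The record fields, the use of total human time divided by total robot time for efficiency, and the share of tasks finished without any human intervention for autonomy are all illustrative assumptions, not definitions given in the talk.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One attempted task from a hypothetical deployment log."""
    succeeded: bool
    robot_seconds: float   # time the robot took on the task
    human_seconds: float   # reference time for a human doing the same task
    interventions: int     # number of times a human had to step in


def reliability(records: list[TaskRecord]) -> float:
    """Share of tasks completed successfully (the 'success rate')."""
    return sum(r.succeeded for r in records) / len(records)


def efficiency(records: list[TaskRecord]) -> float:
    """Assumed definition: total human time / total robot time.
    1.0 means parity with a human; below 1.0 means the robot is slower."""
    return sum(r.human_seconds for r in records) / sum(r.robot_seconds for r in records)


def autonomy(records: list[TaskRecord]) -> float:
    """Assumed definition: share of tasks finished with zero human interventions."""
    return sum(r.interventions == 0 for r in records) / len(records)


if __name__ == "__main__":
    log = [
        TaskRecord(True, 60.0, 30.0, 0),
        TaskRecord(True, 90.0, 30.0, 1),
        TaskRecord(False, 120.0, 30.0, 2),
        TaskRecord(True, 30.0, 30.0, 0),
    ]
    print(f"reliability={reliability(log):.2f}")  # reliability=0.75
    print(f"efficiency={efficiency(log):.2f}")    # efficiency=0.40
    print(f"autonomy={autonomy(log):.2f}")        # autonomy=0.50
```

Keeping the three scores separate mirrors the trilemma framing: a system can look strong on one number while the others lag, which is why the talk treats them as hard to raise together.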
Jin Ge then introduced two common methods to enhance robot autonomy:
• The first approach aims to reach L4 directly, raising task success rates above 99.9% and achieving fully autonomous operation across multiple scenarios. However, this path is slow and costly. Because robot data is currently extremely scarce, training an AGI robot that meets human expectations requires large amounts of real-robot data and resources.
• The second approach follows the progression of autonomous driving, advancing gradually from L0 to L2 and then to L4 by deploying robots commercially, continuously collecting interaction data, and upgrading the intelligence system step by step. Its main advantage is that companies can close the data gap while generating revenue early.
Following the second approach, Jin Ge believes the most economically viable method is to build a MAAS (Manipulation as a Service) platform, in which robots handle simple everyday scenarios autonomously. When they encounter complex or dangerous situations, they call on human operators or cloud-based "human-like" models to take over remotely and complete the task. This allows one operator to manage many robots, increasing robot autonomy while better meeting individual users' needs.
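The escalation logic described above can be sketched as a simple dispatcher: each robot attempts routine tasks on its own, while tasks flagged as dangerous, or where the robot's confidence falls below a threshold, go into a shared queue that a remote operator (or cloud model) drains, so one operator serves many robots. The confidence score, the 0.9 threshold, the `Task` fields, and the single first-in-first-out queue are hypothetical illustrations; the talk does not describe how Lingyu Intelligence actually implements MAAS.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    robot_id: str
    name: str
    confidence: float       # hypothetical: the robot's self-estimated chance of success
    dangerous: bool = False


class MaasDispatcher:
    """Hypothetical sketch: robots act autonomously on routine tasks and
    escalate hard or dangerous ones to a shared remote-operator queue."""

    def __init__(self, confidence_threshold: float = 0.9):
        self.confidence_threshold = confidence_threshold
        self.escalation_queue: deque[Task] = deque()
        self.autonomous_log: list[Task] = []

    def submit(self, task: Task) -> str:
        # Escalate if the task is flagged dangerous or the robot is not confident enough.
        if task.dangerous or task.confidence < self.confidence_threshold:
            self.escalation_queue.append(task)
            return "escalated"
        self.autonomous_log.append(task)
        return "autonomous"

    def next_for_operator(self) -> Task | None:
        # One remote operator pulls escalated tasks from many robots in arrival order.
        return self.escalation_queue.popleft() if self.escalation_queue else None


if __name__ == "__main__":
    dispatcher = MaasDispatcher()
    print(dispatcher.submit(Task("robot-1", "wipe table", 0.97)))                 # autonomous
    print(dispatcher.submit(Task("robot-2", "open valve", 0.95, dangerous=True)))  # escalated
    print(dispatcher.submit(Task("robot-3", "fold shirt", 0.6)))                  # escalated
    task = dispatcher.next_for_operator()
    print(task.robot_id, task.name)                                                # robot-2 open valve
```

A single shared queue is the simplest way to express the one-to-many idea; a real platform would presumably add priorities, operator skills, and timeouts, which are left out here.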
Q8: For companies developing embodied large models, is tactile data an essential requirement?
Lv Liyun: A robot's "brain" needs tactile sensor input to make decisions. Properties such as an object's temperature, humidity, and texture cannot be perceived without touch, and without that perception the robot's subsequent interactions and responses suffer. To give a more concrete example, when facing two objects that look similar, such as an egg and a hammer, visual perception alone cannot tell them apart without tactile sensing.
Q9: In industrial scenarios, robots may give rise to disputes over liability for safety incidents. In remote operation systems, how is responsibility divided between humans and machines?
Jin Ge: Because robots interact with the physical world, they can cause a range of unpredictable consequences, including property damage and personal injury. I have been calling for the embodied intelligence industry to establish a mandatory insurance standard for robots, similar to compulsory insurance for cars. The industry is still at an early stage and has not yet devoted much effort to defining liability or mandatory insurance.
Q10: L4-level embodied intelligence requires heavy investment. How do companies balance technological progress against the time it takes to earn a commercial return?
Jin Ge: If companies follow a progressive path from L0 to L2 and then to L4, they can generate revenue throughout the gradual upgrade. Going directly to L4 requires sustained heavy investment and support from the capital markets. We do see many companies on this path raising massive funds; for them, the balance is achieved more through the capital markets than by the companies themselves.
Q11: When do you predict embodied intelligence will move from L3 to L4? Are there any landmark events that could serve as indicators to watch?
Jin Ge: The core issue is how to define L4 for robots: does it mean a general artificial intelligence that can replace a human in any job anywhere, or does it mean working autonomously 99% of the time in vertical scenarios? I believe achieving the latter would already meet our current expectations for robots, and it might be possible within a decade. A landmark event would be seeing many commercial scenarios become fully unmanned and served by robots, with the training side obtaining massive amounts of data.
Q12: Can demand for emotional companionship in the home drive widespread adoption of consumer robots? What are the key factors for consumer robot adoption?
Jin Ge: Emotional companionship needs fall into two categories. The first is purely language-based companionship, which places few demands on a robot's physical capabilities and relies mainly on large language models. Its progress depends on advances in language models such as DeepSeek, and consumer robots of this kind will likely become widespread early, as expected. The second category involves physical interaction with robots, which requires whole-body control and manipulation. Its adoption depends critically on ethics, compliance, and safety, however, so we should not be overly optimistic about widespread adoption at the current stage.