Bridging the Data Gap in Embodied Intelligence: Lumos’ Innovative Approach with FastUMI Pro
The Future of Embodied Intelligence: How Data Collection is Paving the Way
In the rapidly evolving landscape of embodied intelligence, one point stands out among experts: the critical importance of data. Large language models thrive on vast text corpora, but there is no equivalent reservoir of real-world physical interaction data. This scarcity is a major obstacle for humanoids and other robots hoping to achieve a leap in performance comparable to the one GPT brought to language.
The Data Dilemma
For companies venturing into embodied intelligence, several pressing questions emerge: How is data collected? What scale is needed? And how do we ensure quality? These questions aren’t merely academic; they are increasingly defining the prospects for companies operating in this space.
One startup tackling this head-on is Lumos, founded in September 2024. Lumos is homing in on the first link in the data collection chain with the FastUMI Pro, a backpack-mounted universal manipulation interface device designed to standardize data collection across real-world environments.
Understanding the Solution: FastUMI Pro
The concept of a Universal Manipulation Interface (UMI), pioneered by researchers from Stanford, Columbia, and the Toyota Research Institute, aims to enable low-cost, high-efficiency data collection that generalizes across different robotic systems. Data from traditional teleoperation is bound to the specific robot hardware used to record it; UMI data, by contrast, can be applied across different robot morphologies.
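One way to picture why UMI-style data is not tied to a particular robot: if each recorded frame stores where the handheld gripper was and how wide it was open, rather than any robot’s joint angles, then only a final retargeting step needs to know which robot will use the data. The Python sketch below assumes exactly that. `DemoFrame`, `to_robot_commands`, and the `ik_solver` callback are hypothetical names, not Lumos’ or the UMI project’s actual data format, and a real pipeline would also carry camera images and handle poses a robot cannot reach.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Position in metres (x, y, z) followed by an orientation quaternion (qw, qx, qy, qz).
Pose = Tuple[float, float, float, float, float, float, float]


@dataclass(frozen=True)
class DemoFrame:
    """One timestep of a handheld demonstration, with no reference to any robot (hypothetical schema)."""
    t: float              # seconds since the start of the episode
    gripper_pose: Pose    # where the handheld gripper was, in the task frame
    gripper_width: float  # fingertip opening in metres


def to_robot_commands(
    episode: Sequence[DemoFrame],
    ik_solver: Callable[[Pose], List[float]],
    max_width: float,
) -> List[Tuple[List[float], float]]:
    """Retarget one recorded episode onto a specific robot.

    Only `ik_solver` (pose -> joint angles) and `max_width` (this robot's
    maximum gripper opening) depend on the robot; the episode is reused as-is.
    """
    commands = []
    for frame in episode:
        joints = ik_solver(frame.gripper_pose)
        width = min(frame.gripper_width, max_width)  # clamp to what this gripper can open
        commands.append((joints, width))
    return commands


if __name__ == "__main__":
    episode = [
        DemoFrame(0.0, (0.40, 0.00, 0.20, 1.0, 0.0, 0.0, 0.0), 0.08),
        DemoFrame(0.1, (0.40, 0.00, 0.15, 1.0, 0.0, 0.0, 0.0), 0.02),
    ]
    # Toy stand-ins for two robots' IK solvers; real solvers are robot-specific.
    arm_a = to_robot_commands(episode, lambda p: list(p[:3]), max_width=0.10)
    arm_b = to_robot_commands(episode, lambda p: [2 * v for v in p[:3]], max_width=0.05)
    print(arm_a)
    print(arm_b)
```

The same `episode` list drives both toy arms; only the solver and the gripper limit change, which is the hardware independence described above.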
Lumos’ FastUMI Pro significantly enhances data collection efficiency. “For tasks like folding clothes, teleoperation data collection takes about 50 seconds and costs RMB 3–5 (USD 0.42–0.70),” said Lumos founder and CEO Yu Chao. “Using FastUMI Pro, it takes just ten seconds, and costs less than RMB 0.6 (USD 0.08).” Beyond lowering costs, this makes it practical to collect data far more often and at far larger scale.
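The figures in Yu’s quote translate into simple ratios. The Python sketch below is plain arithmetic on those numbers; because the FastUMI Pro cost is given only as an upper bound (“less than RMB 0.6”), the cost ratios are lower bounds. The “tasks per collector-hour” rate is an added simplifying assumption that ignores setup, resets, and breaks.

```python
# Figures from Yu Chao's clothes-folding example (cost in RMB per task).
TELEOP_SECONDS = 50.0
TELEOP_COST_RANGE = (3.0, 5.0)
UMI_SECONDS = 10.0
UMI_COST_UPPER = 0.6  # "less than RMB 0.6", so the cost ratios below are lower bounds


def tasks_per_hour(seconds_per_task: float) -> float:
    """Tasks one collector could record in an hour, ignoring resets and breaks (an assumption)."""
    return 3600.0 / seconds_per_task


def compare() -> dict:
    """Speed and cost ratios of FastUMI Pro versus teleoperation for the quoted task."""
    return {
        "speedup": TELEOP_SECONDS / UMI_SECONDS,
        "cost_ratio_min": TELEOP_COST_RANGE[0] / UMI_COST_UPPER,
        "cost_ratio_max": TELEOP_COST_RANGE[1] / UMI_COST_UPPER,
        "teleop_per_hour": tasks_per_hour(TELEOP_SECONDS),
        "umi_per_hour": tasks_per_hour(UMI_SECONDS),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.1f}")
```

Run as a script, it reports a 5.0x speedup, a cost reduction of at least 5.0x to 8.3x, and 72 versus 360 tasks per collector-hour.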
Scaling Data Collection Strategies
In 2025, Lumos established a dedicated data collection center with a production capacity of 100,000 hours of data per year. Yet Yu expects leading embodied AI models to require at least one million hours of training data by 2026. Closing that gap calls for a shift from centralized collection facilities to a distributed model that captures data in everyday settings.
As Yu puts it, “Robot training data should not be this expensive or scarce. Humans generate data constantly while working in the physical world. It’s everywhere. It just hasn’t been properly collected.”
Targeting Diverse Environments
Lumos is deploying 10,000 FastUMI Pro units across six environments: industrial sites, homes, hotels, restaurants, shopping malls, and offices. The aim is to build a rich, structured operational dataset spanning a wide range of task categories in these settings.
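Putting the article’s numbers side by side shows why a distributed fleet matters. The Python sketch below takes the one-million-hour estimate, the 100,000-hour center capacity, and the 10,000 deployed units as given; splitting the target evenly across units within a single year, and treating every recorded hour as back-to-back ten-second tasks like the clothes-folding example, are illustrative assumptions rather than Lumos’ plan.

```python
import math

TARGET_HOURS = 1_000_000          # Yu's estimate of training data for leading models by 2026
CENTER_HOURS_PER_YEAR = 100_000   # capacity of Lumos' 2025 data collection center
UNITS = 10_000                    # FastUMI Pro units being deployed
SECONDS_PER_TASK = 10             # assumption: every task resembles the 10-second folding example


def center_years(target: int, per_center: int) -> int:
    """Years one center would need (equivalently, centers needed to hit the target in one year)."""
    return math.ceil(target / per_center)


def per_unit_load(target: int, units: int, seconds_per_task: int) -> dict:
    """Load per unit if the target is split evenly across the fleet within one year (an assumption)."""
    hours = target / units
    return {
        "hours_per_unit_per_year": hours,
        "hours_per_unit_per_week": hours / 52,
        "tasks_per_unit_per_year": hours * 3600 / seconds_per_task,
    }


if __name__ == "__main__":
    print("single-center years:", center_years(TARGET_HOURS, CENTER_HOURS_PER_YEAR))
    for name, value in per_unit_load(TARGET_HOURS, UNITS, SECONDS_PER_TASK).items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, a single center would need ten years to reach the target, whereas each unit in the fleet would need to contribute about 100 hours a year, roughly 1.9 hours a week or 36,000 ten-second tasks.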
Historically, data collection for embodied robotics has been confined to controlled environments, producing datasets that often lack the diversity needed for robust model training. By miniaturizing its toolkit, Lumos aims to lower the barrier to entry and democratize data collection, so that a much broader range of operational data can be captured.
The Integrated Loop: Collection, Training, and Deployment
At the core of Lumos’ system is an integrated loop connecting data collection, training, and deployment. Lumos reported that, using FastUMI Pro, its dual-arm robot Mos went from data collection through policy training on a factory quality inspection workflow in just five hours, and was running in a real-world deployment within seven hours, an illustration of how quickly collected data can be turned into a working policy.
Additionally, Lumos has introduced what it calls a “data supermarket,” where standardized portions of its datasets can be purchased, further enhancing data accessibility for developers.
Reframing Priorities: Focus on Data Infrastructure
While other companies may prioritize model architecture, Lumos is betting on robust data infrastructure. As co-CTO Ding Yan noted, any strong model depends on a reliable pipeline for data production, evaluation, and filtering, and building one takes time and effort.
The Road Ahead for Embodied Intelligence
The challenges surrounding embodied intelligence will no doubt continue to evolve. What remains clear, however, is that the potential of embodied systems depends on the availability and diversity of real-world operational data. If Lumos succeeds in shifting the field from centralized to distributed data collection, the implications could be transformational.
By making operational data more abundant and standardized, Lumos could indeed bridge the gap between controlled demonstrations and real-world applications, unlocking the full potential of embodied intelligence.
As we look to the future, one question remains: will Lumos’ approach become the gold standard for data collection and change how embodied AI systems are built and trained? Only time will tell, but the company is already helping to set the stage for a new chapter in robotics.