Bridging the Data Gap in Embodied Intelligence: Lumos’ Innovative Approach with FastUMI Pro

The Future of Embodied Intelligence: How Data Collection is Paving the Way

In the rapidly evolving field of embodied intelligence, experts agree on one point: data is critical. Large language models thrive on vast text corpora, but the physical world has no equivalent reservoir of real-world interaction data. This scarcity is a major obstacle for robotic systems aiming at the kind of breakthrough that GPT represented for language.

The Data Dilemma

For companies venturing into embodied intelligence, several pressing questions emerge: How is data collected? What scale is needed? And how do we ensure quality? These questions aren’t merely academic; they are increasingly defining the prospects for companies operating in this space.

One startup tackling this head-on is Lumos, founded in September 2024. Lumos is homing in on the first link in the data collection chain with the FastUMI Pro, a backpack-mounted universal manipulation interface device designed to standardize data collection across real-world environments.

Understanding the Solution: FastUMI Pro

The concept of a Universal Manipulation Interface (UMI), pioneered by researchers from Stanford, Columbia, and the Toyota Research Institute, aims to enable low-cost, high-efficiency data collection that generalizes across different robotic systems. With traditional teleoperation, the data is bound to the specific robot hardware used to record it. UMI data, by contrast, can be applied across various robotic morphologies.

According to Lumos, the FastUMI Pro cuts both the time and the cost of collecting a demonstration. “For tasks like folding clothes, teleoperation data collection takes about 50 seconds and costs RMB 3–5 (USD 0.42–0.70),” said Lumos founder and CEO Yu Chao. “Using FastUMI Pro, it takes just ten seconds, and costs less than RMB 0.6 (USD 0.08).” By these figures, collection is about five times faster and at least five times cheaper per demonstration, which makes more frequent data collection practical.
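The ratios behind Yu's comparison can be checked with a few lines of arithmetic. The sketch below uses only the quoted figures. The variable and function names are illustrative, and the per-hour throughput assumes demonstrations are recorded back to back with no idle time, which the source does not claim.

```python
# Figures quoted by Yu Chao for one clothes-folding demonstration.
TELEOP_SECONDS = 50            # teleoperation, time per demonstration
FASTUMI_SECONDS = 10           # FastUMI Pro, time per demonstration
TELEOP_COST_RMB = (3.0, 5.0)   # teleoperation, quoted cost range
FASTUMI_COST_RMB_MAX = 0.6     # FastUMI Pro, quoted upper bound ("less than")


def demos_per_hour(seconds_per_demo: float) -> float:
    """Demonstrations per hour, assuming back-to-back recording (an assumption)."""
    return 3600 / seconds_per_demo


# How many times faster one demonstration is collected.
time_speedup = TELEOP_SECONDS / FASTUMI_SECONDS

# The FastUMI Pro cost is only given as an upper bound,
# so both cost ratios below are lower bounds on the real saving.
cost_ratio_low_end = TELEOP_COST_RMB[0] / FASTUMI_COST_RMB_MAX
cost_ratio_high_end = TELEOP_COST_RMB[1] / FASTUMI_COST_RMB_MAX

print(f"Time speedup: {time_speedup:.1f}x")
print(f"Cost reduction: at least {cost_ratio_low_end:.1f}x to {cost_ratio_high_end:.1f}x")
print(f"Demos per hour: {demos_per_hour(TELEOP_SECONDS):.0f} (teleoperation) "
      f"vs {demos_per_hour(FASTUMI_SECONDS):.0f} (FastUMI Pro)")
```

Because the FastUMI Pro cost is quoted only as "less than RMB 0.6", the cost ratios are floors, not point estimates.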

Scaling Data Collection Strategies

In 2025, Lumos established a dedicated data collection center with a production capacity of 100,000 hours of data annually. Yu, however, expects that by 2026 leading embodied AI models will require at least one million hours of training data, ten times the center's annual output. Meeting that target means shifting from centralized collection facilities to a more distributed model that captures data in everyday settings.

As Yu puts it, “Robot training data should not be this expensive or scarce. Humans generate data constantly while working in the physical world. It’s everywhere. It just hasn’t been properly collected.”

Targeting Diverse Environments

Lumos is strategically deploying 10,000 FastUMI Pro units across six environments—industrial sites, homes, hotels, restaurants, shopping malls, and offices. The aim is to develop a rich, structured operational dataset that captures a wide array of task categories across these diverse settings.
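To get a feel for the scale involved, the figures reported in this article can be set side by side. The sketch below is a back-of-the-envelope calculation. It assumes, purely for illustration, that the 10,000 deployed units would supply the entire one-million-hour target on their own, a link the source does not make, and the names used are hypothetical.

```python
# Figures reported in the article.
CENTER_HOURS_PER_YEAR = 100_000   # capacity of the dedicated collection center
TARGET_HOURS = 1_000_000          # Yu's estimate of training data needed by 2026
PLANNED_UNITS = 10_000            # FastUMI Pro units planned for deployment

# How far the centralized center alone falls short of the target.
gap_factor = TARGET_HOURS / CENTER_HOURS_PER_YEAR

# Illustrative assumption: the deployed units cover the whole target
# by themselves, with the load spread evenly across them.
hours_per_unit = TARGET_HOURS / PLANNED_UNITS

print(f"Target is {gap_factor:.0f}x the center's annual capacity")
print(f"Each unit would need to record about {hours_per_unit:.0f} hours")
```

Under that assumption, each unit would need to contribute roughly 100 hours of recordings, which illustrates why distributing collection across many devices is attractive compared with scaling a single center tenfold.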

Historically, data collection for embodied robotics has been confined to controlled environments, leading to datasets that often lack the diversity necessary for robust model training. By miniaturizing its toolkit, Lumos seeks to lower barriers and democratize data collection, enabling a broader spectrum of operational data.

The Integrated Loop: Collection, Training, and Deployment

At the core of Lumos’ system is a loop connecting data collection, training, and deployment. Using the FastUMI Pro, Lumos reported that its dual-arm robot, Mos, completed a factory quality inspection workflow, including data collection and policy training, in just five hours. In a real-world deployment, the company said, this was accomplished in seven hours. Lumos presents these results as evidence that data collected with the FastUMI Pro can be turned into a working policy quickly.

Additionally, Lumos has introduced what it calls a “data supermarket,” where standardized portions of its datasets can be purchased, further enhancing data accessibility for developers.

Reframing Priorities: Focus on Data Infrastructure

While other companies may prioritize model architecture, Lumos is placing its bets on building a robust data infrastructure. As co-CTO Ding Yan noted, the success of any strong model hinges on a reliable data pipeline for production, evaluation, and filtering, and building that pipeline takes time and effort.

The Road Ahead for Embodied Intelligence

The challenges surrounding embodied intelligence will no doubt continue to evolve. However, what remains clear is that the potential for embodied systems hinges on the availability and diversity of real-world operational data. If Lumos succeeds in shifting the paradigm from centralized data collection to a distributed model, the implications for the field could be transformational.

By making operational data more abundant and standardized, Lumos could indeed bridge the gap between controlled demonstrations and real-world applications, unlocking the full potential of embodied intelligence.

One question remains: will Lumos’ approach become the standard for data collection and change how embodied AI systems are built and trained? It is too early to say, but the company's methods are setting the stage for a new chapter in robotics.
