aVenture is in Alpha: During this preview period, you should expect the research data to be limited, and it may not yet meet our exacting standards. We've made the decision to provide early access to our data to showcase the product as we build, but you should not yet rely upon it alone for your investment decisions.

© aVenture Investment Company, 2026. All rights reserved.

San Francisco, CA, USA

aVenture Investment Company ("aVenture") is an independent research platform providing detailed analysis and data on startups, venture capital investments, and key industry individuals. It is not a registered investment adviser, broker-dealer, or investment advisor and does not provide investment advice or recommendations. The data provided by aVenture does not constitute recommendations or advice, whether by methodology, analysis, AI-generated content, or a statement written by a staff member of aVenture.

aVenture is not affiliated with any of the people, companies, organizations, government agencies, regulatory bodies, or investment funds we provide coverage for on this site unless explicitly stated otherwise. Users assume full responsibility for decisions made based on information obtained from this platform. Links to external websites do not imply endorsement or affiliation with aVenture. Any links that provide the ability to invest in a primary or secondary transaction in a company are for convenience only and do not constitute solicitations or offers to buy or sell an investment. Investors should exercise heightened precaution and due diligence when investing in private companies, especially those not independently audited.

While we strive to provide valuable insights with objectivity and professional diligence, we cannot guarantee the accuracy of the information provided on our platform. Before making any investment decisions, you should verify the accuracy of all pertinent details for your decision. To the fullest extent permitted by law, aVenture shall not be liable for any direct, indirect, incidental, consequential, or financial damages arising from use of this site, whether by consumers of its contents directly or by persons or organizations covered by our research, even if we are advised of the possibility. Our best-efforts processes and correction request forms do not create a warranty or duty of care.

Profiles on this platform may include content generated in part by large language models (LLMs, artificial intelligence) that aggregate publicly available sources (e.g., SEC EDGAR, public filings, press releases). Source attribution is provided where known; always verify statements and claims here against original sources before relying on any data. Content on our site may contain inaccuracies, omissions, or what are commonly called 'hallucinations' if generated in part or in full by AI / LLMs. The risk can also exist even when content is written by a human, as internal and third-party sources may also have inaccuracies for the same or different reasons. While we randomly audit a proportion of content, this is not exhaustive.

We recommend that an independent auditor be hired to verify the accuracy of the information before relying on it for any sensitive decisions. By accessing this platform, you agree not to rely solely on any information generated by AI, aggregated, or sourced or written otherwise on this site, for investment, financial, or other decisions. aVenture assumes no responsibility for inaccuracies, omissions, or hallucinations. You must independently verify all data from primary sources. Use of this platform constitutes your waiver of claims for reliance-based damages, including negligent misrepresentation. To report an error, request a correction, or dispute information about a company or individual, contact us via our request data updates form.

Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it.

From TechCrunch

By Tim Fernholz

June 17, 2026

Two weeks ago, OpenAI said it would relaunch the robotics program it shuttered in 2021 — the latest signal that the biggest AI labs are racing to teach machines to operate in the physical world. But building capable robots requires something the AI industry doesn’t yet have, which is the training data to match that used for language models.

That gap is creating a new kind of infrastructure business. Unlike LLMs that were trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the physical world.

XDOF (pronounced “ecks-doff”), emerging from stealth today, is betting that the next great bottleneck in AI isn’t models or chips, but the data feedback loop needed to teach robots how to interact with the physical world.

The startup aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can’t easily build themselves — and has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to do it. Co-founder and CEO Philipp Wu says XDOF, which has about 60 employees, is already working with 20 customers, including several frontier AI labs, but cannot name them.

“All of the top labs are trying to pursue robotics,” Wu said. “We’ve already seen some of the downfalls of falling a little bit behind in the language model race … you don’t want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier.”

Wu ran into this problem himself as a PhD student at UC Berkeley. His focus was on enabling robots to learn skills from large-scale datasets. There was just one problem.

“We didn’t have large-scale data to work with,” he told TechCrunch. “There was this chicken-and-egg problem — we first needed to actually collect data before we could even ask how to train a foundation model for robotics.”

Wu and his future XDOF co-founder and CTO, Fred Shentu, worked on a project called GELLO, a low-cost teleoperation system that lets a human operator control a robotic arm to generate training data. “It ended up becoming a very influential paper in robotics, because a lot of people had similar needs and bottlenecks, and many started leveraging this type of device for data collection,” Wu said.

Spotting the opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 to provide a data ecosystem for companies pursuing robotics models. Mindful that data provision alone can be a dead-end business, the company is also focused on data cleaning, tooling, and annotation — creating a self-reinforcing feedback loop for robot trainers.

As a starting point, the company is partnering with UC Berkeley’s AI Research lab to release what it believes is the largest collection of high-quality robot training data ever assembled, dubbed ABC. It includes 130,000 trajectories of robot manipulation data, 300 hours of simulation, and 100 hours of evaluations. That kind of scaled-up pre-training data has never been available to academia before.

“We’ve seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn’t necessarily have expected,” David McAllister, a Berkeley PhD student who helped organize the release, told TechCrunch.

The team has already used the data to train robots on benchmark tasks like folding T-shirts, flattening boxes, and loading AirPods into their cases.

Unlimited degrees of freedom

The company plans to work across three tiers of a data pyramid. The most valuable tier is teleoperation data collected on the actual robot being deployed; next comes teleoperated robots gathering more general data, as with GELLO; and finally “egocentric” data gathered by humans performing everyday tasks, for which XDOF plans to build its own wearable sensors.

“Your camera choice is going to affect the quality of your data — which is going to affect how your hand-tracking algorithm performs,” Wu said. “If you don’t design the hardware well from the start, the data you collect might have very specific problems that you didn’t anticipate.”

The company plans to hire and train armies of teleoperators and egocentric data operators around the world — a labor-intensive model that raises an obvious question: Why aren’t the major labs doing this data production work themselves?

“You need a warehouse of hundreds of thousands of square feet with hundreds of robots,” Wu said. “You need to maintain these robots, calibrate their physical parameters, and properly train operators.”

It’s a build-out that requires focus, capital, and operational scale that most AI labs would rather outsource — which is precisely the market XDOF is betting on.

The name XDOF is a play on the robotics term “degrees of freedom,” which describes the number of independent motions a robot can perform. Your arm, from shoulder to wrist, has seven degrees of freedom. Humanoid robotics company Figure AI’s latest robot has 30. The X in the company’s name captures its ambition: “Arbitrary degrees of freedom, unlimited degrees of freedom,” Wu says.

View original article on techcrunch.com

Most Recent

Founders Fund’s outlier bet on humanely killed fish

Shinkei makes a refrigerator-sized robot called Poseidon to kill fish quickly and humanely.

Jun 20, 2026

He made your free video player run smoothly. Now he’s doing that for robots.

French serial entrepreneur and open-source legend Jean-Baptiste Kempf has been building Kyber, an infrastructure layer to control remote devices in real time.

Jun 19, 2026

AI inference startup Baseten reportedly raising $1.5B months after its last mega-round

Startup Baseten is reportedly close to finalizing a $1.5 billion round at a $13 billion valuation as the “inference gold rush” marches on.

Jun 18, 2026

The 11 standout startups from YC’s Demo Day, according to VCs

TechCrunch spoke to investors to find the hottest startups in the Spring 2026 YC batch. Some of them commanded valuations of over $175 million, VCs said.

Jun 18, 2026

Similar Posts

Alloy is bringing data management to the robotics industry

Australia-based Alloy thinks it can help robotics firms with their data problem: The startup is building data infrastructure to help companies process and organize all the data their robots collect.

Sep 23, 2025

Coco Robotics taps UCLA professor to lead new physical AI research lab

Coco Robotics is working toward automating its fleet of delivery robots using its millions of miles of collected data.

Oct 14, 2025

General Catalyst and Khosla Ventures back data mapping startup Lume

Data integration is a necessary part of many workflows from onboarding customer data to exercising payroll, but for many data sets the process is long and manual. Data is siloed into databases and SaaS applications that each keep the information in different formats, which makes it difficult to move information from one database to another. Lume looks to change that using AI. Lume’s system uses AI and algorithms to automate data mapping by extracting data from its database silos and “normalizi

Nov 12, 2024

A peek inside Physical Intelligence, the startup building Silicon Valley’s buzziest robot brains

If co-founder Lachy Groom has any doubts, he doesn’t show it. He’s working with people who’ve been working on this problem for decades and who believe the timing is finally right, which is all he needs to know.

Jan 30, 2026
