Human Archive

Multimodal data provider for robotics and world modeling

We’re archiving the physical world for embodied intelligence by collecting and labeling aligned multimodal data. Building dexterous, perceptive robots that generalize robustly requires massive amounts of real-world data across many modalities and environments. We have thought deeply about the fine line between biomimicry and its application to humanoid systems, and based on that research we design and deploy custom hardware across residential and manufacturing settings. We then post-process the resulting data through internal QA, anonymization, and annotation pipelines to deliver diverse, high-fidelity datasets at scale to frontier labs developing robotics foundation models and to general-purpose robotics companies.

We believe we are at a historic inflection point, with a unique opportunity to make a lasting mark on humanity and reshape physical labor markets. That's why our team dropped out of Stanford and Berkeley and moved to Asia to collect the world’s largest annotated multimodal dataset.
Active Founders
Rushil Agarwal
Founder
Building multimodal real-world datasets for robotics | prev. UC Berkeley MET (IEOR + Business)
Samay Maini
Founder
Creating multimodal real-world datasets for robotics
Raj Patel
Founder
Archiving the structure of human interaction in the physical world. Berkeley dropout and former farmer (sold mangoes and planted trees)
Shloke Patel
Founder
Building in robotics
Company Launches
Human Archive: The World's Largest Multimodal Robotics Dataset

TL;DR: We’re capturing the structure of how humans interact with the physical world to model sensorimotor intelligence at scale
Launch Video: https://www.youtube.com/watch?v=pFHr3GlTpck

The Problem:

Modern AI is largely a knowledge transfer problem. Over the past decade, training on trillions of tokens of internet data enabled breakthroughs in LLMs, diffusion models, and vision-language models. But human intelligence is predominantly embodied, and the internet does not capture it. Every day we manipulate objects, apply force, and subconsciously adapt in noisy real-world environments. Yet there is no dataset that captures the structure of human interaction with the physical world at scale. As a result, progress in embodied spatial intelligence is bottlenecked by data.

What we built:

Over the past two months, we built infrastructure for high-quality data collection at scale, including:

  • Custom hardware rigs
  • Internal models for policy benchmarking
  • QA pipelines, alignment, and annotation software
  • 25-person operations team
  • Dedicated AWS servers for terabyte-scale data offloading

Now that the system is live, we can collect up to 8,000 hours of data per day, and we’ve signed national-level partnerships to scale our contributor network to 50,000+ people. We’ve already shipped datasets to frontier research teams and have built the largest multimodal dataset of its kind.

Our datasets:

We are currently collecting data in homes, restaurants, hotels, retail, transportation, construction, horticulture, and industrial environments, organized into two datasets.

HA-Multi is a fully aligned multimodal dataset with vision, stereo depth (IR dot projection), tactile gloves, body IMUs, and wrist cameras. For customers, we provide structured outputs and visualizations including 3D MANO hand reconstructions, 2D tactile force maps, depth maps per timestamp, and human pose reconstructions.

HA-Ego is a monocular RGB dataset captured from egocentric and wrist cameras.

We provide annotations and metadata including environment and scene descriptions, high-level task descriptions, task-aligned atomic action labels, hand tracking, object segmentation, SLAM (camera extrinsics and intrinsics), and 3D pose reconstruction.
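To illustrate what "fully aligned" multimodal data means in practice, here is a minimal sketch of timestamp-based stream alignment: readings from independently sampled sensors are grouped around each reference camera frame. Everything here (the function name, stream keys, and tolerance) is a hypothetical assumption for illustration; it is not Human Archive's published tooling or schema.

```python
def align_streams(streams, tol_ns):
    """Bundle multi-rate sensor readings per reference RGB frame.

    streams: dict mapping a sensor name (e.g. "rgb", "depth", "tactile",
             "imu") to a list of (timestamp_ns, reading) tuples, all on a
             shared clock. The "rgb" stream is used as the reference.
    tol_ns:  max clock offset for a reading to join a frame's bundle.
    """
    bundles = []
    for t_ref, frame in streams.get("rgb", []):
        bundle = {"timestamp_ns": t_ref, "rgb": frame}
        for name, readings in streams.items():
            if name == "rgb":
                continue
            # pick the reading closest in time to the reference frame
            nearest = min(readings, key=lambda r: abs(r[0] - t_ref),
                          default=None)
            if nearest is not None and abs(nearest[0] - t_ref) <= tol_ns:
                bundle[name] = nearest[1]
        bundles.append(bundle)
    return bundles
```

Real pipelines would also handle clock drift between sensors and interpolate high-rate streams (IMU, tactile) rather than taking the single nearest sample, but the nearest-neighbor version shows the core idea.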

Team:

Hi, we’re Shloke, Samay, Rushil, and Raj. We’re engineers from Stanford and Berkeley who’ve spent our entire lives building across operations, hardware, and robotics. We’ve known each other for over 20 years and joined forces to dedicate the rest of our lives to building the Common Crawl for human sensorimotor intelligence.

Our Ask

We'd love to talk with you if:

  • You’re building in robotics, world models, or evaluations
  • You work at or with frontier AI labs interested in real-world data
  • You know cleaning companies, hotel chains, restaurants, factories, or other businesses interested in scaling efficiency and profits with AI

👉 Contact us at raj@humanarchive.ai

💼 Follow us on LinkedIn: https://www.linkedin.com/company/human-archive/

📱 Follow us on X: https://x.com/babugi28

Jobs at Human Archive
  • San Francisco, CA, US — $80K - $110K, 0.10% - 0.30% equity, any experience (new grads ok)
  • India — ₹2.2M - ₹3.5M INR, 3+ years experience
  • India — ₹1.25M - ₹2.49M INR, any experience (new grads ok)
  • India — ₹1.5M - ₹3M INR, any experience (new grads ok)
Human Archive
Founded: 2026
Batch: Winter 2026
Team Size: 4
Status: Active
Location: San Francisco
Primary Partner: Jared Friedman