OpenBot
OpenBot Data

Teleop episodes, cleaned and replay-ready.

From raw HDF5 dumps to a curated, versioned dataset your policy will actually generalize on. Dedup, drift detection, viewpoint augmentation, and failure mining — wired into the same loop as Bench and Synth.

6 ingest formats
5 export formats
Open docs + adapters
curate.py
from openbot import Data

raw = Data.load("teleop_dump.hdf5")

clean = (
    raw.deduplicate()
    .drop_noise()
    .detect_drift(by="operator")
)

augmented = clean.augment_viewpoints(
    cameras=["wrist", "overhead", "third_person"]
)

# Bench tells us which subtasks failed
hard_negatives = augmented.mine_failures(
    bench_run="run_8c91a4"
)

hard_negatives.export(
    format="lerobot",
    output_dir="./clean_dataset"
)
Capabilities

The six things that decide train or trash.

The bottleneck isn't collecting teleop data — it's deciding which 30% to throw out before your policy learns the wrong thing.

  1. Auto-dedup & noise filter

    A behavioral hash collapses near-duplicate trajectories. Stationary, jittery, and mid-correction segments are dropped before they reach your policy, so a noisy dump shrinks to the episodes that actually teach something.

  2. Operator-drift detection

    Score consistency across demonstrators. When operator A starts grasping differently than B, the bias is flagged and isolated before it infects your training distribution.

  3. Viewpoint augmentation

    Re-render episodes from wrist, overhead, or third-person cameras using neural rendering. One physical capture becomes a viewpoint-invariant training set.

  4. Action replay & re-recording

    Replay actions in sim under new lighting, objects, and dynamics. Or re-record with a corrected end-effector trajectory without going back to the hardware.

  5. Failure mining from Bench

    Bench flags which subtasks fail and why. Data pulls the matching episodes and curates a hard-negative set so the next training cycle targets the actual failure mode.

  6. Versioned datasets

    Every curation step is a commit with a hash. Reproduce the exact dataset that trained the policy you're shipping — three months from now, or on a different machine.
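
To make "every curation step is a commit with a hash" concrete, here is a minimal sketch of a content-addressed commit chain. It is an illustration only, not OpenBot's actual implementation: the `commit_hash` function, the step names, and the episode IDs are all hypothetical.

```python
import hashlib
import json


def commit_hash(parent, step, params, episode_ids):
    """Hash one curation step together with its parent commit.

    Hypothetical helper: each commit covers the parent hash, the step
    name, its parameters, and the surviving episode IDs, so the same
    history always produces the same hash on any machine.
    """
    payload = json.dumps(
        {
            "parent": parent,
            "step": step,
            "params": params,
            "episodes": sorted(episode_ids),  # order-independent
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical two-step history: ingest, then deduplicate.
root = commit_hash(None, "ingest", {"source": "teleop_dump.hdf5"}, ["ep0", "ep1", "ep2"])
deduped = commit_hash(root, "deduplicate", {}, ["ep0", "ep2"])
```

Because each hash includes its parent, changing any earlier step changes every hash after it, which is what makes a dataset version auditable.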

Pipeline

From raw dump to ready-to-train.

Six stages. Each one is a commit you can reproduce, audit, and roll back.

  1. Ingest

    Read RLDS, LeRobot, Open X-Embodiment, HDF5, Parquet, or MP4 + JSON sidecars. No format rewriting required.

  2. Deduplicate

    Behavioral hash collapses near-identical trajectories. Your 10k-episode dump shrinks to the unique behaviors.

  3. Drift detection

    Score demonstrator consistency. Flag operator A vs B bias before it becomes a policy failure mode.

  4. Viewpoint augmentation

    Re-render from wrist, overhead, or third-person. One capture, multiple camera policies.

  5. Failure mining

    Bench reports which subtasks fail. Data pulls matching episodes and builds a targeted hard-negative set.

  6. Version & export

    Every curation step is a commit. Export to LeRobot, RLDS, HDF5, WebDataset, or Parquet — reproducibly.
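
The deduplicate stage above can be pictured with a small sketch. The page does not say how the behavioral hash is computed, so this assumes one simple approach: resample each action trajectory to a fixed number of steps, snap the values to a coarse grid, and hash the result. The names `behavioral_hash` and `deduplicate`, and the `n_samples` and `resolution` parameters, are hypothetical.

```python
import hashlib


def behavioral_hash(actions, n_samples=8, resolution=0.05):
    """Coarse fingerprint of a trajectory: resample, quantize, hash.

    Assumes `actions` is a non-empty list of equal-length numeric tuples
    and n_samples >= 2.
    """
    if not actions:
        raise ValueError("empty trajectory")
    last = len(actions) - 1
    # Evenly spaced timesteps, so small length differences wash out.
    idx = [round(i * last / (n_samples - 1)) for i in range(n_samples)]
    # Snap each action dimension to a grid so small jitter lands in the same cell.
    cells = [tuple(round(x / resolution) for x in actions[i]) for i in idx]
    return hashlib.sha1(repr(cells).encode("utf-8")).hexdigest()


def deduplicate(episodes):
    """Keep the first episode name seen for each behavioral hash."""
    seen = {}
    for name, actions in episodes.items():
        seen.setdefault(behavioral_hash(actions), name)
    return sorted(seen.values())
```

A grid-based fingerprint like this is cheap but blunt: two near-identical trajectories that straddle a grid boundary hash differently, so a production system would likely need something more tolerant.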

Compatibility

Every format you already have.

Ingest formats
RLDS, LeRobot, Open X-Embodiment, HDF5, Parquet, MP4 + JSON
Export formats
LeRobot, RLDS, HDF5, WebDataset, Parquet
Teleop systems
ALOHA / Mobile ALOHA, Gello, VR / Quest 3, SpaceMouse, Custom HDF5 streams

Stop training on noise.

Point Data at your existing teleop dump. Get a curated, versioned dataset back. Trace every commit.