Teleop episodes, cleaned and replay-ready.
From raw HDF5 dumps to a curated, versioned dataset your policy will actually generalize on. Dedup, drift detection, viewpoint augmentation, and failure mining — wired into the same loop as Bench and Synth.
- 6 ingest formats
- 5 export formats
- Open docs + adapters
from openbot import Data

raw = Data.load("teleop_dump.hdf5")

clean = (
    raw.deduplicate()
    .drop_noise()
    .detect_drift(by="operator")
)

augmented = clean.augment_viewpoints(
    cameras=["wrist", "overhead", "third_person"]
)

# Bench tells us which subtasks failed
hard_negatives = augmented.mine_failures(
    bench_run="run_8c91a4"
)

hard_negatives.export(
    format="lerobot",
    output_dir="./clean_dataset"
)

The six things that decide train or trash.
The bottleneck isn't collecting teleop data — it's deciding which 30% to throw out before your policy learns the wrong thing.
- 01
Auto-dedup & noise filter
Behavioral hash collapses near-duplicate trajectories. Stationary, jittery, and mid-correction segments are dropped before they reach your policy, so a noisy dump shrinks to the episodes that actually teach something.
- 02
Operator-drift detection
Score consistency across demonstrators. When operator A starts grasping differently than B, the bias is flagged and isolated before it infects your training distribution.
- 03
Viewpoint augmentation
Re-render episodes from wrist, overhead, or third-person cameras using neural rendering. One physical capture becomes a viewpoint-invariant training set.
- 04
Action replay & re-recording
Replay actions in sim under new lighting, objects, and dynamics. Or re-record with a corrected end-effector trajectory without going back to the hardware.
- 05
Failure mining from Bench
Bench flags which subtasks fail and why. Data pulls the matching episodes and curates a hard-negative set so the next training cycle targets the actual failure mode.
- 06
Versioned datasets
Every curation step is a commit with a hash. Reproduce the exact dataset that trained the policy you're shipping — three months from now, or on a different machine.
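To make the "behavioral hash" idea from item 01 concrete, here is a minimal sketch of one way such a hash could work: resample each trajectory to a fixed number of timesteps, snap the actions to a coarse grid, and hash the result. This is an illustration under stated assumptions, not Data's actual algorithm; the names `behavioral_hash` and `deduplicate` and the `n_samples` and `bin_size` parameters are hypothetical.

```python
import hashlib
from typing import List, Sequence

# Assumption: a trajectory is a sequence of action vectors, one per timestep.
Trajectory = Sequence[Sequence[float]]


def behavioral_hash(traj: Trajectory, n_samples: int = 8, bin_size: float = 0.05) -> str:
    """Hash a trajectory after resampling and quantizing its actions.

    Hypothetical sketch; n_samples must be at least 2.
    """
    if not traj:
        return hashlib.sha256(b"").hexdigest()
    last = len(traj) - 1
    # Pick n_samples evenly spaced timesteps so episode length does not matter.
    idxs = [round(i * last / (n_samples - 1)) for i in range(n_samples)]
    # Snap every action dimension to a coarse grid so small jitter is ignored.
    quantized = [tuple(round(x / bin_size) for x in traj[i]) for i in idxs]
    return hashlib.sha256(repr(quantized).encode("utf-8")).hexdigest()


def deduplicate(trajs: List[Trajectory]) -> List[Trajectory]:
    """Keep the first trajectory seen for each behavioral hash."""
    seen = set()
    unique = []
    for traj in trajs:
        key = behavioral_hash(traj)
        if key not in seen:
            seen.add(key)
            unique.append(traj)
    return unique


# Usage: b is a slightly jittered copy of a, c is a different behavior.
a = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
b = [[x + 0.01 for x in step] for step in a]
c = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [1.5, 1.5]]
kept = deduplicate([a, b, c])  # b collapses into a; a and c remain
```

Grid quantization is the simplest possible choice here. Two near-duplicates that straddle a bin boundary will still hash differently, so a production system would likely need something more tolerant, such as locality-sensitive hashing or a distance threshold.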
From raw dump to ready-to-train.
Six stages. Each one is a commit you can reproduce, audit, and roll back.
- 01
Ingest
Read RLDS, LeRobot, Open X-Embodiment, HDF5, Parquet, or MP4 + JSON sidecars. No format rewriting required.
- 02
Deduplicate
Behavioral hash collapses near-identical trajectories. Your 10k-episode dump shrinks to the unique behaviors.
- 03
Drift detection
Score demonstrator consistency. Flag operator A vs B bias before it becomes a policy failure mode.
- 04
Viewpoint augmentation
Re-render from wrist, overhead, or third-person. One capture, multiple camera policies.
- 05
Failure mining
Bench reports which subtasks fail. Data pulls matching episodes and builds a targeted hard-negative set.
- 06
Version & export
Every curation step is a commit. Export to LeRobot, RLDS, HDF5, WebDataset, or Parquet — reproducibly.
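The "every curation step is a commit" idea can be sketched as a content-addressed hash chain: each commit hash is derived from the parent hash, the step name, its parameters, and the surviving episode IDs. The sketch below is an assumption about how such a scheme might look, not a description of Data's internals; `commit_hash`, its arguments, and the episode IDs are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def commit_hash(
    parent: Optional[str],
    step: str,
    params: Dict[str, Any],
    episode_ids: List[str],
) -> str:
    """Derive a deterministic short hash for one curation step (hypothetical scheme)."""
    payload = {
        "parent": parent,
        "step": step,
        "params": params,
        # Sort the IDs so the hash does not depend on listing order.
        "episodes": sorted(episode_ids),
    }
    # sort_keys and fixed separators give a canonical encoding, so the same
    # inputs produce the same hash on any machine.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


# Usage: a two-step history, ingest followed by deduplicate.
root = commit_hash(None, "ingest", {"source": "teleop_dump.hdf5"}, ["ep0", "ep1", "ep2"])
dedup = commit_hash(root, "deduplicate", {}, ["ep0", "ep2"])
```

Because each hash folds in its parent, changing any earlier step changes every later hash, which is what makes a given dataset version reproducible and auditable.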
Every format you already have.
Stop training on noise.
Point Data at your existing teleop dump. Get a curated, versioned dataset back. Trace every commit.
