Deep dive · May 12, 2026
Per-step survival: a more honest metric for VLA evaluation
OpenBot team
Summary
Aggregate task-success rate hides which subtask a policy actually fails on, and per-frame mAP misses multi-second dropouts. We propose a per-step survival metric for long-horizon manipulation, validate it on 14 tasks across four embodiments, and show that it changes the ranking of three popular VLAs.
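The summary does not define the metric, so the sketch below is only one plausible reading of "per-step survival", not the definition used in the full article. It assumes each episode is annotated with the index of the first subtask step that failed (or `None` if every step succeeded) and reports, for each step, the fraction of episodes still alive after it. The function name `per_step_survival` and the input format are hypothetical, and the episode data is made up for illustration.

```python
from typing import List, Optional, Sequence


def per_step_survival(
    first_failed_step: Sequence[Optional[int]], num_steps: int
) -> List[float]:
    """Empirical survival curve over subtask steps (illustrative assumption).

    first_failed_step[i] is the 0-based index of the first step episode i
    failed, or None if the episode completed all steps.
    Entry k of the result is the fraction of episodes that completed
    steps 0..k without failing.
    """
    n = len(first_failed_step)
    if n == 0:
        raise ValueError("need at least one episode")
    curve = []
    for k in range(num_steps):
        # An episode survives step k if it never failed or failed later.
        survived = sum(1 for f in first_failed_step if f is None or f > k)
        curve.append(survived / n)
    return curve


# Four made-up episodes of a 3-step task: two succeed, one fails at
# step 2, one fails at step 0.
curve = per_step_survival([None, 2, 0, None], num_steps=3)
print(curve)  # [0.75, 0.75, 0.5]
```

Under this reading, the last entry of the curve equals the aggregate task-success rate (0.5 here), while the earlier entries show where along the task the failures occur.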
Full article content is being migrated into the public OpenBot repo. Contact contact@openbot.ai for access.
