OpenBot
Deep dive · May 12, 2026

Per-step survival: a more honest metric for VLA evaluation

OpenBot team

Summary

Aggregate task-success rate hides which subtask a policy actually fails on, and per-frame mAP misses multi-second dropouts. We propose a per-step survival metric for long-horizon manipulation, validate it on 14 tasks across four embodiments, and show that it changes the relative ranking of three popular vision-language-action (VLA) models.
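
To make the contrast with aggregate success concrete, here is a minimal sketch of one plausible reading of per-step survival. It assumes that S(k) is the fraction of rollouts that have not failed on any of subtasks 1 through k, so the aggregate success rate is just the final point S(K). The function name `per_step_survival` and the "index of first failed subtask, or None" encoding are hypothetical illustrations, not the method described in the full article.

```python
from typing import List, Optional, Sequence


def per_step_survival(first_failure: Sequence[Optional[int]], num_steps: int) -> List[float]:
    """Return [S(1), ..., S(num_steps)] under an assumed survival definition.

    Hypothetical encoding: first_failure[i] is the 1-based index of the
    subtask on which rollout i first failed, or None if it completed all
    num_steps subtasks.
    """
    n = len(first_failure)
    if n == 0:
        raise ValueError("need at least one rollout")
    for f in first_failure:
        if f is not None and not 1 <= f <= num_steps:
            raise ValueError(f"failure index {f} outside 1..{num_steps}")

    survival = []
    for k in range(1, num_steps + 1):
        # A rollout survives step k if it has not failed on any subtask <= k.
        alive = sum(1 for f in first_failure if f is None or f > k)
        survival.append(alive / n)
    return survival


if __name__ == "__main__":
    # Two toy policies with the same aggregate success rate (S(3) = 0.5).
    policy_a = [None, None, 1, 1]  # failures happen on the first subtask
    policy_b = [None, None, 3, 3]  # failures happen on the last subtask
    print(per_step_survival(policy_a, 3))  # [0.5, 0.5, 0.5]
    print(per_step_survival(policy_b, 3))  # [1.0, 1.0, 0.5]
```

Both toy policies score 0.5 on aggregate success, yet the survival curves show policy A losing half its rollouts on the first subtask while policy B loses them only on the last, which is the kind of difference a single end-of-episode number cannot express.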

Full article content is being migrated into the public OpenBot repo. Contact contact@openbot.ai for access.