FPC-VLA

A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction

Robotic manipulation is a fundamental component of automation. However, traditional perception-planning pipelines often fall short in open-ended tasks due to limited flexibility, while single end-to-end Vision-Language-Action (VLA) models offer promising capabilities but lack crucial mechanisms for anticipating and recovering from failure. To address these challenges, we propose FPC-VLA, a dual-model framework that integrates a VLA model with a supervisor for failure prediction and correction. The supervisor, which is trained efficiently without manual labeling, evaluates action viability through vision–language queries and generates corrective strategies when risks arise. A similarity-guided fusion module further refines actions by leveraging past predictions. Evaluations on multiple simulation platforms (SIMPLER and LIBERO) and robot embodiments (WidowX, Google Robot, Franka) show that FPC-VLA outperforms state-of-the-art models in both zero-shot and fine-tuned settings. Because the supervisor is activated only at keyframes, our approach significantly increases task success rates with minimal impact on execution time. Successful real-world deployments on diverse, long-horizon tasks confirm FPC-VLA's strong generalization and practical utility for building more reliable autonomous systems.

Overview

Motivation

FPC-VLA Model
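
The control flow described in the abstract (a VLA policy proposes an action, a supervisor checks it only at keyframes, and a similarity-guided fusion step blends the result with past predictions) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The page does not specify the fusion rule or any interfaces, so the cosine-similarity weighting and the names `fuse_actions`, `control_step`, `propose`, and `supervise` are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence

Action = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse_actions(current: Sequence[float], past: Sequence[Sequence[float]]) -> Action:
    """Blend the current action with past predictions, weighted by similarity.

    Assumed rule (not taken from the source): the current action has weight 1,
    and each past prediction has weight max(0, cosine similarity to the current
    action), so dissimilar past predictions contribute nothing.
    """
    weights = [1.0] + [max(0.0, cosine_similarity(current, p)) for p in past]
    candidates = [list(current)] + [list(p) for p in past]
    total = sum(weights)  # always >= 1.0 because of the current action
    return [
        sum(w * c[i] for w, c in zip(weights, candidates)) / total
        for i in range(len(current))
    ]


def control_step(
    propose: Callable[[], Action],
    supervise: Callable[[Action], Optional[Action]],
    history: List[Action],
    is_keyframe: bool,
) -> Action:
    """One control step: propose, optionally supervise, then fuse.

    `propose` stands in for the VLA policy. `supervise` stands in for the
    supervisor and returns a corrected action, or None if the proposed action
    looks viable. Both are hypothetical placeholders.
    """
    action = propose()
    if is_keyframe:  # the supervisor is queried only at keyframes
        correction = supervise(action)
        if correction is not None:
            action = correction
    fused = fuse_actions(action, history)
    history.append(list(action))  # store the unfused prediction for later steps
    return fused
```

In this sketch, gating the supervisor on `is_keyframe` keeps the more expensive check off most control steps, which mirrors the abstract's claim of minimal impact on execution time. How keyframes are detected is left outside the sketch because the page does not describe it.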


Videos

ALOHA (8x)

Stack orange block on yellow block
Put yellow block and red cylinder in white plate
Put cup on the left of table

Xiaomi Robot (8x)

Put white ball and green ball in blue plate
Stack orange block on green block

Google Robot (in SIMPLER)

Pick up coke can
Move near
Open/close drawer
Open top drawer and place apple

WidowX (in SIMPLER)

Put spoon on towel
Put carrot on plate
Stack green block on yellow block
Put eggplant in yellow basket

Franka (in LIBERO)

Goal
Put the wine bottle on top of the cabinet
Open the top drawer and put the bowl inside
Spatial
Pick up the black bowl next to the ramekin and place it on the plate
Pick up the black bowl on the wooden cabinet and place it on the plate
Object
Pick up the milk and place it in the basket
Pick up the orange juice and place it in the basket
Long
Put both the alphabet soup and the tomato sauce in the basket
Turn on the stove and put the moka pot on it

Experimental Results

WidowX (in SIMPLER)


Google Robot (in SIMPLER)


Franka (in LIBERO)
