FLASH: Efficient Visuomotor Policy
via Sparse Sampling

A Fast Legendre-polynomial Action policy via Sparse History-anchored flow

Jiaqi Bai*, Jindou Jia*, Yuxuan Hu, Gen Li, Xiangyu Chen, Tuo An, Kuangji Zuo, Jianfei Yang

MARS Lab, Nanyang Technological University, Singapore

* Equal contribution  ·  Corresponding author

Abstract

Generative models such as diffusion and flow matching have become dominant paradigms for visuomotor policy learning, yet their reliance on iterative denoising incurs high inference latency incompatible with real-time robotic control. We present a Fast Legendre-polynomial Action policy via Sparse History-anchored flow (FLASH), which replaces discrete action-chunk generation with continuous Legendre polynomial trajectory representation. Specifically, by fitting expert demonstrations under sparse temporal sampling, FLASH enables a single inference to cover a significantly extended action horizon. To further accelerate generation, FLASH initiates the flow matching process from history polynomial coefficients rather than uninformative Gaussian noise, shortening the transport distance and enabling accurate single-step inference. Moreover, analytic polynomial differentiation directly provides desired velocity feed-forward signals to the torque controller without numerical approximation. Extensive experiments on five simulated and two real-world manipulation tasks demonstrate that FLASH achieves state-of-the-art success rates (≥ 92% across all tasks), a per-episode inference time of 31.40 ms (up to 175× faster than diffusion policies and 18× faster than prior flow matching policies), up to about 4× faster training convergence than ACT, and 5× to 7× reduction in controller tracking error compared to discrete-action baselines.

From Noise-to-Action to Coefficient-to-Coefficient

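Robot motions are inherently smooth and low-frequency, so a short Legendre polynomial can represent many discrete action points with just a handful of coefficients. FLASH builds on this observation in two ways: sparse temporal sampling when fitting trajectories, and a coefficient-to-coefficient flow that starts from history coefficients instead of noise.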

FLASH overview: sparse sampling and coefficient-to-coefficient flow.
Overview of FLASH. (a) Expert trajectories are fitted to polynomial coefficients under sparse temporal sampling; at deployment we densely upsample, feeding both position and velocity to the controller. (b) Conventional generative policies generate discrete action points through multi-step denoising from noise; FLASH transports history coefficients to future coefficients in a single step.
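The fitting and upsampling idea in panel (a) can be illustrated with a minimal sketch. It is not the authors' code: the toy 1-DoF trajectory, the stride of 8, the polynomial degree of 5, and the chunk duration are all assumptions chosen for illustration. It uses NumPy's Legendre utilities to fit sparse samples, then evaluates position and analytic velocity on a dense grid.

```python
import numpy as np
from numpy.polynomial import legendre as L

# Hypothetical 1-DoF expert trajectory: 65 dense samples over one chunk,
# expressed in normalised time s in [-1, 1] (the Legendre domain).
s_dense = np.linspace(-1.0, 1.0, 65)
q_dense = 0.5 * np.sin(1.5 * s_dense) + 0.1 * s_dense

# Sparse temporal sampling: keep every 8th sample (stride is an assumption).
stride = 8
s_sparse, q_sparse = s_dense[::stride], q_dense[::stride]

# Ordinary least-squares fit of a degree-5 Legendre series to 9 sparse points.
degree = 5
coeffs = L.legfit(s_sparse, q_sparse, degree)      # shape (degree + 1,)

# Dense upsampling at deployment: positions on the full grid.
q_hat = L.legval(s_dense, coeffs)

# Analytic velocity: differentiate the series, no finite differences needed.
dq_ds = L.legval(s_dense, L.legder(coeffs))        # derivative w.r.t. s
chunk_duration = 1.6                               # seconds, hypothetical
dq_dt = dq_ds * 2.0 / chunk_duration               # chain rule: ds/dt = 2 / T

max_pos_err = float(np.max(np.abs(q_hat - q_dense)))
max_vel_err = float(np.max(np.abs(dq_ds - (0.75 * np.cos(1.5 * s_dense) + 0.1))))
```

Six coefficients stand in for 65 action points here, and the velocity comes from the same coefficients, which is the property the controller feed-forward relies on.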

Pipeline

FLASH inference and training pipeline.
Inference (left). Sparse history actions are fitted into Legendre-polynomial coefficients and fused with visual features by a DiT-style Flow Transformer. A single Euler step predicts future coefficients, which are decoded into executable actions. Training (right). Expert actions are converted into target polynomial coefficients via OLS fitting and KKT correction; the model is optimised with flow-matching and consistency losses.
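A minimal sketch of the single Euler step and a flow-matching regression target follows. It assumes a straight-line path between history and future coefficients (a common flow-matching choice that the text above does not confirm), and `velocity_fn` is a hypothetical stand-in for the Flow Transformer; the consistency loss is not shown.

```python
import numpy as np

def euler_one_step(velocity_fn, history_coeffs, cond):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with one Euler step."""
    t0, dt = 0.0, 1.0
    return history_coeffs + dt * velocity_fn(history_coeffs, t0, cond)

def flow_matching_loss(velocity_fn, history_coeffs, future_coeffs, cond, rng):
    """MSE between predicted and target velocity on a straight path.

    Assumed path: x_t = (1 - t) * history + t * future, whose velocity is
    the constant (future - history).
    """
    t = rng.uniform()
    x_t = (1.0 - t) * history_coeffs + t * future_coeffs
    target_v = future_coeffs - history_coeffs
    pred_v = velocity_fn(x_t, t, cond)
    return float(np.mean((pred_v - target_v) ** 2))

# Toy check with an oracle velocity field (a real model would be learned).
history = np.array([0.20, 0.05, -0.01])
future = np.array([0.35, 0.02, 0.00])
oracle_v = lambda x, t, cond: future - history

pred = euler_one_step(oracle_v, history, cond=None)
loss = flow_matching_loss(oracle_v, history, future, None, np.random.default_rng(0))
```

With a perfectly straight path the velocity is constant, so one Euler step lands exactly on the target; starting from history coefficients rather than noise keeps that path short.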

Tasks

We evaluate FLASH on a Franka robot across seven manipulation tasks: five simulated tasks on the Roboverse platform — Close Box, Pick Cube, Stack Cube, Open Drawer, Pick-Place Bowl — and two real-world tasks Place Cube and Insert Cube with millimetre-level insertion tolerance.

Five simulated tasks and two real-world manipulation tasks.
Left: five simulated tasks from the Roboverse platform. Right: two real-world manipulation tasks.

Main Results

Success rates (%) on five simulated tasks at a shared training budget of 10k optimiser steps. NFE denotes the number of function evaluations (sampling steps) of the generator at inference. Bold = best, underline = second best. Each entry is averaged over 50 independent rollouts.

Method         NFE   Close Box   Pick Cube   Stack Cube   Open Drawer   Pick-Place Bowl
Score-UNet     100   44582428
DDPM-UNet      100   84          68          44           76            90
DDIM-UNet      40    80          74          42           76            92
DDPM-DiT       100   68          78          36           50            76
FM-DiT         10    70          92          46           36            68
FM-UNet        10    78          82          30           70            76
ACT            1     70          98          20           80            0
VITA           6     94          86          86           68            86
A2A-Noise      1     98          90          92           88            92
FLASH-G        10    70          94          82           62            88
FLASH (ours)   1     100         98          96           92            98

With single-step inference (NFE = 1), FLASH achieves ≥ 92% success on every task, beating the homologous FLASH-G (which differs only by starting from Gaussian noise) by 17.6 pp on average — isolating the contribution of the history-anchored flow mechanism — and the strongest single-step baseline A2A-Noise by 4.8 pp on average.
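The average gaps quoted above can be checked directly from the FLASH, FLASH-G, and A2A-Noise rows of the table. The short script below only restates that arithmetic; the variable names are illustrative.

```python
# Success rates (%) in task order: Close Box, Pick Cube, Stack Cube,
# Open Drawer, Pick-Place Bowl, as read from the results table.
success = {
    "FLASH": [100, 98, 96, 92, 98],
    "FLASH-G": [70, 94, 82, 62, 88],
    "A2A-Noise": [98, 90, 92, 88, 92],
}

mean = {name: sum(vals) / len(vals) for name, vals in success.items()}

gap_vs_flash_g = round(mean["FLASH"] - mean["FLASH-G"], 1)    # 17.6 pp
gap_vs_a2a = round(mean["FLASH"] - mean["A2A-Noise"], 1)      # 4.8 pp
worst_flash_task = min(success["FLASH"])                      # 92
```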

Training Efficiency

Beyond a higher performance ceiling, FLASH exhibits significantly faster convergence. On Pick Cube, FLASH reaches 96% success in just 2,500 steps — about 4× faster than ACT, the strongest baseline. On Stack Cube, FLASH is 30 percentage points above the next-best policy at 6,250 steps. On Pick-Place Bowl, FLASH stabilises at 100% success after 5,000 steps.

Training efficiency curves on three representative tasks.
Success rate versus training steps for three representative tasks. FLASH converges substantially faster than every baseline.

Inference Speed

All policies are run on the same machine (NVIDIA RTX 5090). FLASH finishes a successful episode in 31.4 ms on average: 175× faster than Score-UNet (5,476 ms), 5.1× faster than the homologous FLASH-G (159 ms, sparse sampling but noise-initialised), and 2.2× faster than the strongest baseline A2A-Noise (69 ms).
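For reference, the speedup factors follow from dividing each baseline's episode inference time by FLASH's. The snippet simply reproduces that division with the numbers quoted above; the Score-UNet ratio comes out at roughly 174.4, which the text reports as 175×.

```python
flash_ms = 31.4
baselines_ms = {"Score-UNet": 5476.0, "FLASH-G": 159.0, "A2A-Noise": 69.0}

# Speedup = baseline episode inference time / FLASH episode inference time.
speedup = {name: ms / flash_ms for name, ms in baselines_ms.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")
# Score-UNet: 174.4x
# FLASH-G: 5.1x
# A2A-Noise: 2.2x
```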

Inference time, tracking error, and post-hoc speed modulation in simulation.
Left: total episode inference time on Stack Cube across 11 policies; inset zooms the top-4 fastest. Middle: sum of 7-joint absolute tracking errors on Pick Cube; dashed lines mark task completion times. Right: post-hoc speed modulation by sweeping the evaluation stride k_eval; green/red bands denote the high-SR plateau and failure zone.

Real-World: Insert Cube

The polynomial parameterisation gives the low-level controller an analytic velocity feed-forward signal and a natively high-frequency output. On the real-world Insert Cube task — with millimetre-level tolerance — FLASH attains 100% success, leading seven baselines by an average of 47 pp. Across five real-world rollouts, FLASH's joint tracking MAE is 0.274 ± 0.004°, compared with 0.460 ± 0.028° for FM-DiT.

Real-world Insert Cube task with execution snapshots and success-rate radar chart.
Execution snapshots (grasp → approach → insert) and success-rate radar chart across 8 policies under millimetre-level insertion tolerance.

Ablation Study

Ablation results across FLASH's design choices.
Ablations isolating the contribution of each design choice: sparse temporal sampling, history-anchored flow, cross-horizon C¹ continuity constraints, and the polynomial consistency loss.
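One way to picture a cross-horizon C¹ constraint is an equality-constrained least-squares fit solved through its KKT system. The sketch below is an assumption about how such a constraint could be imposed, not the authors' formulation: it forces a new segment's value and slope at s = -1 to match the previous segment's value and slope at s = +1, assuming both segments share the same duration. The helper names `legendre_rows` and `fit_c1` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_rows(s, degree, deriv=0):
    """Matrix whose column n holds the deriv-th derivative of P_n at points s."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    eye = np.eye(degree + 1)
    cols = []
    for n in range(degree + 1):
        c = L.legder(eye[n], deriv) if deriv else eye[n]
        cols.append(L.legval(s, c))
    return np.stack(cols, axis=1)

def fit_c1(s, y, degree, value0, slope0):
    """Minimise ||A c - y||^2 subject to c(-1) = value0 and c'(-1) = slope0."""
    A = legendre_rows(s, degree)
    C = np.vstack([legendre_rows(-1.0, degree),
                   legendre_rows(-1.0, degree, deriv=1)])
    d = np.array([value0, slope0], dtype=float)
    n = degree + 1
    # KKT system: [[2 A^T A, C^T], [C, 0]] [c; lam] = [2 A^T y; d]
    K = np.block([[2.0 * A.T @ A, C.T], [C, np.zeros((2, 2))]])
    rhs = np.concatenate([2.0 * A.T @ y, d])
    return np.linalg.solve(K, rhs)[:n]

# Toy usage: previous segment coefficients and sparse samples of the next one.
prev = np.array([0.30, 0.10, -0.02, 0.0, 0.0, 0.0])
end_value = float(L.legval(1.0, prev))
end_slope = float(L.legval(1.0, L.legder(prev)))

s = np.linspace(-1.0, 1.0, 9)
y = end_value + 0.05 * (s + 1.0) + 0.02 * np.sin(2.0 * s)   # hypothetical data
new = fit_c1(s, y, degree=5, value0=end_value, slope0=end_slope)

start_value = float(L.legval(-1.0, new))
start_slope = float(L.legval(-1.0, L.legder(new)))
```

The constraints hold to solver precision regardless of how well the data agree with them, so the joined trajectory has no jump in position or velocity at the chunk boundary.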

BibTeX

@article{flash2026,
  title   = {FLASH: Efficient Visuomotor Policy via Sparse Sampling},
  author  = {Bai, Jiaqi and Jia, Jindou and Hu, Yuxuan and Li, Gen and
             Chen, Xiangyu and An, Tuo and Zuo, Kuangji and Yang, Jianfei},
  year    = {2026},
}