Title: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
Authors: Chenyu Yang, Denis Tarasov, Davide Liconti, Hehui Zheng, Robert K. Katzschmann
arXiv: https://arxiv.org/abs/2602.09580
SOFT-FLOW targets real-world dexterous fine-tuning, where interaction budgets are tiny and optimal actions are highly multimodal. Diffusion policies are expressive but hard to regularize conservatively because they do not expose exact action likelihoods; Gaussian policies are tractable but collapse to a single mode under multimodal, chunked control.
The proposed method combines a normalizing-flow policy (exact likelihoods over whole action chunks) with an action-chunked critic (value estimates over the same chunks the robot actually executes). This resolves two mismatches at once: stable optimization via likelihood-based regularization, and better long-horizon credit assignment via chunk-level evaluation.
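A minimal sketch of this structure matching, assuming a single affine flow layer and a linear critic; all names, shapes, and the one-layer flow are illustrative simplifications, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, act_dim, horizon = 8, 3, 4
chunk_dim = act_dim * horizon  # the policy outputs a whole action chunk at once

# Illustrative single affine flow layer: a = mu + exp(log_sigma) * z, z ~ N(0, I).
# Real flows stack many layers, but the exact log-likelihood below follows the
# same change-of-variables formula either way.
mu = rng.standard_normal(chunk_dim)
log_sigma = -0.5 * np.ones(chunk_dim)

def sample_chunk():
    z = rng.standard_normal(chunk_dim)
    return mu + np.exp(log_sigma) * z

def chunk_log_prob(a):
    """Exact log pi(a | s): base Gaussian density minus log|det Jacobian|."""
    z = (a - mu) / np.exp(log_sigma)
    base = -0.5 * np.sum(z**2 + np.log(2.0 * np.pi))
    return base - np.sum(log_sigma)

# Chunk-level critic: Q scores the state together with the *entire* chunk,
# matching how commands are executed on the robot.
W = rng.standard_normal(state_dim + chunk_dim) * 0.01
def q_chunk(s, a):
    return float(W @ np.concatenate([s, a]))

s = rng.standard_normal(state_dim)
a = sample_chunk()
print(chunk_log_prob(a), q_chunk(s, a))
```

The point of the exact `chunk_log_prob` is that a Gaussian would force `mu`/`log_sigma` to average over modes, while a diffusion policy would not expose this quantity in closed form at all.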
Because normalizing flows provide exact, tractable densities, the conservative regularization term can be computed directly, unlike in diffusion counterparts. Real-robot experiments (scissor tape cutting, in-hand cube rotation) show more stable and sample-efficient adaptation than standard baselines.
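With exact chunk likelihoods, a conservative fine-tuning objective is straightforward to write down. Below is a TD3+BC-style sketch of such a term; the paper's precise regularizer may differ, and `beta` and the example values are illustrative:

```python
def conservative_actor_loss(q_value, log_prob_data_chunk, beta=0.1):
    """Maximize the chunk critic's value while anchoring to pretraining data:
    loss = -Q(s, a_pi) - beta * log pi_theta(a_data | s).
    The likelihood term is exactly computable for a normalizing flow,
    which is what diffusion policies cannot offer in closed form."""
    return -q_value - beta * log_prob_data_chunk

# Illustrative numbers: a critic value and a flow log-likelihood of a demo chunk.
loss = conservative_actor_loss(q_value=1.5, log_prob_data_chunk=-4.0, beta=0.1)
print(loss)
```

Raising the demo chunk's log-likelihood lowers the loss, so the flow is pulled toward the pretraining distribution while the critic term pulls toward higher value.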
The reusable design principle is structure matching: if the policy outputs chunked, multimodal actions, the critic and the optimization constraints should be chunk-aware and likelihood-aware as well.
Graph: Paper Node 2602.09580