Title: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
Authors: Chenyu Yang, Denis Tarasov, Davide Liconti, Hehui Zheng, Robert K. Katzschmann
arXiv: https://arxiv.org/abs/2602.09580
SOFT-FLOW targets real-world dexterous fine-tuning, where interaction budgets are tiny and optimal actions are highly multimodal. Diffusion policies are expressive but hard to regularize conservatively because they do not expose exact action likelihoods; Gaussian policies are tractable but collapse to a single mode under multimodal, chunked control.
The proposed method combines a normalizing-flow policy (exact likelihoods over whole action chunks) with an action-chunked critic (value estimates over the same chunks the robot actually executes). This resolves two mismatches at once: stable optimization via likelihood-based regularization, and better long-horizon credit assignment via chunk-level evaluation.
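A minimal sketch of this structure matching, assuming a single affine flow layer and a linear critic; all names, shapes, and the one-layer flow are illustrative simplifications, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, act_dim, horizon = 8, 3, 4
chunk_dim = act_dim * horizon  # the policy outputs a whole action chunk at once

# Illustrative single affine flow layer: a = mu + exp(log_sigma) * z, z ~ N(0, I).
# Real flows stack many layers, but the exact log-likelihood below follows the
# same change-of-variables formula either way.
mu = rng.standard_normal(chunk_dim)
log_sigma = -0.5 * np.ones(chunk_dim)

def sample_chunk():
    z = rng.standard_normal(chunk_dim)
    return mu + np.exp(log_sigma) * z

def chunk_log_prob(a):
    """Exact log pi(a | s): base Gaussian density minus log|det Jacobian|."""
    z = (a - mu) / np.exp(log_sigma)
    base = -0.5 * np.sum(z**2 + np.log(2.0 * np.pi))
    return base - np.sum(log_sigma)

# Chunk-level critic: Q scores the state together with the *entire* chunk,
# matching how commands are executed on the robot.
W = rng.standard_normal(state_dim + chunk_dim) * 0.01
def q_chunk(s, a):
    return float(W @ np.concatenate([s, a]))

s = rng.standard_normal(state_dim)
a = sample_chunk()
print(chunk_log_prob(a), q_chunk(s, a))
```

The point of the exact `chunk_log_prob` is that a Gaussian would force `mu`/`log_sigma` to average over modes, while a diffusion policy would not expose this quantity in closed form at all.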
Because normalizing flows provide exact, tractable densities, the conservative regularization term can be computed directly, unlike in diffusion counterparts. Real-robot experiments (scissor tape cutting, in-hand cube rotation) show more stable and sample-efficient adaptation than standard baselines.
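With exact chunk likelihoods, a conservative fine-tuning objective is straightforward to write down. Below is a TD3+BC-style sketch of such a term; the paper's precise regularizer may differ, and `beta` and the example values are illustrative:

```python
def conservative_actor_loss(q_value, log_prob_data_chunk, beta=0.1):
    """Maximize the chunk critic's value while anchoring to pretraining data:
    loss = -Q(s, a_pi) - beta * log pi_theta(a_data | s).
    The likelihood term is exactly computable for a normalizing flow,
    which is what diffusion policies cannot offer in closed form."""
    return -q_value - beta * log_prob_data_chunk

# Illustrative numbers: a critic value and a flow log-likelihood of a demo chunk.
loss = conservative_actor_loss(q_value=1.5, log_prob_data_chunk=-4.0, beta=0.1)
print(loss)
```

Raising the demo chunk's log-likelihood lowers the loss, so the flow is pulled toward the pretraining distribution while the critic term pulls toward higher value.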
The reusable design principle is structure matching: if the policy outputs chunked, multimodal actions, the critic and the optimization constraints should be chunk-aware and likelihood-aware as well.
Graph: Paper Node 2602.09580