2603.06001 — Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration
Recall: Repairs "language-blind" failures via inference-time attention recalibration, turning VLA instruction alignment from a retraining problem into an online, repairable mechanism.
2603.05868 — AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint-Robust Vision-Language-Action Models
Recall: Test-time viewpoint remapping enables zero-shot camera adaptation, mitigating VLA fragility to extrinsic/intrinsic shifts while preserving a plug-and-play deployment path.
2603.05993 — Moving Through Clutter: Scaling Data Collection and Benchmarking for 3D Scene-Aware Humanoid Locomotion via Virtual Reality
Recall: Uses VR to scale cluttered-environment data collection and evaluation protocols, improving distribution coverage and benchmark comparability for 3D scene-aware humanoid locomotion.
2603.05982 — HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild
Recall: Validates a VLA policy-adaptation pipeline in in-the-wild agricultural settings, providing task-decomposition and failure-diagnosis samples for deployment in weakly structured environments.
2026-03-08
2603.05504 — RoboPocket: Improve Robot Policies Instantly with Your Phone
Recall: Moves the policy-improvement loop from costly real-robot DAgger to phone-based interactive sampling, enabling instant data patching targeted at a policy's weak states.
2603.05377 — OpenFrontier: General Navigation with Visual-Language Grounded Frontiers
Recall: Reframes open-environment navigation as semantic frontier scoring plus low-level goal reaching, reducing dependence on heavy reconstruction and task-specific fine-tuning.
2603.05291 — Iterative On-Policy Refinement of Hierarchical Diffusion Policies for Language-Conditioned Manipulation
Recall: Uses iterative on-policy feedback to correct planner-controller mismatch in hierarchical diffusion policies, folding subgoal executability into a continual optimization loop.
2603.05117 — SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation
Recall: Compresses long-horizon observations through self-evolving gated state, mitigating diffusion-policy performance decay as the horizon grows.
2603.04848 — Hyperbolic Multiview Pretraining for Robotic Manipulation
Recall: Moves multi-view self-supervised pretraining into hyperbolic space, strengthening structural-relation modeling and downstream manipulation generalization.
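The appeal of hyperbolic space comes down to the Poincaré-ball metric; a toy check (not the paper's loss) showing distances inflate toward the boundary, which is what makes the geometry suit tree-like part/whole structure:

```python
import numpy as np

# Poincaré-ball distance, the basic tool behind hyperbolic representation
# learning. All points and the eps guard are illustrative.

def poincare_dist(u, v, eps=1e-9):
    uu, vv = np.dot(u, u), np.dot(v, v)
    duv = np.dot(u - v, u - v)
    arg = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(arg)

origin = np.zeros(2)
mid = np.array([0.5, 0.0])
near_edge = np.array([0.95, 0.0])
step = np.array([0.1, 0.0])
# The same Euclidean step costs more hyperbolic distance away from the origin.
```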
2603.05017 — Direct Contact-Tolerant Motion Planning With Vision Language Models
Recall: Uses a VLM directly for semantic partitioning in contact-tolerant planning, improving planner adaptability in cluttered scenes.
2026-03-07
2603.05385 — Accelerating Sampling-Based Control via Learned Linear Koopman Dynamics
Recall: Accelerates sampling-based control through a learned linear Koopman lifted space, recasting the control-quality vs. online-speed tradeoff as a trainable dynamics-approximation problem.
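The mechanism behind the Koopman entry above can be sketched in a few lines; this is a toy EDMD-style fit with a hand-picked feature map and a synthetic system (both illustrative, not the paper's learned lifting):

```python
import numpy as np

# EDMD sketch: lift the state with a feature map phi, then least-squares-fit
# a linear operator K so that phi(x_next) ≈ K @ phi(x).

def phi(x):
    """Lifting: state plus one nonlinear feature (chosen to close the toy system)."""
    return np.array([x[0], x[1], x[0] ** 2])

def fit_koopman(X, X_next):
    """Least-squares fit of the lifted linear dynamics."""
    Phi = np.stack([phi(x) for x in X])           # (N, 3)
    Phi_next = np.stack([phi(x) for x in X_next])
    W, *_ = np.linalg.lstsq(Phi, Phi_next, rcond=None)
    return W.T                                    # phi(x_next) ≈ K @ phi(x)

# Toy nonlinear system: x1' = 0.9 x1,  x2' = 0.8 x2 + 0.1 x1^2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
X_next = np.stack([0.9 * X[:, 0], 0.8 * X[:, 1] + 0.1 * X[:, 0] ** 2], axis=1)

K = fit_koopman(X, X_next)
pred = K @ phi(np.array([0.5, -0.3]))             # linear one-step prediction
```

Once the dynamics are linear in the lifted space, forward rollouts inside a sampling-based controller become cheap matrix-vector products.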
2603.05355 — Omni-Manip: Beyond-FOV Large-Workspace Humanoid Manipulation with Omnidirectional 3D Perception
Recall: Breaks the humanoid forward field-of-view limit with omnidirectional 3D perception, improving visibility and execution stability for continuous large-workspace manipulation.
2603.05185 — Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation
Recall: Puts a critic inside the VLA inference loop for online risk suppression, mitigating unrecoverable failures from error accumulation in long-horizon manipulation.
2603.05312 — UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data
Recall: Learns a universal bimanual dexterous-grasping policy from large-scale synthetic data, turning the high-dimensional contact-combination problem from hard-to-sample into systematically generalizable.
2603.05147 — Act, Think or Abstain: Complexity-Aware Adaptive Inference for Vision-Language-Action Models
Recall: Complexity-aware gating dynamically allocates inference budget across act/think/abstain, balancing low-latency VLA execution with risk control.
2603.05493 — cuRoboV2: Dynamics-Aware Motion Generation with Depth-Fused Distance Fields for High-DoF Robots
Recall: Unifies B-spline dynamics optimization with GPU distance-field perception, accelerating feasible and responsive motion generation for high-DoF robots.
2603.05487 — Observing and Controlling Features in Vision-Language-Action Models
Recall: Brings mechanistic interpretability to VLAs, attempting to establish controllable causal links between internal features and behavioral outputs.
2603.04910 — VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory
Recall: A working-memory plus episodic-memory dual system for non-Markovian tasks, mitigating stage forgetting and distribution-drift failures in long-horizon manipulation.
2603.03740 — Whole-Body Safe Control of Robotic Systems with Koopman Neural Dynamics
Recall: Near-linearizes whole-body nonlinear robot dynamics via Koopman lifting and injects explicit safety constraints within an online-solvable framework; a high-value direction combining real-time control with provable safety.
2603.03733 — X-Loco: Towards Generalist Humanoid Locomotion Control via Synergetic Policy Distillation
Recall: Multi-teacher synergetic distillation mitigates conflicts among humanoid multi-skill policies, pushing locomotion from strong single skills toward unified generalist control.
2603.03380 — LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics
Recall: Targets the quantization-latency tradeoff of on-device VLA deployment, bringing multimodal intelligence to real-time control on embedded robots.
2603.03627 — Touch2Insert: Zero-Shot Peg Insertion by Touching Intersections of Peg and Hole
Recall: Uses tactile cues from peg-hole intersection geometry for zero-shot insertion correction, with direct engineering value in occluded and tight-tolerance scenarios.
2603.03798 — Learning Surgical Robotic Manipulation with 3D Spatial Priors
Recall: Injects 3D spatial priors into surgical manipulation learning, improving geometric robustness to occlusion and complex tissue interaction.
2603.03897 — IROSA: Interactive Robot Skill Adaptation using Natural Language
Recall: Closes the loop between language feedback and imitation learning, lowering the human-interaction cost of iterating robot skills.
2026-03-05
2603.04363 — ManipulationNet: An Infrastructure for Benchmarking Real-World Robot Manipulation with Physical Skill Challenges and Embodied Multimodal Reasoning
Recall: Rebuilds real-robot manipulation evaluation around unified physical-skill challenges and multimodal reasoning protocols; the core value is turning "can run a demo" into a diagnosable, comparable, reproducible measure of system capability.
2603.04356 — RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
Recall: Uses 365 household mobile-manipulation tasks and large-scale scene parameterization to systematically measure how generalist-robot learning and generalization scale with task count.
2603.04249 — RoboLight: A Dataset with Linearly Composable Illumination for Robotic Manipulation
Recall: Decomposes illumination into composable control variables, providing a fine-grained robustness-diagnosis benchmark for manipulation under color/direction/intensity perturbations.
2603.03818 — Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
Recall: Pretrained VLAs resist forgetting in continual learning far better than from-scratch policies, suggesting representation priors are a key lever for lifelong-learning stability in robotics.
2603.03596 — MEM: Multi-Scale Embodied Memory for Vision Language Action Models
Recall: Multi-scale memory decouples and fuses long-term semantic state with short-term perceptual detail, markedly improving stage tracking and interruption recovery in multi-stage tasks.
2603.03960 — Structural Action Transformer for 3D Dexterous Manipulation
Recall: Injects structural priors into Transformer attention, improving 3D geometric consistency and manipulation stability when transferring high-DoF dexterous hands across embodiments.
2603.03836 — SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse
Recall: Decomposes dual-arm control into reusable skills plus a composer, directly mitigating the combinatorial explosion of left-right arm skill pairings.
2603.04029 — Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Recall: Uses world-model residuals to trigger online updates, enabling adaptive recovery when distribution drift is encountered at deployment.
2603.03704 — Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning
Recall: Reweights POMDP belief updates with LLM commonsense priors, cutting the cost of wasted exploration in partially observable tasks.
2026-03-04
2603.03280 — How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference
Recall: Replaces hand-coded rewards with human preference learning, directly tackling fine-grained contact manipulation that is completable but of uncontrollable quality.
2603.03279 — ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation
Recall: Unified multimodal control moves humanoids from tracking predefined motions to directly generating whole-body behavior from perception and task semantics.
2603.03243 — HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations
Recall: Learns from robot-free human demonstrations with cross-embodiment alignment, lowering the data-collection barrier for whole-body mobile manipulation and improving scalability.
2603.03195 — Chain of World: World Model Thinking in Latent Motion
Recall: Performs world-model temporal reasoning in latent motion, cutting pixel-reconstruction redundancy and improving long-horizon decision effectiveness.
2603.02083 — π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
Recall: Proposes a critic-free, per-step negative-aware online fine-tuning mechanism; the core value is moving online RL for flow-based VLAs from costly and unstable to finer-grained, controllable updates.
2603.01766 — Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models
Recall: Replaces discrete waypoint prediction with continuous implicit functions, directly improving trajectory differentiability and high-frequency closed-loop controllability.
2603.02115 — Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
Recall: Replaces heavy absolute-progress annotation with trajectory-comparison supervision, markedly improving reward-model scalability on large datasets that contain failed trajectories.
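Trajectory-comparison supervision of this kind reduces to a Bradley-Terry ranking loss; a toy sketch with a linear reward over made-up trajectory features (all names, dimensions, and the data generator are hypothetical, not Robometer's architecture):

```python
import numpy as np

# Bradley-Terry sketch: fit a reward r(traj) = w @ features(traj) so that
# the preferred trajectory in each comparison pair scores higher.

def bt_loss_grad(w, f_win, f_lose):
    """Logistic loss and gradient for one preference comparison."""
    d = f_win - f_lose
    p = 1.0 / (1.0 + np.exp(-(w @ d)))   # P(winner preferred under w)
    return -np.log(p), -(1.0 - p) * d

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])      # hidden ground-truth preference
pairs = []
for _ in range(500):
    a, b = rng.normal(size=3), rng.normal(size=3)
    pairs.append((a, b) if w_true @ a > w_true @ b else (b, a))

w = np.zeros(3)
for _ in range(200):                     # plain gradient descent
    g = np.mean([bt_loss_grad(w, fw, fl)[1] for fw, fl in pairs], axis=0)
    w -= 0.5 * g

# The learned reward should rank pairs like the ground truth.
acc = np.mean([w @ fw > w @ fl for fw, fl in pairs])
```

The point of the pairwise form is that it never needs an absolute progress label, only which of two trajectories is better.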
2603.01549 — Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation
Recall: Injects privileged 4D dynamics priors at training time, addressing the core VLA weakness of understanding semantics but not physical change.
2602.23901 — ABPolicy: Asynchronous B-Spline Flow Policy for Real-Time and Smooth Robotic Manipulation
Recall: Runs an asynchronous flow policy in B-spline control-point space, turning trajectory smoothness and real-time responsiveness from conflicting goals into built-in policy properties.
2602.24121 — Planning from Observation and Interaction
Recall: Inverts rewards from observation and minimal interaction and plans inside a world model, offering a practical IRL path for action-label-free robot learning.
2602.23934 — Learning to Build Autonomous Robotic Assembly of Stable Structures Without Predefined Plans
Recall: Uses successor features to support task transfer in blueprint-free construction, upgrading structural-stability decisions from fixed scripts to a learnable policy.
2602.24104 — Geometry-based pneumatic actuators for soft robotics
Recall: Reshapes the pneumatic-actuator design space with geometric constraint layers, improving deformation predictability and manufacturability on the soft-robot hardware side.
2602.23923 — Teleoperated Omni-directional Dual Arm Mobile Manipulation Robotic System with Shared Control for Retail Store
Recall: A shared-control dual-arm mobile system for long-tail retail scenarios, emphasizing operational engineering viability and robust human-robot collaboration.
2026-03-01
2602.23058 — GeoWorld: Geometric World Models
Recall: Rebuilds the long-horizon prediction and planning stability of energy-based world models with a hyperbolic geometric latent space, translating hierarchical structure representation directly into more stable multi-step decision performance.
2602.23283 — Simple Models, Real Swimming: Digital Twins for Tendon-Driven Underwater Robots
Recall: Builds a usable digital twin from a stateless fluid approximation plus parameter identification on very few real trajectories, giving soft underwater robots a low-cost, generalizable sim2real control foundation.
2026-02-28
2602.23312 — Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Recall: Systematically evaluates edge SLMs' role discrimination in leader-follower human-robot interaction, finding fine-tuning effective at low latency, while long-context one-shot prompting hits a capacity bottleneck and performance regresses.
2602.23280 — Physics Informed Viscous Value Representations
Recall: Stabilizes goal-conditioned value geometry with a viscous-PDE regularizer, improving policy generalization and numerical stability on low-coverage offline data.
2602.22896 — DySL-VLA: Efficient Vision Language Action Inference via Dynamic Static Layer Skipping
Recall: Skips layers dynamically by action criticality, significantly cutting VLA inference latency while preserving manipulation success rate; high value for real-time deployment.
2602.22663 — Rethinking the Practicality of Vision Language Action Model
Recall: Rebuilds VLA practicality evaluation with a cross-embodiment CEBench and improved baselines, emphasizing data efficiency and deployment feasibility over blind parameter scaling.
2602.23287 — Interface Aware Trajectory Reconstruction of Limited Demonstrations for Robot Learning
Recall: Treats low-dimensional interface demonstrations as constrained observations and reconstructs high-dimensional trajectories closer to the user's true intent, improving assistive-robot learning quality.
2602.22862 — GraspLDP: Towards Generalizable Grasping Policy via Latent Diffusion
Recall: Improves grasping-policy precision and generalization to novel objects and pose perturbations via latent diffusion with grasp-geometry consistency constraints.
2026-02-27
2602.23253 — SPARR: Simulation-based Policies with Asymmetric Real-world Residuals for Assembly
Recall: Decomposes sim2real via asymmetric training of a simulation main policy plus a real-robot residual, raising contact-rich assembly success at lower real-sample cost.
2602.23024 — InCoM: Intent-Driven Perception and Structured Coordination for Whole-Body Mobile Manipulation
Recall: Mitigates coupled control and viewpoint drift in whole-body mobile manipulation through intent-driven perception and structured base-arm coordination.
2602.23206 — Grasp, Slide, Roll: Comparative Analysis of Contact Modes for Tactile-Based Shape Reconstruction
Recall: Systematically compares the tactile information efficiency of grasp/slide/roll, giving a quantifiable interaction-cost vs. reconstruction-quality comparison.
2602.23259 — Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving
Recall: Adds an explicit risk term to MPC over world-model rollouts, improving imitation-only generalization and safety margins in long-tail driving scenarios.
2026-02-26
2602.22088 — Force Policy: Learning Hybrid Force-Position Control Policy under Interaction Frame for Contact-Rich Manipulation
Recall: Structurally decouples force and position control in an interaction frame, improving stability and interpretability in contact-rich manipulation.
2602.22056 — FlowCorrect: Efficient Interactive Correction of Generative Flow Policies for Robotic Manipulation
Recall: Uses sparse human nudges to correct near-miss failures at deployment, raising generative manipulation-policy success at minimal retraining cost.
2602.22010 — World Guidance: World Modeling in Condition Space for Action Generation
Recall: Maps future observations into a condition space to guide action generation, balancing predictable compact representation with fine-grained controllability in VLAs.
2602.21736 — Joint-Aligned Latent Action Towards Scalable VLA Pretraining in the Wild
Recall: Replaces brittle hand-reconstruction labels with joint-aligned latent actions, converting the scale advantage of in-the-wild video into real VLA pretraining gains.
2602.21633 — Self-Correcting VLA: Online Action Refinement via Sparse World Imagination
Recall: Uses sparsely triggered world-model imagination for online action correction, reducing error accumulation and execution drift in long-horizon tasks.
2602.21445 — VLA Knows Its Limits
Recall: Systematically reveals the optimal execution-horizon range, giving an interpretable tradeoff between model error and feedback lag.
2602.21811 — DexRepNet++: Learning Dexterous Robotic Manipulation with Geometric and Spatial Hand-Object Representations
Recall: Strengthens dexterous-manipulation generalization through geometric/spatial hand-object representations rather than algorithm-level tuning alone.
2602.21622 — ADM-DP: Adaptive Dynamic Modality Diffusion Policy through Vision-Tactile-Graph Fusion for Multi-Agent Manipulation
Recall: A vision-tactile-graph fusion diffusion policy improves stable grasping and collision avoidance in multi-arm cooperation.
2602.21625 — Tacmap: Bridging the Tactile Sim-to-Real Gap via Geometry-Consistent Penetration Depth Map
Recall: A geometry-consistent penetration-depth map trades accuracy against efficiency, markedly improving tactile sim2real usability.
2026-02-25
2602.21203 — Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics
Recall: Reorders the visual off-policy training pipeline to optimize wall-clock throughput and sample efficiency together, moving sim2real visual RL from feasible to iterable.
2602.21198 — Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
Recall: Injects two-level in-action and on-action reflection into the embodied planning loop, letting robots accumulate experience across episodes and repeat fewer mistakes.
2602.21172 — NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
Recall: Shows a driving VLA can stay competitive without dense reasoning annotation, substantially cutting data and labeling cost.
2602.21157 — HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning
Recall: Unifies the three pathways of textual reasoning, visual foresight, and action generation, easing the can-think-but-not-act / can-act-but-not-think split in long-horizon tasks.
2602.21013 — Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks
Recall: Adds a writable, readable language scratchpad to the VLA, making the memory demands of non-Markovian long-horizon manipulation explicit and markedly improving generalization on memory-dependent tasks.
2602.20871 — GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer
Recall: Geometry-aware MoE with expert-guided replay pushes sim2real from one-shot single-task transfer to sustained knowledge accumulation.
2602.20715 — IG-RFT: An Interaction-Guided RL Framework for VLA Models in Long-Horizon Robotic Manipulation
Recall: Interaction-state-guided advantage weighting, mixed dense rewards, and three-stage post-training lift long-horizon real-robot success rates well above the SFT baseline.
2602.20566 — BFA++: Hierarchical Best-Feature-Aware Token Prune for Multi-View Vision Language Action Model
Recall: Hierarchical token pruning (within-view denoising plus cross-view deduplication) achieves both inference speedup and success-rate gains on multi-view VLAs.
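The simplest version of attention-based token pruning is a per-view top-k filter; a sketch of that baseline (the hierarchical within-/cross-view scoring in BFA++ is more involved; shapes and scores here are illustrative):

```python
import numpy as np

# Top-k token pruning: keep only the visual tokens that receive the most
# [CLS]-attention mass, preserving their original order for the decoder.

def prune_tokens(tokens, cls_attn, keep=4):
    """tokens: (n, d) patch features; cls_attn: (n,) attention from [CLS]."""
    idx = np.sort(np.argsort(cls_attn)[-keep:])   # top-k, original order
    return tokens[idx], idx

rng = np.random.default_rng(3)
tokens = rng.normal(size=(16, 8))                 # 16 patch tokens per view
cls_attn = rng.random(16)                          # mock attention scores
kept, kept_idx = prune_tokens(tokens, cls_attn, keep=4)
```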
2026-02-24
2602.19313 — TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
Recall: Builds zero-shot progress rewards from a VLM's internal token probabilities, bypassing the bias of directly outputting numeric progress and markedly improving cross-task reward usability.
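The token-probability reward idea can be illustrated without a real VLM; the mock logits and tiny vocabulary below stand in for an actual model's forward pass (the "reward = probability of a success token" framing is this sketch's assumption, not the paper's exact recipe):

```python
import numpy as np

# Instead of asking the model to emit a numeric progress value, read off
# the probability mass it assigns to a success token and use that directly
# as a dense, bounded reward in [0, 1].

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

VOCAB = {"yes": 0, "no": 1, "maybe": 2}           # mock tokenizer ids

def progress_reward(logits):
    """Reward = probability of the 'yes' token under the model."""
    return softmax(logits)[VOCAB["yes"]]

early = np.array([0.1, 2.0, 0.3])   # model leans "no": low reward
late = np.array([2.5, 0.2, 0.1])    # model leans "yes": high reward
```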
2602.19260 — The Price Is Not Right: Neuro-Symbolic Methods Outperform VLAs on Structured Long-Horizon Manipulation Tasks with Significantly Lower Energy Consumption
Recall: On structured long-horizon manipulation, the neuro-symbolic paradigm leads on both success rate and energy consumption, counterexample evidence against VLAs when weighing effectiveness and efficiency together.
2602.19308 — WildOS: Open-Vocabulary Object Search in the Wild
Recall: Unifies open-vocabulary semantic retrieval with traversability-risk planning, targeting long-range outdoor object search without prior maps.
2602.19273 — 3D Shape Control of Extensible Multi-Section Soft Continuum Robots via Visual Servoing
Recall: Achieves closed-loop whole-body shape control of a soft continuum robot from external vision alone, with stable convergence and millimeter-level steady-state error in 3D.
2026-02-23
2602.18424 — CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
Recall: Conditions navigation evaluation explicitly on robot capability constraints, testing whether VLMs truly output path decisions that are both semantically correct and executable.
2602.18386 — Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO
Recall: Uses PPO to jointly adapt pure-pursuit lookahead and steering gain online, upgrading hand-tuned parameter tables into a transferable adaptive tuning policy.
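The geometric law being tuned here is standard pure pursuit; a sketch with an illustrative speed-scaled lookahead schedule of the kind a learned policy would replace (wheelbase, gain, and schedule constants are made up):

```python
import math

# Pure pursuit: curvature toward the lookahead point is 2*sin(alpha)/Ld,
# converted to a steering angle through the bicycle model.

def pure_pursuit_steer(alpha, lookahead, gain=1.0, wheelbase=0.33):
    """alpha: angle to the lookahead point in the body frame (rad)."""
    curvature = 2.0 * math.sin(alpha) / lookahead
    return gain * math.atan(curvature * wheelbase)

def speed_scaled_lookahead(speed, k=0.4, min_ld=0.5, max_ld=3.0):
    """A common hand-tuned schedule; PPO would adapt this online instead."""
    return min(max(k * speed, min_ld), max_ld)

delta_slow = pure_pursuit_steer(0.3, speed_scaled_lookahead(1.0))
delta_fast = pure_pursuit_steer(0.3, speed_scaled_lookahead(8.0))
```

Longer lookahead at speed yields gentler steering, which is exactly the lookahead/gain coupling the paper learns instead of tabulating.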
2602.17601 — Graph Neural Model Predictive Control for High-Dimensional Systems
Recall: Graph-structured dynamics plus condensing MPC push high-dimensional soft-body closed loops to 100 Hz at the 1000-node scale, a scalable structure-priors-for-real-time-control route.
2602.17574 — Hybrid System Planning using a Mixed-Integer ADMM Heuristic and Hybrid Zonotopes
Recall: Hybrid-zonotope representations with a structure-aware ADMM heuristic cut the memory footprint and solver brittleness of hybrid planning, suiting embedded hybrid decision-making.
2602.17659 — When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs
Recall: Proposes LIBERO-CF and an inference-time CAG debiasing mechanism, directly attacking language mismatch caused by VLA visual shortcuts.
2602.17199 — Nonlinear Predictive Control of the Continuum and Hybrid Dynamics of a Suspended Deformable Cable for Aerial Pick and Place
Recall: A PDE+POD ROM-NMPC framework takes flexible suspended-cable aerial manipulation from high-fidelity modeling to online control.
2602.17537 — IRIS: Learning-Driven Task-Specific Cinema Robot Arm for Visuomotor Motion Control
Recall: A low-cost 3D-printed arm combined with ACT imitation learning delivers deployable task-specialized visuomotor control.
2026-02-21
2602.17259 — FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
Recall: Multi-teacher future-representation alignment replaces pixel reconstruction, markedly improving the data efficiency of VLA world awareness and long-horizon generalization.
2602.16825 — RRT-eta: Sampling-based Motion Planning and Control from STL Specifications using Arithmetic-Geometric Mean Robustness
Recall: Smooths STL planning objectives with AGM robustness, easing the non-smooth min-max bottleneck and improving multi-constraint tree-search feasibility.
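A common AGM-robustness formulation for an STL conjunction, contrasted with the non-smooth min it replaces (this follows the general arithmetic-geometric-mean robustness literature; the paper's exact variant may differ):

```python
import math

# Robustness of "phi_1 AND ... AND phi_m" from per-predicate margins rho_i.
# min() is correct but non-smooth and ignores all non-minimal margins;
# the AGM form is smooth and rewards improving every margin.

def min_robustness(rhos):
    return min(rhos)

def agm_and(rhos):
    m = len(rhos)
    if all(r > 0 for r in rhos):
        # Geometric mean of (1 + rho_i), shifted back: smooth, all margins count.
        return math.prod(1.0 + r for r in rhos) ** (1.0 / m) - 1.0
    # Otherwise: arithmetic mean of the violating terms (smooth penalty).
    return sum(min(r, 0.0) for r in rhos) / m

sat = [0.5, 0.2, 0.9]      # all predicates satisfied
viol = [0.5, -0.3, 0.9]    # one predicate violated
```

Note the sign semantics are preserved: positive iff the conjunction is satisfied, so the smooth surrogate can drive tree search without changing feasibility.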
2602.16863 — SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation
Recall: Achieves zero-shot generalization in dexterous tool manipulation by unifying RL objectives around object-centric targets and training on a procedurally generated tool family.
2602.16898 — MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
Recall: A multi-agent closed-loop reflection mechanism reduces open-loop drift, improving the robustness of zero-shot manipulation tasks.
2602.17166 — Geometric Inverse Flight Dynamics on SO(3) and Application to Tethered Fixed Wing Aircraft
Recall: Provides a coordinate-free closed-form inverse-dynamics map on SO(3), a theoretical handle for flight-trajectory feasibility and control initialization.
2602.16358 — System Identification under Constraints and Disturbance: A Bayesian Estimation Approach
Recall: Constraint-consistent Bayesian SysID with a linear-complexity Riccati recursion connects interpretable identification directly to contact-rich closed-loop control.
2602.16371 — Dynamic Modeling and MPC for Locomotion of Tendon-Driven Soft Quadruped
Recall: Modularly couples Cosserat continuum soft legs with a rigid trunk, and uses convex MPC for real-time stable control of a soft quadruped.
2602.16187 — SIT-LMPC: Safe Information-Theoretic Learning Model Predictive Control for Iterative Tasks
Recall: Uses flow-based value learning to sharpen uncertainty characterization, jointly satisfying performance improvement and safety constraints in iterative tasks.
2602.16511 — VIGOR: Visual Goal-In-Context Inference for Unified Humanoid Fall Safety
Recall: Distills goal-in-context latents to unify safety control across all fall phases, demonstrating zero-shot cross-terrain recovery on a real humanoid.
2602.16462 — Reactive Motion Generation With Particle-Based Perception in Dynamic Environments
Recall: Jointly models robot-obstacle dynamics with dynamic particle-based perception and MPPI, markedly improving reactive quality in dynamic obstacle avoidance.
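The MPPI half of this combination is compact enough to sketch; a minimal 1D point-mass version without the particle-based perception (dynamics, cost, and hyperparameters are illustrative):

```python
import numpy as np

# MPPI: sample noisy control sequences around a nominal one, roll each out,
# and update the nominal with an exponentially cost-weighted average.

def rollout_cost(u_seq, x0, goal, dt=0.1):
    x, v, cost = x0, 0.0, 0.0
    for u in u_seq:
        v += u * dt
        x += v * dt
        cost += (x - goal) ** 2 + 1e-3 * u ** 2
    return cost

def mppi_step(u_nom, x0, goal, n_samples=256, sigma=1.0, lam=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples, len(u_nom)))
    costs = np.array([rollout_cost(u_nom + eps, x0, goal) for eps in noise])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return u_nom + w @ noise            # importance-weighted update

u = np.zeros(10)
for _ in range(20):
    u = mppi_step(u, x0=0.0, goal=1.0)
final_cost = rollout_cost(u, 0.0, 1.0)
```

The paper's contribution is what the rollouts are scored against: predicted obstacle particles rather than a static map.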
2602.16127 — Reactive Slip Control in Multifingered Grasping: Hybrid Tactile Sensing and Internal-Force Optimization
Recall: Fast slip detection plus a grasp internal-force null-space QP demonstrate sub-50 ms closed-loop slip arrest in multi-fingered grasping.
2026-02-19
2602.16712 — One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
Recall: Aligns action semantics across hand morphologies via a canonical morphology parameter space and standardized URDFs, pushing morphology-bound policies toward zero-shot cross-embodiment transfer.
2602.16444 — RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation
Recall: Automatically generates high-value real-world tasks via diversity sampling, physical self-reflection, and a human-in-the-loop, markedly improving VLA pretraining data quality and generalization.
2602.16675 — Learning to unfold cloth: Scaling up world models to deformable object manipulation
Recall: Injects surface-normal geometry and data-pipeline changes into a DreamerV2 world model, improving cross-material generalization and zero-shot real-robot performance for deformable cloth manipulation.
2602.16710 — EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
Recall: 20k+ hours of egocentric data reveal predictable scaling laws, and a two-stage transfer markedly raises real success rates for high-DoF dexterous hands.
2602.16705 — Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
Recall: HERO couples residual end-effector tracking with large vision models, closing the loop from open-vocabulary semantic generalization to executable control.
2602.16356 — Articulated 3D Scene Graphs for Open-World Mobile Manipulation
Recall: Builds a joint semantic-kinematic 3D scene graph, letting robots reason about how objects move and supporting long-horizon mobile manipulation.
2026-02-18
2602.15828 — Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation
Recall: Replaces task-specific reward engineering with a task-agnostic point-track skill prior, achieving more robust sim2real and low-sample recomposition for dexterous hands.
2602.15543 — Selective Perception for Robot: Task-Aware Attention in Multimodal VLA
Recall: Replaces static multimodal fusion with task-aware dynamic routing, cutting compute while improving noise robustness and task success.
2602.15397 — ActionCodec: What Makes for Good Action Tokenizers
Recall: Shifts the action-tokenizer objective from reconstruction-first to control-optimizability-first, systematically analyzing how discrete action design shapes VLA learning dynamics.
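The reconstruction-first baseline such work starts from is the per-dimension uniform-bin tokenizer; a minimal sketch (bounds, bin count, and class name are illustrative):

```python
import numpy as np

# Uniform-bin action tokenizer: each continuous action dimension is mapped
# to one of n discrete token ids; decoding returns the bin center, so the
# round-trip error is at most half a bin width.

class BinTokenizer:
    def __init__(self, low, high, n_bins=256):
        self.low, self.high, self.n = np.asarray(low), np.asarray(high), n_bins

    def encode(self, action):
        t = (np.asarray(action) - self.low) / (self.high - self.low)
        return np.clip((t * self.n).astype(int), 0, self.n - 1)

    def decode(self, tokens):
        return self.low + (tokens + 0.5) / self.n * (self.high - self.low)

tok = BinTokenizer(low=[-1, -1], high=[1, 1], n_bins=256)
a = np.array([0.137, -0.842])
round_trip = tok.decode(tok.encode(a))
```

The paper's question is precisely what this baseline optimizes for (reconstruction error) versus what actually matters for the downstream policy's learning dynamics.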
2602.15827 — Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
Recall: Chains highly dynamic human skills via motion matching with perception conditioning, achieving stable execution of long-horizon humanoid parkour.
2602.14974 — DM0: An Embodied-Native Vision-Language-Action Model towards Physical AI
Recall: Embodied-native unified training (web + driving + embodied) with a flow action expert and gradient-isolation mechanism markedly improves both specialist and generalist performance on Table30.
2602.13977 — WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL
Recall: Keyframe-initialized rollouts plus world model-policy co-evolution control imagined-rollout bias, yielding significant gains for VLA post-training RL in both simulation and on real robots.
2602.14193 — Learning Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation
Recall: Replaces pure 2D representations with a part-aware continuous 3D feature field, improving cross-instance functional alignment and policy generalization for articulated-object manipulation.
2602.13579 — TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment
Recall: Uses rectified flow for cross-embodiment tactile alignment, improving human-to-robot contact-task transfer without paired data.
2602.14255 — A Latency-Aware Framework for Visuomotor Policy Learning on Industrial Robots
Recall: Explicitly folds the high latency of industrial robots into execution scheduling, improving closed-loop stability without changing the policy architecture.
2602.14526 — TWISTED-RL: Hierarchical Skilled Agents for Knot-Tying without Human Demonstrations
Recall: Replaces a supervised inverse model with topology-action-conditioned hierarchical RL, improving solvability of complex knot types without demonstrations.
2602.12096 — Multi Graph Search for High-Dimensional Robot Motion Planning
Recall: Replaces single-graph search with parallel expansion of multiple implicit graphs plus feasible-bridge merging, cutting latency and trajectory instability in high-dimensional planning, with completeness and bounded-suboptimality guarantees.
2602.11882 — Where Bits Matter in World Model Planning: A Paired Mixed-Bit Study for Efficient Spatial Reasoning
Recall: Low-bit degradation is not just a function of the total bit budget; the 4-bit transition region is highly sensitive to per-module allocation, and preserving encoder precision usually matters more than uniform quantization.
2602.12159 — 3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting
Recall: Plugs persistent 3DGS memory directly into the VLM decision chain, easing the zero-shot navigation bottleneck of low-level perception errors polluting high-level navigation.
2602.12032 — When would Vision-Proprioception Policies Fail in Robotic Manipulation?
Recall: Shows fusion policies tend to suppress vision at action-phase transitions, rooted in proprioceptive gradients dominating training and leaving visual learning underpowered.
2602.12047 — Safety Beyond the Training Data: Robust Out-of-Distribution MPC via Conformalized System Level Synthesis
Recall: Makes OOD safety of learned control explicit via conformal error coverage plus SLS constraint tightening, a high-value instance of the provable-coverage-plus-executable-MPC route.
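The conformal half of that recipe fits in a few lines; a split-conformal sketch on synthetic one-step model residuals, with the SLS machinery omitted (miscoverage level, noise scale, and constraint are illustrative):

```python
import numpy as np

# Split conformal prediction: on a held-out calibration set of absolute
# prediction errors, take the finite-sample-valid empirical quantile, then
# tighten the nominal state constraint by that margin.

def conformal_margin(residuals, alpha=0.1):
    """ceil((n+1)(1-alpha))-th smallest residual: valid 1-alpha coverage."""
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(residuals)[min(k, n) - 1])

rng = np.random.default_rng(2)
calib_residuals = np.abs(rng.normal(0.0, 0.05, size=500))  # |x_true - x_pred|

margin = conformal_margin(calib_residuals, alpha=0.1)
x_max = 1.0                     # nominal state constraint x <= x_max
x_max_tight = x_max - margin    # constraint enforced on the model's prediction
```

Satisfying the tightened constraint on the nominal model then implies the true state satisfies the original one with probability at least 1 - alpha, which is the coverage guarantee the SLS layer builds on.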
2602.11832 — JEPA-VLA: Video Predictive Embedding is Needed for VLA Models
Recall: Shows the VLA visual bottleneck is a missing predictive prior; JEPA video-predictive embeddings markedly improve sample efficiency and generalization.
2602.11885 — Learning to Manipulate Anything: Revealing Data Scaling Laws in Bounding-Box Guided Policies
Recall: Uses bbox conditioning to reduce semantic ambiguity and derives a manipulation-task scaling law, directly answering the marginal-return question for data investment.
2602.12244 — Any House Any Task: Scalable Long-Horizon Planning for Abstract Human Tasks
Recall: Targets long-horizon planning in large-scale household scenes; the core is scalable task decomposition with constraint-consistency checking.
2602.12199 — Sub-Riemannian boundary value problems for Optimal Geometric Locomotion
Recall: Unifies dissipation-optimal shape-driven locomotion as sub-Riemannian geodesic boundary-value problems; high mathematical-mechanism value.
2602.12273 — Learning to Control: The iUzawa-Net for Nonsmooth Optimal Control of Linear PDEs
Recall: Unrolls Uzawa iterations into a learnable network with asymptotic-optimality results, balancing real-time performance with interpretable control theory.
2026-02-13
2602.12281 — Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment
Recall: Proposes the CoVer test-time verification scaling law, markedly narrowing the VLA intention-action gap via instruction rewriting × action candidates × verifier.
2602.12215 — LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion
Recall: Unifies embodied data as EI-30k and learns jointly in a latent-dynamics space, markedly improving contact-rich, dexterous, and long-horizon task performance.
2602.12063 — VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model
Recall: Improves the world model with real data, then feeds synthetic rollouts back into the VLA, forming an iterative policy-world-model co-improvement loop.
2602.11934 — Robot-DIFT: Distilling Diffusion Features for Geometrically Consistent Visuomotor Control
Recall: Distills diffusion-model geometric priors into a deterministic visual backbone, improving geometric consistency in closed-loop control while staying real-time.
2602.11929 — General Humanoid Whole-Body Control via Pretraining and Fast Adaptation
Recall: FAST improves OOD transfer and dynamic-balance robustness in humanoid whole-body control via residual fast adaptation and CoM-aware control.
2602.11049 — SQ-CBF: Signed Distance Functions for Numerically Stable Superquadric-Based Safety Filtering
Recall: Replaces superquadric implicit functions with SDFs as the barrier, markedly improving gradient conditioning and the feasibility of real-time safety filters.
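With a single affine constraint, an SDF-based CBF safety filter has a closed-form projection; a sketch for a single-integrator robot and a spherical obstacle (illustrative geometry, not the paper's superquadric setting):

```python
import numpy as np

# CBF filter with h(x) = sphere SDF: solve
#   min ||u - u_nom||^2  s.t.  grad_h(x) @ u >= -alpha * h(x)
# For one affine constraint, the QP solution is a simple projection.

def sdf_sphere(x, center, radius):
    return np.linalg.norm(x - center) - radius

def sdf_grad(x, center):
    d = x - center
    return d / np.linalg.norm(d)        # unit gradient of the sphere SDF

def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    h = sdf_sphere(x, center, radius)
    a = sdf_grad(x, center)
    b = -alpha * h
    if a @ u_nom >= b:
        return u_nom                    # nominal input already safe
    return u_nom + (b - a @ u_nom) / (a @ a) * a

x = np.array([2.0, 0.0])
center, radius = np.array([0.0, 0.0]), 1.0
u_safe = cbf_filter(x, np.array([-5.0, 0.0]), center, radius)
```

The SDF gradient has unit norm everywhere off the surface, which is exactly the conditioning advantage the entry above highlights over raw superquadric implicit functions.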
2602.10983 — Scaling World Model for Hierarchical Manipulation Policies
Recall: A high-level world model generates visual subgoals executed by a low-level VLA, markedly narrowing the gap from text goals to executable actions and raising OOD manipulation success.
2602.11075 — RISE: Self-Improving Robot Policy with Compositional World Model
Recall: Iterates the policy in imagination with a compositional dynamics-prediction plus progress-value world model, cutting the cost of real on-policy RL.
2602.10961 — Stability Analysis of Geometric Control for a Canonical Class of Underactuated Aerial Vehicles with Spurious Forces
Recall: Gives the first rigorous Lyapunov stability proof for underactuated aerial platforms with parasitic force coupling, filling a gap in geometric control theory.
2026-02-12
2602.09123 — Agile asymmetric multi-legged locomotion: contact planning via geometric mechanics and spin model duality
Recall: reframes high-leg-count gait search as symmetry-aware graph optimization, uncovering a high-speed asymmetric hexapod regime.
2602.09368 — Certified Gradient-Based Contact-Rich Manipulation via Smoothing-Error Reachable Tubes
Recall: pairs smoothed differentiable planning with reachable-tube certification to keep gradient benefits while guaranteeing safety on true hybrid dynamics.
2602.09722 — Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization
Recall: shows embodiment alignment is crucial, naïve dataset pooling can hurt, and common regularizers are less reliable than expected in VLA scaling.
2602.10098 — VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
Recall: leakage-free JEPA latent prediction improves VLA robustness by enforcing causal representation learning over future-state supervision.
2602.10109 — ST4VLA: Spatially Guided Training for Vision-Language-Action Models
Recall: preserves spatial grounding from pretraining into policy learning via coupled spatial and action objectives, yielding strong real-robot gains.
2602.09849 — BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
Recall: interleaves textual reasoning, visual forecasting, and action generation with low-latency residual flow guidance for better long-horizon control.
2602.09580 — Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
Recall: combines NF likelihood-tractable policies with chunk-aligned critics for stable, sample-efficient dexterous real-world adaptation.
2602.10106 — EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
Recall: co-trains with large egocentric human data plus embodiment alignment to substantially improve humanoid transfer in unseen scenes.