Notes-to-Self Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks

Title: Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks
Authors: Sanjay Haresh, Daniel Dijkman, Apratim Bhattacharyya, Roland Memisevic
arXiv: https://arxiv.org/abs/2602.21013

Problem framing

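Most VLAs treat the current observation as an approximately Markov state, so they fail noticeably on long-horizon manipulation tasks that require remembering earlier object locations or sub-goal completion. The paper's goal is to give a VLA an interpretable spatiotemporal memory channel without a large-scale architectural redesign.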

Core method

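The method adds a language scratchpad to the policy inference loop: at each step the policy writes key information to a text memory, then reads it back when decoding later actions. This is equivalent to giving the policy an external, editable state that explicitly maintains the plan and its progress.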

Key equations and mechanisms

External memory state update:

$$m_t = f_\phi(m_{t-1}, o_t, \ell_t)$$

where $m_t$ is the scratchpad, $o_t$ is the current observation, and $\ell_t$ is the language instruction or sub-goal.

Action conditioning:

$$a_t \sim \pi_\theta(a \mid o_t, \ell_t, m_t)$$

Injecting key history into the action head as text state mitigates observation aliasing.
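
The update and conditioning steps above can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the paper's implementation: `Obs`, `update_scratchpad` (a hand-written stand-in for $f_\phi$), `act` (a deterministic stand-in for sampling from $\pi_\theta$), and the action strings are all hypothetical. The two episodes end in the same observation and instruction, so a stateless policy must act identically in both, while the scratchpad-conditioned one can tell them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    """Toy observation (hypothetical): names of objects currently visible."""
    visible: tuple


def update_scratchpad(memory: str, obs: Obs, instruction: str) -> str:
    """Toy stand-in for f_phi: m_t = f(m_{t-1}, o_t, l_t).

    This toy ignores the instruction; a learned writer would likely use it.
    """
    lines = memory.splitlines()
    for name in obs.visible:
        note = f"seen {name}"
        if note not in lines:
            lines.append(note)
    return "\n".join(lines)


def act(obs: Obs, instruction: str, memory: str) -> str:
    """Toy stand-in for pi_theta(a | o_t, l_t, m_t); deterministic for clarity."""
    if instruction == "wait":
        return "idle"
    target = instruction.split(" ", 1)[1]  # e.g. "fetch cup" -> "cup"
    if target in obs.visible:
        return f"grasp {target}"
    if f"seen {target}" in memory.splitlines():
        return f"return_to {target}"
    return "explore"


def rollout(episode, use_memory=True):
    """Run the write-then-act loop over (observation, instruction) pairs."""
    memory, actions = "", []
    for obs, instruction in episode:
        if use_memory:
            memory = update_scratchpad(memory, obs, instruction)
        actions.append(act(obs, instruction, memory))
    return actions


# Both episodes end with an empty view and the same instruction (aliased).
episode_a = [(Obs(("cup",)), "wait"), (Obs(()), "fetch cup")]
episode_b = [(Obs(()), "wait"), (Obs(()), "fetch cup")]

print(rollout(episode_a), rollout(episode_b))
print(rollout(episode_a, use_memory=False), rollout(episode_b, use_memory=False))
```

With the scratchpad enabled the final actions differ between the two episodes; with it disabled they are necessarily the same, which is the aliasing failure the notes describe.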

Plan tracking: the scratchpad records which sub-tasks are completed and which are pending, which reduces long-horizon drift.
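
As a rough illustration of plan tracking, the sketch below keeps a done/pending checklist as plain text and reads the next sub-task back from it. The `[x]` / `[ ]` format and the function names `render_plan`, `mark_done`, and `next_subtask` are assumptions chosen for this example; the notes do not specify the paper's actual scratchpad format.

```python
def render_plan(subtasks, done):
    """Render a checklist as scratchpad text: '[x]' done, '[ ]' pending."""
    return "\n".join(f"[{'x' if s in done else ' '}] {s}" for s in subtasks)


def mark_done(scratchpad, subtask):
    """Return a new scratchpad with the given pending sub-task checked off."""
    return "\n".join(
        f"[x] {subtask}" if line == f"[ ] {subtask}" else line
        for line in scratchpad.splitlines()
    )


def next_subtask(scratchpad):
    """Read back the first pending sub-task, or None if all are done."""
    for line in scratchpad.splitlines():
        if line.startswith("[ ] "):
            return line[4:]
    return None


plan = render_plan(
    ["open drawer", "place block in drawer", "close drawer"], done=set()
)
plan = mark_done(plan, "open drawer")
print(plan)
print(next_subtask(plan))
```

Because progress lives in the text rather than in the current image, the policy can recover the next sub-task even when the scene looks the same before and after a step.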

Experiment reading guide

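Start with MemoryBench and the ClevrSkills memory split, then read the real-world pick-and-place results. Focus on comparing the success-rate curves of the stateless, recurrent, and scratchpad variants as the horizon grows.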

Limitations

Text memory adds token overhead and can accumulate errors; if the write policy itself is unstable, noise may be retained for a long time.

Future work

One direction is a structured scratchpad (key-value slots), with uncertainty gating to decide when to write or overwrite.
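
One possible shape for such a structured scratchpad is sketched below. This is a speculative illustration of the future-work idea, not something from the paper: the `SlotScratchpad` class, the 0.6 threshold, and the rule "overwrite only when at least as confident as the stored entry" are all assumptions. The confidence score is taken as given; how to estimate it is left open.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    value: str
    confidence: float


@dataclass
class SlotScratchpad:
    """Key-value scratchpad with a simple confidence gate (hypothetical)."""
    write_threshold: float = 0.6  # assumed value, would need tuning
    slots: dict = field(default_factory=dict)

    def write(self, key, value, confidence):
        """Write or overwrite a slot; return True if the write was accepted."""
        if confidence < self.write_threshold:
            return False  # too uncertain to store at all
        current = self.slots.get(key)
        if current is not None and confidence < current.confidence:
            return False  # keep the more confident existing entry
        self.slots[key] = Slot(value, confidence)
        return True

    def render(self):
        """Serialize slots to text so they can be fed to the policy."""
        return "\n".join(f"{k}: {self.slots[k].value}" for k in sorted(self.slots))


pad = SlotScratchpad()
pad.write("cup_location", "left bin", 0.9)
pad.write("cup_location", "right bin", 0.4)  # rejected by the gate
print(pad.render())
```

A gate like this trades staleness for robustness: it blocks noisy overwrites, but it would also block a correct low-confidence update after an object moves, so a practical version might decay stored confidence over time.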

Replication angle

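A first step is to implement a fixed-template scratchpad with a simple parser in a simulator, then check whether it outperforms an RNN baseline with the same parameter count on memory-dependent tasks.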
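
A minimal starting point for the fixed-template scratchpad and parser might look like the following. The template fields (`object`, `location`, `step`) and the helper names are assumptions made for this sketch; a real replication would pick fields to match the benchmark's tasks. Malformed lines are skipped, so a single bad write does not corrupt the parsed memory.

```python
import re

# Hypothetical one-line-per-entry template.
TEMPLATE = "object={name}; location={location}; step={step}"
LINE_RE = re.compile(
    r"^object=(?P<name>[^;]+); location=(?P<location>[^;]+); step=(?P<step>\d+)$"
)


def write_entry(name, location, step):
    """Format one scratchpad line from the fixed template."""
    return TEMPLATE.format(name=name, location=location, step=step)


def parse_scratchpad(text):
    """Parse template lines into {name: (location, step)}.

    Keeps the entry with the latest step per object and skips lines
    that do not match the template.
    """
    memory = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        name, step = match["name"], int(match["step"])
        if name not in memory or step >= memory[name][1]:
            memory[name] = (match["location"], step)
    return memory


scratchpad = "\n".join([
    write_entry("cup", "left bin", 3),
    "garbled line from a bad write",
    write_entry("cup", "shelf", 7),
])
print(parse_scratchpad(scratchpad))
```

The parsed dictionary can then be re-rendered into the policy prompt, which keeps the scratchpad length bounded by the number of tracked objects.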

Key Figure: https://arxiv.org/html/2602.21013/x1.png

Graph: Paper Node 2602.21013