I am a Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Zongqing Lu. My research interests include Reinforcement Learning, Language Modeling, and Embodied Agents. Feel free to contact me if you are interested in discussing or collaborating.

Education

  • School of Computer Science, Peking University.
    • Ph.D. Candidate. (Sep. 2022 — Present)
    • Supervisor: Prof. Zongqing Lu.
  • Department of Computer Science and Technology, Tsinghua University.
    • Master of Science Degree. (Sep. 2019 — Jun. 2022)
    • Supervisor: Prof. Xi Xiao.
  • School of Mathematical Sciences, Nankai University.
    • Bachelor of Science Degree. (Sep. 2015 — Jun. 2019)
    • Advisor: Prof. Jishou Ruan.

Selected Publications

(For the full list of publications, please see my Google Scholar profile.)

1. MLLM & LLM

  • From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities. (ICLR’25)
    • Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu.
    • Link / PDF / Bib
  • SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking. (ICLR’25)
    • Xingrun Xing, Boyan Gao, David A. Clifton, Zheng Liu, Shitao Xiao, Wanpeng Zhang, Li Du, Zheng Zhang, Guoqi Li, Jiajun Zhang.
    • Link / PDF / Bib / Code
  • LLM-Based Explicit Models of Opponents for Multi-Agent Games. (NAACL’25)
    • Xiaopeng Yu, Wanpeng Zhang, Zongqing Lu.
    • Link / PDF / Bib
  • VideoOrion: Tokenizing Object Dynamics in Videos. (Preprint)
    • Yicheng Feng, Yijiang Li, Wanpeng Zhang, Hao Luo, Zihao Yue, Sipeng Zheng, Zongqing Lu.
    • Link / PDF / Bib

2. Reinforcement Learning

  • Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation. (ICML’24)
    • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.
    • Link / PDF / Bib / Code
  • AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback. (NAACL’24)
  • Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning. (ICML’23)
    • Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. *equal contribution
    • Link / PDF / Bib / Code
  • Model-Based Opponent Modeling. (NeurIPS’22)
    • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.
    • Link / PDF / Bib / Code
  • Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation. (ICRA’21)
    • Yao Yao, Li Xiao, Zhicheng An, Wanpeng Zhang, Dijun Luo.
    • Link / PDF / Bib / Code

Patents

  • Multimodal data processing method, device, storage medium, and electronic equipment. (CN119226992A)
    • Zongqing Lu, Wanpeng Zhang.
    • Link / PDF
  • Method, device and equipment for determining parameters and storage medium. (CN112527104A)
    • Wanpeng Zhang, Dijun Luo, Xi Xiao.
    • Link / PDF

Experience

  • BeingBeyond
    • Research Intern. (Mar. 2025 — Present)
    • Embodied Agent / Multimodal LLMs
  • Beijing Academy of Artificial Intelligence (BAAI)
    • Research Intern. (May 2024 — Mar. 2025)
    • Multimodal LLMs / Embodied Agent
  • Tencent AI Lab
    • Research Intern. (Jun. 2020 — Jul. 2021)
    • Reinforcement Learning / AI for Science

Awards

  • Award for Scientific Research of Peking University. (Dec. 2024)
  • Presidential Scholarship of Peking University. (Nov. 2024)
  • Rhino-bird Elite Training Program of Tencent AI Lab. (Jul. 2021)
  • Mathematical Contest in Modeling (MCM/ICM), Meritorious Winner (First Prize). (Apr. 2017)
  • China Undergraduate Mathematical Contest in Modeling (CUMCM), Second Prize. (Jan. 2016)

Services

  • Conference Reviewer
    • ICML / NeurIPS / ICLR / ICCV / AAAI / ICRA / AISTATS
  • Journal Reviewer
    • TNNLS / TIST
  • Teaching Assistant
    • Deep Reinforcement Learning, Peking University. (Spring, 2025)