I am a Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Zongqing Lu. My research interests include Reinforcement Learning, Language Modeling, and Embodied Agents. Feel free to contact me if you are interested in discussing research or collaborating.

[Email / Google Scholar / GitHub / CV]

Education

  • School of Computer Science, Peking University.
    • Ph.D. Candidate. (Sep. 2022 — Present)
    • Supervisor: Prof. Zongqing Lu.
  • Department of Computer Science and Technology, Tsinghua University.
    • Master of Science Degree. (Sep. 2019 — Jun. 2022)
    • Supervisor: Prof. Xi Xiao.
  • School of Mathematical Sciences, Nankai University.
    • Bachelor of Science Degree. (Sep. 2015 — Jun. 2019)
    • Advisor: Prof. Jishou Ruan.

Publications

2025

  • From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities. (ICLR’25)
    • Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu
    • Link / PDF / Bib
  • SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking. (ICLR’25)
    • Xingrun Xing, Boyan Gao, David A. Clifton, Zheng Liu, Shitao Xiao, Wanpeng Zhang, Li Du, Zheng Zhang, Guoqi Li, Jiajun Zhang
    • Link / PDF / Bib
  • LLM-Based Explicit Models of Opponents for Multi-Agent Games. (NAACL’25)
    • Xiaopeng Yu, Wanpeng Zhang, Zongqing Lu

2024

  • VideoOrion: Tokenizing Object Dynamics in Videos. (arXiv’24.11)
    • Yicheng Feng, Yijiang Li, Wanpeng Zhang, Sipeng Zheng, Zongqing Lu
    • Link / PDF / Bib
  • Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation. (ICML’24)
    • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.
    • Link / PDF / Bib / Code
  • AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback. (NAACL’24 Findings)

2023

  • Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning. (ICML’23)
    • Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. *equal contribution
    • Link / PDF / Bib / Code

2022

  • Model-Based Opponent Modeling. (NeurIPS’22)
    • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.
    • Link / PDF / Bib / Code
  • iGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control. (AAAI’22)
    • Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Shihui Guo, Li Xiao, Xiaoyu Cao, Dijun Luo.
    • Link / PDF / Bib / Code
  • Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning. (ICASSP’22)
    • Mingzhe Chen, Xi Xiao, Wanpeng Zhang, Xiaotian Gao.
    • Link / PDF / Bib

2021

  • Robust Model-based Reinforcement Learning for Autonomous Greenhouse Control. (ACML’21)
    • Wanpeng Zhang, Xiaoyan Cao, Yao Yao, Zhicheng An, Dijun Luo, Xi Xiao.
    • Link / PDF / Bib
  • Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation. (ICRA’21)
    • Yao Yao, Li Xiao, Zhicheng An, Wanpeng Zhang, Dijun Luo.
    • Link / PDF / Bib / Code
  • A Simulator-based Planning Framework for Optimizing Autonomous Greenhouse Control Strategy. (ICAPS’21)
    • Zhicheng An, Xiaoyan Cao, Yao Yao, Wanpeng Zhang, Lanqing Li, Yue Wang, Shihui Guo, Dijun Luo.
    • Link / PDF / Bib
  • MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning. (arXiv’21.08)
    • Wanpeng Zhang, Xi Xiao, Yao Yao, Mingzhe Chen, Dijun Luo.
    • Link / PDF / Bib

2020

  • Self-Paced Probabilistic Principal Component Analysis for Data with Outliers. (ICASSP’20)
    • Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Guojun Gan, Shutao Xia.
    • Link / PDF / Bib

Patents

  • Multimodal data processing method, device, storage medium, and electronic equipment. (CN119226992A)
    • Zongqing Lu, Wanpeng Zhang
  • Method, device and equipment for determining parameters and storage medium. (CN112527104A)
    • Wanpeng Zhang, Dijun Luo, Xi Xiao
    • Link / PDF

Experience

  • Beijing Academy of Artificial Intelligence (BAAI)
    • Research Intern. (May 2024 — Present)
    • Topic: Multimodal LLMs / Embodied Agents.
    • Supervisor: Prof. Zongqing Lu.
  • Tencent AI Lab
    • Research Intern. (Jun. 2020 — Jul. 2021)
    • Topic: Reinforcement Learning / AI for Science.
    • Supervisor: Dr. Dijun Luo.
  • Availink
    • Research Intern. (Aug. 2018 — Oct. 2018)
    • Topic: Machine Learning.
    • Supervisor: Dr. Feng-Wen Sun.

Awards

  • Award for Scientific Research of Peking University. (Dec. 2024)
  • Presidential Scholarship of Peking University. (Nov. 2024)
  • Rhino-bird Elite Training Program of Tencent AI Lab. (Jul. 2021)
  • Mathematical Contest in Modeling (MCM/ICM), Meritorious Winner (First Prize). (Apr. 2017)
  • China Undergraduate Mathematical Contest in Modeling (CUMCM), Second Prize. (Jan. 2016)

Service

  • Conference Reviewer
    • ICML: 2022, 2023, 2024, 2025
    • NeurIPS: 2022, 2023, 2024
    • ICLR: 2024, 2025
    • AAAI: 2025
    • ICRA: 2025
    • AISTATS: 2025
  • Journal Reviewer
    • TNNLS
    • TIST