I am a Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Zongqing Lu. My research interests include Reinforcement Learning, Generative Modeling, Multimodal LLMs, and Embodied Agents. Feel free to contact me if you are interested in discussing or collaborating.

[Email / Google Scholar / GitHub / CV]

Education

  • School of Computer Science, Peking University.
    • Ph.D. Candidate. (Sep. 2022 — Present)
    • Supervisor: Prof. Zongqing Lu.
  • Department of Computer Science and Technology, Tsinghua University.
    • M.Sc. Degree. (Sep. 2019 — Jun. 2022)
    • Supervisor: Prof. Xi Xiao.
  • School of Mathematical Sciences, Nankai University.
    • B.Sc. Degree. (Sep. 2015 — Jun. 2019)
    • Advisor: Prof. Jishou Ruan.

Publications

  • Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation. (ICML’24)
    • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.
    • Link / PDF / Bib / Code
  • AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback. (NAACL’24)
  • Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning. (ICML’23)
    • Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. *equal contribution
    • Link / PDF / Bib / Code / Talk
  • Model-Based Opponent Modeling. (NeurIPS’22)
    • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.
    • Link / PDF / Bib / Code / Talk
  • iGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control. (AAAI’22)
    • Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Shihui Guo, Li Xiao, Xiaoyu Cao, Dijun Luo.
    • Link / PDF / Bib / Code
  • Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning. (ICASSP’22)
    • Mingzhe Chen, Xi Xiao, Wanpeng Zhang, Xiaotian Gao.
    • Link / PDF / Bib
  • Robust Model-based Reinforcement Learning for Autonomous Greenhouse Control. (ACML’21)
    • Wanpeng Zhang, Xiaoyan Cao, Yao Yao, Zhicheng An, Dijun Luo, Xi Xiao.
    • Link / PDF / Bib / Talk
  • Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation. (ICRA’21)
    • Yao Yao, Li Xiao, Zhicheng An, Wanpeng Zhang, Dijun Luo.
    • Link / PDF / Bib / Code
  • A Simulator-based Planning Framework for Optimizing Autonomous Greenhouse Control Strategy. (ICAPS’21)
    • Zhicheng An, Xiaoyan Cao, Yao Yao, Wanpeng Zhang, Lanqing Li, Yue Wang, Shihui Guo, Dijun Luo.
    • Link / PDF / Bib
  • MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning. (arXiv’21.08)
    • Wanpeng Zhang, Xi Xiao, Yao Yao, Mingzhe Chen, Dijun Luo.
    • Link / PDF / Bib
  • Self-Paced Probabilistic Principal Component Analysis for Data with Outliers. (ICASSP’20)
    • Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Guojun Gan, Shutao Xia.
    • Link / PDF / Bib

Patent

  • Method, device and equipment for determining parameters and storage medium. (CN112527104A)
    • Wanpeng Zhang, Dijun Luo, Xi Xiao.
    • Link / PDF

Experience

  • Beijing Academy of Artificial Intelligence (BAAI)
    • Research Intern. (May 2024 — Present)
    • Topic: Multimodal LLMs / Embodied Agents.
    • Supervisor: Prof. Zongqing Lu.
  • Tencent AI Lab
    • Research Intern. (Jun. 2020 — Jul. 2021)
    • Topic: Reinforcement Learning / AI for Science.
    • Supervisor: Dr. Dijun Luo.
  • Availink
    • Research Intern. (Aug. 2018 — Oct. 2018)
    • Topic: Machine Learning.
    • Supervisor: Dr. Feng-Wen Sun.

Awards

  • Presidential Scholarship of Peking University. (May 2024)
  • Rhino-bird Elite Training Program of Tencent AI Lab. (Jul. 2021)
  • Mathematical Contest in Modeling (MCM/ICM), Meritorious Winner (First Prize). (Apr. 2017)
  • China Undergraduate Mathematical Contest in Modeling (CUMCM), Second Prize. (Jan. 2016)

Service

  • Conference Reviewer
    • ICML: 2022, 2023, 2024
    • NeurIPS: 2022, 2023, 2024
    • ICLR: 2024