
I am a Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Zongqing Lu. My research interests include Reinforcement Learning, Language Modeling, and Embodied Agents. Feel free to contact me if you would like to discuss these topics or collaborate.
[Email / Google Scholar / GitHub / CV]
Education
- School of Computer Science, Peking University.
  - Ph.D. Candidate. (Sep. 2022 – Present)
  - Supervisor: Prof. Zongqing Lu.
- Department of Computer Science and Technology, Tsinghua University.
  - Master of Science Degree. (Sep. 2019 – Jun. 2022)
  - Supervisor: Prof. Xi Xiao.
- School of Mathematical Sciences, Nankai University.
  - Bachelor of Science Degree. (Sep. 2015 – Jun. 2019)
  - Advisor: Prof. Jishou Ruan.
Publication
2025
- From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities. (ICLR’25)
- SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking. (ICLR’25)
- LLM-Based Explicit Models of Opponents for Multi-Agent Games. (NAACL’25)
  - Xiaopeng Yu, Wanpeng Zhang, Zongqing Lu
2024
- VideoOrion: Tokenizing Object Dynamics in Videos. (arXiv’24.11)
- Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation. (ICML’24)
- AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback. (NAACL’24 Findings)
2023
- Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning. (ICML’23)
2022
- Model-Based Opponent Modeling. (NeurIPS’22)
- iGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control. (AAAI’22)
- Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning. (ICASSP’22)
2021
- Robust Model-based Reinforcement Learning for Autonomous Greenhouse Control. (ACML’21)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation. (ICRA’21)
- A Simulator-based Planning Framework for Optimizing Autonomous Greenhouse Control Strategy. (ICAPS’21)
- MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning. (arXiv’21.08)
2020
- Self-Paced Probabilistic Principal Component Analysis for Data with Outliers. (ICASSP’20)
Patent
- Multimodal data processing method, device, storage medium, and electronic equipment. (CN119226992A)
  - Zongqing Lu, Wanpeng Zhang
- Method, device and equipment for determining parameters and storage medium. (CN112527104A)
Experience
- Beijing Academy of Artificial Intelligence (BAAI)
  - Research Intern. (May 2024 – Present)
  - Topic: Multimodal LLMs / Embodied Agents.
  - Supervisor: Prof. Zongqing Lu.
- Tencent AI Lab
  - Research Intern. (Jun. 2020 – Jul. 2021)
  - Topic: Reinforcement Learning / AI for Science.
  - Supervisor: Dr. Dijun Luo.
- Availink
  - Research Intern. (Aug. 2018 – Oct. 2018)
  - Topic: Machine Learning.
  - Supervisor: Dr. Feng-Wen Sun.
Award
- Award for Scientific Research of Peking University. (Dec. 2024)
- Presidential Scholarship of Peking University. (Nov. 2024)
- Rhino-bird Elite Training Program of Tencent AI Lab. (Jul. 2021)
- Mathematical Contest in Modeling (MCM/ICM), Meritorious Winner (First Prize). (Apr. 2017)
- China Undergraduate Mathematical Contest in Modeling (CUMCM), Second Prize. (Jan. 2016)
Service
- Conference Reviewer
  - ICML: 2022, 2023, 2024, 2025
  - NeurIPS: 2022, 2023, 2024
  - ICLR: 2024, 2025
  - AAAI: 2025
  - ICRA: 2025
  - AISTATS: 2025
- Journal Reviewer
  - TNNLS
  - TIST