Shun Zhang

Hi! This is Shun Zhang (张舜). I am a research scientist at the MIT-IBM Watson AI Lab. My research interests lie in reinforcement learning, large language models, and value alignment.

I received my Ph.D. from the University of Michigan, advised by Prof. Satinder Singh and Prof. Ed Durfee. Prior to that, I received both my B.S. and M.S. in computer science from the University of Texas at Austin, where I did research advised by Prof. Peter Stone and Prof. Dana Ballard.

My CV in HTML and PDF.

Email  /  Bio  /  GitHub  /  Google Scholar  /  LinkedIn  /  Twitter


Research

Research Overview
Reward uncertainty (ZDS '17); safety constraint uncertainty (ZDS '18, ZDS '20)
Foundation models as infrastructure for RL (XSZ+ '22, XLS+ '23); planning in foundation-model inference (ZCS+ '23)
AI for scientific discovery (FCZ+ '21)
Reward model ensemble in RLHF (ZCC+ '24)

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble (Short Paper)


Shun Zhang, Zhenfang Chen, Sunli Chen, Yikang Shen, Zhiqing Sun, and Chuang Gan
arXiv, 2024
[paper]

We introduce efficient reward model ensemble approaches for reinforcement learning from human feedback (RLHF), achieving better alignment with human values under computational constraints.

# large language model # value alignment
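A minimal sketch of the ensemble idea, not the paper's implementation: aggregate the ensemble's rewards pessimistically so the policy cannot exploit inputs where the reward models disagree. The aggregation rule and all names below are illustrative assumptions.

```python
import torch

def ensemble_reward(reward_models, prompt, response, k=1.0):
    """Pessimistic aggregation (mean - k * std) over a reward ensemble.

    reward_models: callables mapping (prompt, response) -> scalar tensor.
    These interfaces are hypothetical stand-ins, not the paper's API.
    """
    rewards = torch.stack([rm(prompt, response) for rm in reward_models])
    # Penalizing disagreement keeps the RLHF policy away from inputs
    # where the learned reward is unreliable.
    return rewards.mean() - k * rewards.std()
```

The computational savings in the paper come from ensemble members sharing most of their parameters; the aggregation step itself can stay this simple.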

Adaptive Online Replanning with Diffusion Models


Siyuan Zhou, Yilun Du, Shun Zhang, Mengdi Xu, Yikang Shen, Wei Xiao, Dit-Yan Yeung, and Chuang Gan
Conference on Neural Information Processing Systems (NeurIPS), 2023
[paper] [website]

When used for planning, diffusion models generate complete plans at once without incorporating new observations during execution. Our algorithm determines when replanning with new observations is necessary, and replans efficiently.

# diffusion model # planning
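A sketch of the decision rule, assuming hypothetical helpers diffusion_plan (generate a plan from the current observation) and plan_likelihood (score the remaining plan under the model); the paper's actual criterion may differ.

```python
def execute_with_replanning(env, diffusion_plan, plan_likelihood, threshold):
    """Replan only when the remaining plan looks unlikely given the
    newly observed state; otherwise keep executing the cached plan."""
    obs, done = env.reset(), False
    plan, step = diffusion_plan(obs), 0
    while not done:
        if plan_likelihood(plan[step:], obs) < threshold:
            plan, step = diffusion_plan(obs), 0  # full replan
        obs, _, done, _ = env.step(plan[step])
        step += 1
```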

Planning with Large Language Models for Code Generation


Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, and Chuang Gan
International Conference on Learning Representations (ICLR), 2023
[paper] [slides] [website] [code]

Our algorithm combines Monte-Carlo tree search with Transformer beam search for code generation. It is more sample-efficient than the widely used sampling-and-filtering baseline.

# large language model # planning
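A simplified sketch of tree search over partial programs in this spirit; lm_top_k, beam_rollout, and pass_rate are assumed interfaces (language-model next-token proposals, beam-search completion, and test-case scoring), not the paper's code.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    tokens: list                      # partial program as a token sequence
    prior: float = 1.0                # LM probability of the last token
    visits: int = 0
    value: float = 0.0                # mean pass rate of evaluated completions
    children: list = field(default_factory=list)

def select(node, c=4.0):
    # PUCT-style selection: the language model's priors guide exploration.
    return max(node.children, key=lambda ch: ch.value
               + c * ch.prior * math.sqrt(node.visits) / (1 + ch.visits))

def mcts_step(root, lm_top_k, beam_rollout, pass_rate):
    path, node = [root], root
    while node.children:
        node = select(node)
        path.append(node)
    # Expansion: children are the LM's most likely next tokens.
    node.children = [Node(node.tokens + [t], prior=p)
                     for t, p in lm_top_k(node.tokens)]
    # Evaluation: finish the program with beam search, score it on tests.
    reward = pass_rate(beam_rollout(node.tokens))
    for n in path:                    # backup along the visited path
        n.visits += 1
        n.value += (reward - n.value) / n.visits
```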

Hyper-Decision Transformer for Efficient Online Policy Adaptation


Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, and Chuang Gan
International Conference on Learning Representations (ICLR), 2023
[paper]

Using hypernetworks to efficiently adapt a Decision Transformer to new tasks from a handful of demonstrations.

# large language model # reinforcement learning
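A minimal sketch of the hypernetwork idea, assuming a frozen Decision Transformer whose adapter weights are generated from a demonstration embedding; module names and dimensions are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    def __init__(self, demo_dim, hidden_dim, adapter_dim):
        super().__init__()
        self.hidden_dim, self.adapter_dim = hidden_dim, adapter_dim
        # Hypernetwork: demonstration embedding -> adapter weights.
        self.hyper = nn.Linear(demo_dim, 2 * hidden_dim * adapter_dim)

    def forward(self, h, demo_embedding):
        # Unbatched for clarity: h is one transformer activation.
        w = self.hyper(demo_embedding)
        split = self.hidden_dim * self.adapter_dim
        w_down = w[:split].view(self.adapter_dim, self.hidden_dim)
        w_up = w[split:].view(self.hidden_dim, self.adapter_dim)
        # Residual adapter on a frozen Decision Transformer activation:
        # only the hypernetwork is trained for adaptation.
        return h + torch.relu(h @ w_down.T) @ w_up.T
```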

Prompting Decision Transformer for Few-shot Policy Generalization


Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, and Chuang Gan
International Conference on Machine Learning (ICML), 2022
[paper] [website] [code]

Using trajectory segments of different tasks as prompts for a Decision Transformer to achieve offline meta-reinforcement learning.

# large language model # reinforcement learning
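The mechanism in miniature, with assumed names: a short trajectory prompt from the new task is prepended to the recent history before an ordinary Decision Transformer forward pass.

```python
import torch

def prompt_dt_forward(dt, prompt_tokens, history_tokens):
    """dt: a trained Decision Transformer; prompt_tokens: embedded
    (return-to-go, state, action) steps from a few demonstrations of
    the new task; history_tokens: the current episode so far.
    All names here are illustrative."""
    # The trajectory prompt identifies the task, and the transformer
    # conditions on it just as a language model conditions on text.
    tokens = torch.cat([prompt_tokens, history_tokens], dim=1)  # (B, T, D)
    return dt(tokens)  # predicts the next action tokens
```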

From Specification to Topology: Automatic Power Converter Design via Reinforcement Learning


Shaoze Fan, Ningyuan Cao, Shun Zhang, Jing Li, Xiaoxiao Guo, and Xin Zhang
International Conference on Computer Aided Design (ICCAD), 2021
[paper]

Using Monte-Carlo tree search to automatically design power converter circuits.

# planning # AI for science
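One way to see why tree search applies: topology design can be cast as sequential decision making, where each action adds a component and the terminal reward comes from simulation. The component list and the simulate scorer below are illustrative assumptions.

```python
COMPONENTS = ["inductor", "capacitor", "switch", "diode"]

def legal_actions(nodes):
    # Each action inserts one component between a pair of circuit nodes.
    return [(c, i, j) for c in COMPONENTS
            for i in range(len(nodes)) for j in range(i + 1, len(nodes))]

def terminal_reward(topology, target_ratio, simulate):
    # Score a finished topology by how closely the simulated converter
    # matches the target specification (e.g., voltage conversion ratio).
    return -abs(simulate(topology) - target_ratio)
```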

Efficiently Finding Approximately-Optimal Queries for Improving Policies and Guaranteeing Safety


Shun Zhang
Ph.D. Dissertation, 2020
[paper] [slides]

# value alignment # planning under uncertainty

Querying to Find a Safe Policy Under Uncertain Safety Constraints in Markov Decision Processes


Shun Zhang, Edmund H. Durfee, and Satinder Singh
AAAI Conference on Artificial Intelligence (AAAI), 2020
[paper] [poster]

An agent is uncertain about which policies are safe. Using a minimum number of queries, it either finds a safe policy that accomplishes the task or proves that no safe policy exists.

# value alignment # planning under uncertainty
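A high-level sketch of the querying loop with assumed interfaces (uses, is_change_allowed); the greedy choice below is a stand-in, since the paper's algorithm selects queries with optimality guarantees.

```python
def find_safe_policy(policies, unknown_features, is_change_allowed, uses):
    """policies: candidate policies; uses(p, f): does policy p change
    feature f; is_change_allowed(f): the (costly) query to the human."""
    policies, unknown = list(policies), set(unknown_features)
    while policies:
        # A policy touching no feature of unknown status is provably safe.
        for p in policies:
            if not any(uses(p, f) for f in unknown):
                return p
        # Otherwise ask about the feature used by the most remaining
        # candidates -- a greedy stand-in for the paper's query selection.
        f = max(unknown, key=lambda f: sum(uses(p, f) for p in policies))
        unknown.discard(f)
        if not is_change_allowed(f):
            policies = [p for p in policies if not uses(p, f)]
    return None  # every candidate is ruled out: no safe policy exists
```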

Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes


Shun Zhang, Edmund H. Durfee, and Satinder Singh
International Joint Conference on Artificial Intelligence (IJCAI), 2018
[paper] [slides]

A novel formulation and a query selection algorithm for the problem of avoiding negative side effects in safe reinforcement learning.

# value alignment # planning under uncertainty
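The query-selection principle in one line, with assumed interfaces: answers(q) enumerates possible responses to a query and regret(q, a) is the worst-case loss of the policy the agent would commit to after observing answer a.

```python
def minimax_regret_query(queries, answers, regret):
    # Choose the query whose worst possible post-answer regret is smallest.
    return min(queries, key=lambda q: max(regret(q, a) for a in answers(q)))
```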

Modeling Sensory-Motor Decisions in Natural Behavior


Ruohan Zhang, Shun Zhang, Matthew H. Tong, Yuchen Cui, Constantin A. Rothkopf, Dana H. Ballard, and Mary M. Hayhoe
PLoS Computational Biology, 2018
[paper] [slides]

# cognitive science # reinforcement learning

Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes


Shun Zhang, Edmund H. Durfee, and Satinder Singh
International Conference on Automated Planning and Scheduling (ICAPS), 2017
[paper] [slides]

A query selection algorithm with provable approximation guarantees for resolving reward uncertainty and improving planning in reward-uncertain Markov decision processes.

# value alignment # planning under uncertainty
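A sketch of greedy query construction by expected value of information; evoi here is an assumed scorer, and the paper proves guarantees that this bare sketch does not.

```python
def greedy_query(candidates, k, evoi):
    """Build a k-response query item by item; evoi(query) estimates the
    agent's expected value after observing the user's response."""
    query = []
    for _ in range(k):
        best = max((c for c in candidates if c not in query),
                   key=lambda c: evoi(query + [c]))
        query.append(best)
    return query
```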

Determining Placements of Influencing Agents in a Flock


Katie Genter, Shun Zhang, and Peter Stone
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2015
[paper] [slides]

# multi-agent system

Autonomous Intersection Management for Semi-Autonomous Vehicles


Tsz-Chiu Au, Shun Zhang, and Peter Stone
Handbook of Transportation, 2015
[paper] [website]

# multi-agent system