Shun Zhang

Hi! This is Shun Zhang (张舜). I am a research scientist at the MIT-IBM Watson AI Lab. My research interests lie in reinforcement learning, large language models, and value alignment.

I received my Ph.D. from the University of Michigan, advised by Prof. Satinder Singh and Prof. Ed Durfee. Prior to that, I received both my B.S. and M.S. in computer science from the University of Texas at Austin, where I did research advised by Prof. Peter Stone and Prof. Dana Ballard.

My CV in HTML and PDF.

Email  /  Bio  /  GitHub  /  Google Scholar  /  LinkedIn  /  Twitter


Research

Research Overview
Reward uncertainty (ZDS '17); safety constraint uncertainty (ZDS '18, ZDS '20)
Foundation models as infrastructure for RL (XSZ+ '22, XLS+ '23); planning in foundation-model inference (ZCS+ '23)
AI for scientific discovery (FCZ+ '21)
Reward model ensemble in RLHF (ZCC+ '24)

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble (Short Paper)


Shun Zhang, Zhenfang Chen, Sunli Chen, Yikang Shen, Zhiqing Sun, and Chuang Gan
arXiv, 2024
[paper]

We introduce efficient reward model ensemble approaches for reinforcement learning from human feedback (RLHF), achieving better alignment with human values under computational constraints.

# large language model # value alignment
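A minimal sketch of the ensemble idea, not the paper's implementation: aggregate the ensemble's rewards pessimistically so the policy cannot exploit inputs where the reward models disagree. The aggregation rule and all names below are illustrative assumptions.

```python
import torch

def ensemble_reward(reward_models, prompt, response, k=1.0):
    """Pessimistic aggregation (mean - k * std) over a reward ensemble.

    reward_models: callables mapping (prompt, response) -> scalar tensor.
    These interfaces are hypothetical stand-ins, not the paper's API.
    """
    rewards = torch.stack([rm(prompt, response) for rm in reward_models])
    # Penalizing disagreement keeps the RLHF policy away from inputs
    # where the learned reward is unreliable.
    return rewards.mean() - k * rewards.std()
```

The computational savings in the paper come from ensemble members sharing most of their parameters; the aggregation step itself can stay this simple.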

Adaptive Online Replanning with Diffusion Models


Siyuan Zhou, Yilun Du, Shun Zhang, Mengdi Xu, Yikang Shen, Wei Xiao, Dit-Yan Yeung, and Chuang Gan
Conference on Neural Information Processing Systems (NeurIPS), 2023
[paper] [website]

When used for planning, diffusion models generate complete plans at once without incorporating new observations during execution. Our algorithm determines when replanning with new observations is necessary, and replans efficiently.

# diffusion model # planning
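A sketch of the decision rule, assuming hypothetical helpers diffusion_plan (generate a plan from the current observation) and plan_likelihood (score the remaining plan under the model); the paper's actual criterion may differ.

```python
def execute_with_replanning(env, diffusion_plan, plan_likelihood, threshold):
    """Replan only when the remaining plan looks unlikely given the
    newly observed state; otherwise keep executing the cached plan."""
    obs, done = env.reset(), False
    plan, step = diffusion_plan(obs), 0
    while not done:
        if plan_likelihood(plan[step:], obs) < threshold:
            plan, step = diffusion_plan(obs), 0  # full replan
        obs, _, done, _ = env.step(plan[step])
        step += 1
```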

Planning with Large Language Models for Code Generation


Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, and Chuang Gan
International Conference on Learning Representations (ICLR), 2023
[paper] [slides] [website] [code]

Our algorithm combines Monte-Carlo tree search with Transformer beam search for code generation. It is more sample-efficient than the widely used sampling-and-filtering baseline.

# large language model # planning
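A simplified sketch of tree search over partial programs in this spirit; lm_top_k, beam_rollout, and pass_rate are assumed interfaces (language-model next-token proposals, beam-search completion, and test-case scoring), not the paper's code.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    tokens: list                      # partial program as a token sequence
    prior: float = 1.0                # LM probability of the last token
    visits: int = 0
    value: float = 0.0                # mean pass rate of evaluated completions
    children: list = field(default_factory=list)

def select(node, c=4.0):
    # PUCT-style selection: the language model's priors guide exploration.
    return max(node.children, key=lambda ch: ch.value
               + c * ch.prior * math.sqrt(node.visits) / (1 + ch.visits))

def mcts_step(root, lm_top_k, beam_rollout, pass_rate):
    path, node = [root], root
    while node.children:
        node = select(node)
        path.append(node)
    # Expansion: children are the LM's most likely next tokens.
    node.children = [Node(node.tokens + [t], prior=p)
                     for t, p in lm_top_k(node.tokens)]
    # Evaluation: finish the program with beam search, score it on tests.
    reward = pass_rate(beam_rollout(node.tokens))
    for n in path:                    # backup along the visited path
        n.visits += 1
        n.value += (reward - n.value) / n.visits
```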

Hyper-Decision Transformer for Efficient Online Policy Adaptation


Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, and Chuang Gan
International Conference on Learning Representations (ICLR), 2023
[paper]

Using hypernetworks to efficiently adapt a Decision Transformer to new tasks from a handful of demonstrations.

# large language model # reinforcement learning
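A minimal sketch of the hypernetwork idea, assuming a frozen Decision Transformer whose adapter weights are generated from a demonstration embedding; module names and dimensions are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    def __init__(self, demo_dim, hidden_dim, adapter_dim):
        super().__init__()
        self.hidden_dim, self.adapter_dim = hidden_dim, adapter_dim
        # Hypernetwork: demonstration embedding -> adapter weights.
        self.hyper = nn.Linear(demo_dim, 2 * hidden_dim * adapter_dim)

    def forward(self, h, demo_embedding):
        # Unbatched for clarity: h is one transformer activation.
        w = self.hyper(demo_embedding)
        split = self.hidden_dim * self.adapter_dim
        w_down = w[:split].view(self.adapter_dim, self.hidden_dim)
        w_up = w[split:].view(self.hidden_dim, self.adapter_dim)
        # Residual adapter on a frozen Decision Transformer activation:
        # only the hypernetwork is trained for adaptation.
        return h + torch.relu(h @ w_down.T) @ w_up.T
```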

Prompting Decision Transformer for Few-shot Policy Generalization


Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, and Chuang Gan
International Conference on Machine Learning (ICML), 2022
[paper] [website] [code]

Using trajectory segments of different tasks as prompts for a Decision Transformer to achieve offline meta-reinforcement learning.

# large language model # reinforcement learning
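The mechanism in miniature, with assumed names: a short trajectory prompt from the new task is prepended to the recent history before an ordinary Decision Transformer forward pass.

```python
import torch

def prompt_dt_forward(dt, prompt_tokens, history_tokens):
    """dt: a trained Decision Transformer; prompt_tokens: embedded
    (return-to-go, state, action) steps from a few demonstrations of
    the new task; history_tokens: the current episode so far.
    All names here are illustrative."""
    # The trajectory prompt identifies the task, and the transformer
    # conditions on it just as a language model conditions on text.
    tokens = torch.cat([prompt_tokens, history_tokens], dim=1)  # (B, T, D)
    return dt(tokens)  # predicts the next action tokens
```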

From Specification to Topology: Automatic Power Converter Design via Reinforcement Learning


Shaoze Fan, Ningyuan Cao, Shun Zhang, Jing Li, Xiaoxiao Guo, and Xin Zhang
International Conference on Computer Aided Design (ICCAD), 2021
[paper]

Using Monte-Carlo tree search to automatically design power converter circuits.

# planning # AI for science
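One way to see why tree search applies: topology design can be cast as sequential decision making, where each action adds a component and the terminal reward comes from simulation. The component list and the simulate scorer below are illustrative assumptions.

```python
COMPONENTS = ["inductor", "capacitor", "switch", "diode"]

def legal_actions(nodes):
    # Each action inserts one component between a pair of circuit nodes.
    return [(c, i, j) for c in COMPONENTS
            for i in range(len(nodes)) for j in range(i + 1, len(nodes))]

def terminal_reward(topology, target_ratio, simulate):
    # Score a finished topology by how closely the simulated converter
    # matches the target specification (e.g., voltage conversion ratio).
    return -abs(simulate(topology) - target_ratio)
```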

Efficiently Finding Approximately-Optimal Queries for Improving Policies and Guaranteeing Safety


Shun Zhang
Ph.D. Dissertation, 2020
[paper] [slides]

# value alignment # planning under uncertainty

Querying to Find a Safe Policy Under Uncertain Safety Constraints in Markov Decision Processes


Shun Zhang, Edmund H. Durfee, and Satinder Singh
AAAI Conference on Artificial Intelligence (AAAI), 2020
[paper] [poster]

An agent is uncertain about which policies are safe. Using a minimum number of queries, it either finds a safe policy that accomplishes the task or proves that no safe policy exists.

# value alignment # planning under uncertainty
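A high-level sketch of the querying loop with assumed interfaces (uses, is_change_allowed); the greedy choice below is a stand-in, since the paper's algorithm selects queries with optimality guarantees.

```python
def find_safe_policy(policies, unknown_features, is_change_allowed, uses):
    """policies: candidate policies; uses(p, f): does policy p change
    feature f; is_change_allowed(f): the (costly) query to the human."""
    policies, unknown = list(policies), set(unknown_features)
    while policies:
        # A policy touching no feature of unknown status is provably safe.
        for p in policies:
            if not any(uses(p, f) for f in unknown):
                return p
        # Otherwise ask about the feature used by the most remaining
        # candidates -- a greedy stand-in for the paper's query selection.
        f = max(unknown, key=lambda f: sum(uses(p, f) for p in policies))
        unknown.discard(f)
        if not is_change_allowed(f):
            policies = [p for p in policies if not uses(p, f)]
    return None  # every candidate is ruled out: no safe policy exists
```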

Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes


Shun Zhang, Edmund H. Durfee, and Satinder Singh
International Joint Conference on Artificial Intelligence (IJCAI), 2018
[paper] [slides]

A novel formulation and a query selection algorithm for the problem of avoiding negative side effects in safe reinforcement learning.

# value alignment # planning under uncertainty
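The query-selection principle in one line, with assumed interfaces: answers(q) enumerates possible responses to a query and regret(q, a) is the worst-case loss of the policy the agent would commit to after observing answer a.

```python
def minimax_regret_query(queries, answers, regret):
    # Choose the query whose worst possible post-answer regret is smallest.
    return min(queries, key=lambda q: max(regret(q, a) for a in answers(q)))
```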

Modeling Sensory-Motor Decisions in Natural Behavior


Ruohan Zhang, Shun Zhang, Matthew H. Tong, Yuchen Cui, Constantin A. Rothkopf, Dana H. Ballard, and Mary M. Hayhoe
PLoS Computational Biology, 2018
[paper] [slides]

# cognitive science # reinforcement learning

Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes


Shun Zhang, Edmund H. Durfee, and Satinder Singh
International Conference on Automated Planning and Scheduling (ICAPS), 2017
[paper] [slides]

A query selection algorithm with provable approximation guarantees for resolving reward uncertainty and improving planning in reward-uncertain Markov decision processes.

# value alignment # planning under uncertainty
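A sketch of greedy query construction by expected value of information; evoi here is an assumed scorer, and the paper proves guarantees that this bare sketch does not.

```python
def greedy_query(candidates, k, evoi):
    """Build a k-response query item by item; evoi(query) estimates the
    agent's expected value after observing the user's response."""
    query = []
    for _ in range(k):
        best = max((c for c in candidates if c not in query),
                   key=lambda c: evoi(query + [c]))
        query.append(best)
    return query
```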

Determining Placements of Influencing Agents in a Flock


Katie Genter, Shun Zhang, and Peter Stone
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2015
[paper] [slides]

# multi-agent system

Autonomous Intersection Management for Semi-Autonomous Vehicles


Tsz-Chiu Au, Shun Zhang, and Peter Stone
Handbook of Transportation, 2015
[paper] [website]

# multi-agent system