Research Overview
|
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble (Short Paper)
Shun Zhang,
Zhenfang Chen,
Sunli Chen,
Yikang Shen,
Zhiqing Sun,
and Chuang Gan
arXiv, 2024
[paper]
We introduce efficient reward model ensemble approaches for reinforcement learning from human feedback (RLHF),
achieving better alignment with human values under computational constraints.
# large language model
# value alignment
|
|
Adaptive Online Replanning with Diffusion Models
Siyuan Zhou,
Yilun Du,
Shun Zhang,
Mengdi Xu,
Yikang Shen,
Wei Xiao,
Dit-Yan Yeung,
and Chuang Gan
Conference on Neural Information Processing Systems (NeurIPS), 2023
[paper]
[website]
When used for planning, diffusion models generate complete plans at once, without incorporating new observations during execution.
Our algorithm determines when replanning with new observations is necessary and replans efficiently.
# diffusion model
# planning
|
|
Planning with Large Language Models for Code Generation
Shun Zhang,
Zhenfang Chen,
Yikang Shen,
Mingyu Ding,
Joshua B. Tenenbaum,
and Chuang Gan
International Conference on Learning Representations (ICLR), 2023
[paper]
[slides]
[website]
[code]
Our algorithm combines Monte-Carlo tree search with the Transformer beam search algorithm for code generation. It is more sample-efficient than a widely used sampling-and-filtering baseline.
# large language model
# planning
|
|
Hyper-Decision Transformer for Efficient Online Policy Adaptation
Mengdi Xu,
Yuchen Lu,
Yikang Shen,
Shun Zhang,
Ding Zhao,
and Chuang Gan
International Conference on Learning Representations (ICLR), 2023
[paper]
Using hypernetworks to efficiently adapt a Decision Transformer to new tasks with a handful of demonstrations.
# large language model
# reinforcement learning
|
|
Prompting Decision Transformer for Few-shot Policy Generalization
Mengdi Xu,
Yikang Shen,
Shun Zhang,
Yuchen Lu,
Ding Zhao,
Joshua B. Tenenbaum,
and Chuang Gan
International Conference on Machine Learning (ICML), 2022
[paper]
[website]
[code]
Using trajectory segments from different tasks as prompts for a Decision Transformer to achieve offline meta-reinforcement learning.
# large language model
# reinforcement learning
|
|
From Specification to Topology: Automatic Power Converter Design via Reinforcement Learning
Shaoze Fan,
Ningyuan Cao,
Shun Zhang,
Jing Li,
Xiaoxiao Guo,
and Xin Zhang
International Conference on Computer Aided Design (ICCAD), 2021
[paper]
Using Monte-Carlo tree search to automatically design power converter circuits.
# planning
# AI for science
|
|
Efficiently Finding Approximately-Optimal Queries for Improving Policies and Guaranteeing Safety
Shun Zhang
Ph.D. Dissertation, 2020
[paper]
[slides]
# value alignment
# planning under uncertainty
|
|
Querying to Find a Safe Policy Under Uncertain Safety Constraints in Markov Decision Processes
Shun Zhang,
Edmund H. Durfee,
and Satinder Singh
AAAI Conference on Artificial Intelligence (AAAI), 2020
[paper]
[poster]
An agent is uncertain about which policies are safe. Using a minimum number of queries, it either finds a safe policy that accomplishes the task or proves that no safe policy exists.
# value alignment
# planning under uncertainty
|
|
Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes
Shun Zhang,
Edmund H. Durfee,
and Satinder Singh
International Joint Conference on Artificial Intelligence (IJCAI), 2018
[paper]
[slides]
A novel formulation and a query selection algorithm for the problem of avoiding negative side effects in safe reinforcement learning.
# value alignment
# planning under uncertainty
|
|
Modeling Sensory-Motor Decisions in Natural Behavior
Ruohan Zhang,
Shun Zhang,
Matthew H. Tong,
Yuchen Cui,
Constantin A. Rothkopf,
Dana H. Ballard,
and Mary M. Hayhoe
PLoS Computational Biology, 2018
[paper]
[slides]
# cognitive science
# reinforcement learning
|
|
Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes
Shun Zhang,
Edmund H. Durfee,
and Satinder Singh
International Conference on Automated Planning and Scheduling (ICAPS), 2017
[paper]
[slides]
A provably-optimal query selection algorithm for resolving reward uncertainty and improving planning in reward-uncertain Markov decision processes.
# value alignment
# planning under uncertainty
|
|
Determining Placements of Influencing Agents in a Flock
Katie Genter,
Shun Zhang,
and Peter Stone
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2015
[paper]
[slides]
# multi-agent system
|
|
Autonomous Intersection Management for Semi-Autonomous Vehicles
Tsz-Chiu Au,
Shun Zhang,
and Peter Stone
Handbook of Transportation, 2015
[paper]
[website]
# multi-agent system
|
|