Shun Zhang

Embodied AI + Reasoning @ NVIDIA

Hi! I'm Shun Zhang (张舜). I am a senior GenAI engineer on the NVIDIA Cosmos team.

My research interests lie in reinforcement learning (RL) and language models. In particular, I am interested in RL-inspired self-evolving agents that plan, acquire reusable skills, and improve through interaction. I am also interested in value alignment, especially in enabling agents to proactively infer and adapt to human users' goals rather than reactively follow instructions.

CAREER TIMELINE

Aug 2025 - Present

Senior Generative AI Engineer

NVIDIA

Santa Clara, CA

Post-training of vision-language models.

Jun 2024 - Jan 2025

Founding Member of Technical Staff

Asari AI

San Francisco, CA

Developed an AI agent that plans, verifies, and discovers new skills and knowledge.

Jun 2022 - Jun 2024

Research Scientist

MIT-IBM Watson AI Lab

Research on reinforcement learning and post-training of language models, with a focus on code generation and reinforcement learning from human feedback.

Aug 2020 - Jun 2022

Postdoctoral Researcher

IBM Research

Research on meta-reinforcement learning and AI for scientific discovery.

Sep 2015 - Apr 2020

Ph.D. in Computer Science and Engineering

University of Michigan

Ann Arbor, MI

Research on value alignment and AI safety in reinforcement learning.

Aug 2015

B.S. and M.S. in Computer Science

University of Texas at Austin

Austin, TX

Undergraduate and master's research advisors: Prof. Peter Stone and Prof. Dana Ballard.

SELECTED PUBLICATIONS

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble

Shun Zhang, Zhenfang Chen, Sunli Chen, Yikang Shen, Zhiqing Sun, and Chuang Gan

arXiv, 2024

We introduce efficient reward model ensemble approaches for reinforcement learning from human feedback (RLHF), achieving better alignment with human values under computational constraints.

Adaptive Online Replanning with Diffusion Models

Siyuan Zhou, Yilun Du, Shun Zhang, Mengdi Xu, Yikang Shen, Wei Xiao, Dit-Yan Yeung, and Chuang Gan

Conference on Neural Information Processing Systems (NeurIPS), 2023

When used for planning, diffusion models generate a complete plan at once and do not incorporate new observations during execution. Our algorithm determines when replanning with new observations is necessary and replans efficiently.

Planning with Large Language Models for Code Generation

Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, and Chuang Gan

International Conference on Learning Representations (ICLR), 2023

Our algorithm combines Monte Carlo tree search with Transformer beam search for code generation. It is more sample-efficient than the widely used sampling-and-filtering baseline.

Prompting Decision Transformer for Few-shot Policy Generalization

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, and Chuang Gan

International Conference on Machine Learning (ICML), 2022

We use trajectory segments from different tasks as prompts for a Decision Transformer to achieve offline meta-reinforcement learning.

Efficiently Finding Approximately-Optimal Queries for Improving Policies and Guaranteeing Safety

Shun Zhang

Ph.D. Dissertation, 2020

Querying to Find a Safe Policy Under Uncertain Safety Constraints in Markov Decision Processes

Shun Zhang, Edmund H. Durfee, and Satinder Singh

AAAI Conference on Artificial Intelligence (AAAI), 2020

An agent is uncertain about which policies are safe. Using a minimum number of queries, it either finds a safe policy that accomplishes the task or proves that no safe policy exists.

Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes

Shun Zhang, Edmund H. Durfee, and Satinder Singh

International Joint Conference on Artificial Intelligence (IJCAI), 2018

A novel formulation and a query selection algorithm for the problem of avoiding negative side effects in safe reinforcement learning.

Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes

Shun Zhang, Edmund H. Durfee, and Satinder Singh

International Conference on Automated Planning and Scheduling (ICAPS), 2017

A provably optimal query selection algorithm that resolves reward uncertainty to improve planning in reward-uncertain Markov decision processes.

Autonomous Intersection Management for Semi-Autonomous Vehicles

Tsz-Chiu Au, Shun Zhang, and Peter Stone

Handbook of Transportation, 2015