 Huazheng Wang
Assistant Professor, School of Electrical Engineering and Computer Science,
 Oregon State University
 Email: huazheng.wang [at] oregonstate.edu
 About me
I am an Assistant Professor in the School of Electrical Engineering and Computer Science (EECS) at Oregon State University. From 2021 to 2022, I was a Postdoctoral Research Associate in the Department of Electrical and Computer Engineering at Princeton University, hosted by Dr. Mengdi Wang. I received my Ph.D. in Computer Science from the University of Virginia in 2021, supervised by Dr. Hongning Wang, and my B.Eng. in Computer Science from the University of Science and Technology of China in 2015.
My research interests include reinforcement learning, information retrieval, and machine learning in general. My recent work focuses on developing provably efficient and trustworthy reinforcement learning and multi-armed bandit algorithms, with applications to information retrieval tasks such as recommendation and ranking, to LLM agents, and to scientific discovery problems in biology and chemistry.
I am looking for one self-motivated PhD student with a solid background in math and coding to start in Fall 2026. If you are interested, please apply to the CS or AI program and mention my name in your application. If you are an undergraduate or graduate student at OSU and would like to join my lab, please email me directly with your CV and transcripts.
 News and Updates
[06/2025] Received the EECS Fabulous Teacher Recognition. I am grateful to the students and the committee.
[05/2025] Two papers accepted by ICML 2025: one spotlight paper on failure attribution of multi-agent LLMs and one on principal-agent bandits.
[02/2025] Talk at AAAI 2025 New Faculty Highlight: “Efficient and Robust Reinforcement Learning from Human Feedback”.
[01/2025] One paper on analyzing gradient entanglement of DPO and its variants is accepted by ICLR 2025.
[12/2024] Talk at CS colloquium series, University of Rochester: “Robust Reinforcement Learning from Biased Human Feedback and Corruption: Theory and Algorithms”.
[09/2024] One paper on risk-aware preference-based RL is accepted by NeurIPS 2024.
[08/2024] We received a new NSF award (IIS-2403401) on Neural Bandits. Thank you NSF!
[05/2024] One paper on conversational dueling bandits is accepted by KDD 2024.
[05/2024] One paper on adversarial attack on combinatorial bandits is accepted by ICML 2024.
[04/2024] One paper on federated pure exploration is accepted by UAI 2024.
[01/2024] One paper on policy alignment is accepted by ICLR 2024.
[12/2023] Two papers accepted by AAAI 2024: one on tree search bandits for protein optimization and one on stealthy attacks against multi-armed bandits.
[09/2023] One paper on offline RL for learning to rank is accepted by NeurIPS 2023.
[04/2023] One paper on representation learning in POMDP is accepted by ICML 2023. See you in Hawaii.
[01/2023] Our asynchronous kernel bandits paper is accepted by ICLR 2023.
[09/2022] Two papers accepted by NeurIPS 2022: one on distributed kernel bandits and the other on Thompson Sampling for Directed Evolution. 
[09/2022] Joined EECS at Oregon State University as an Assistant Professor.
 Honors and Awards
[06/2025] EECS Fabulous Teacher Recognition.
[02/2025] AAAI 2025 New Faculty Highlights.
[08/2021] ICML 2021 Best Reviewers (Top 10%).
[08/2019] SIGIR 2019 Best Paper Award.
[2018 - 2021] Bloomberg Data Science Ph.D. Fellowship.
 Publications 
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, Qingyun Wu. ICML 2025 (Spotlight, top 2.6%). [arXiv] [code]
Provably Efficient Algorithm for Best Scoring Rule Identification in Online Principal-Agent Information Acquisition
Zichen Wang, Chuanhao Li, Huazheng Wang. ICML 2025. [arXiv]
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang, Liu Leqi. ICLR 2025. [arXiv] [code]
RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning
Yujie Zhao, Jose Aguilar Escamilla, Weyl Lu, Huazheng Wang. NeurIPS 2024. [arXiv] [code]
Adversarial Attacks on Online Learning to Rank with Stochastic Click Models
Zichen Wang, Rishab Balasubramanian, Hui Yuan, Chenyu Song, Mengdi Wang, Huazheng Wang. Transactions on Machine Learning Research (TMLR), 2024. [arXiv] [code]
Conversational Dueling Bandits in Generalized Linear Models
Shuhua Yang, Hui Yuan, Xiaoying Zhang, Mengdi Wang, Hong Zhang, Huazheng Wang. KDD 2024. [arXiv] [code]
Adversarial Attacks on Combinatorial Multi-Armed Bandits
Rishab Balasubramanian, Jiawei Li, Prasad Tadepalli, Huazheng Wang, Qingyun Wu, Haoyu Zhao (alphabetical order). ICML 2024. [arXiv] [code]
Pure Exploration in Asynchronous Federated Bandits
Zichen Wang, Chuanhao Li, Chenyu Song, Lianghui Wang, Quanquan Gu, Huazheng Wang. UAI 2024. [arXiv] [code]
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Furong Huang, Mengdi Wang. ICLR 2024. [arXiv]
Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization
Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang. AAAI 2024. [arXiv]
Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits
Zhiwei Wang, Huazheng Wang, Hongning Wang. AAAI 2024. [arXiv]
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang. NeurIPS 2023. [arXiv] [code]
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang. International Conference on Machine Learning (ICML 2023). [arXiv]
Incentivizing Exploration in Linear Bandits under Information Gap
Huazheng Wang, Haifeng Xu, Chuanhao Li, Zhiyuan Liu, Hongning Wang. Proceedings of the 17th ACM Conference on Recommender Systems (RecSys 2023). [arXiv]
Learning Kernelized Contextual Bandits in a Distributed and Asynchronous Environment
Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang. The Eleventh International Conference on Learning Representations (ICLR 2023). [paper]
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization
Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). [arXiv]
Communication Efficient Distributed Learning for Kernelized Contextual Bandits
Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). [arXiv]
Dynamic Global Sensitivity for Differentially Private Contextual Bandits
Huazheng Wang, David Zhao, Hongning Wang. Proceedings of the 16th ACM Conference on Recommender Systems (RecSys 2022). [arXiv]
When Are Linear Stochastic Bandits Attackable?
Huazheng Wang, Haifeng Xu, Hongning Wang. International Conference on Machine Learning (ICML 2022). [arXiv]
PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer
Yiling Jia, Huazheng Wang, Stephen Guo, Hongning Wang, Proceedings of the Web Conference 2021 (WWW 2021). Nominated for the Best Paper Award. [arXiv] [code]
Global and Local Differential Privacy for Collaborative Bandits
Huazheng Wang, Qian Zhao, Qingyun Wu, Shubham Chopra, Abhinav Khaitan, Hongning Wang, Fourteenth ACM Conference on Recommender Systems (RecSys 2020). [pdf]
Unbiased Learning to Rank: Online or Offline?
Qingyao Ai, Tao Yang, Huazheng Wang, Jiaxin Mao, ACM Transactions on Information Systems (TOIS). [arXiv] [code]
A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandits Problem
Zhiyuan Liu, Huazheng Wang, Bo Waggoner, Youjian (Eugene) Liu, Lijun Chen, Workshop on Real World Experiment Design and Active Learning at ICML 2020. [arXiv]
Incentivized Exploration for Multi-Armed Bandits under Reward Drift
Zhiyuan Liu*, Huazheng Wang*, Fan Shen, Kai Liu and Lijun Chen, The 34th AAAI Conference on Artificial Intelligence (AAAI 2020). [arXiv]
Adversarial Domain Adaptation for Machine Reading Comprehension
Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang, EMNLP 2019. [arXiv]
Variance Reduction in Gradient Exploration for Online Learning to Rank
Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, Hongning Wang, The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019). Best Paper Award. [arXiv] [code]
Factorization Bandits for Online Influence Maximization
Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, Hongning Wang, The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019). [arXiv] [code]
Dynamic Ensemble of Contextual Bandits to Satisfy Users’ Changing Interests
Qingyun Wu, Huazheng Wang, Yanen Li, Hongning Wang, The Web Conference 2019 (WWW 2019). [pdf] [code]
Efficient Exploration of Gradient Space for Online Learning to Rank
Huazheng Wang, Ramsey Langley, Sonwoo Kim, Eric McCord-Snook, Hongning Wang, The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018). [arXiv] [code]
Factorization Bandits for Interactive Recommendation
Huazheng Wang, Qingyun Wu, Hongning Wang, The 31st AAAI Conference on Artificial Intelligence (AAAI 2017). [pdf] [Supplementary] [code]
Learning Hidden Features for Contextual Bandits
Huazheng Wang, Qingyun Wu, Hongning Wang, The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016). [pdf] [code]
Contextual Bandits in a Collaborative Environment
Qingyun Wu, Huazheng Wang, Quanquan Gu, Hongning Wang, The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016). [pdf] [code]
Solving Verbal Comprehension Problems in IQ Test by Knowledge-Powered Word Embedding
Huazheng Wang, Fei Tian, Bin Gao, Chengjieren Zhu, Jiang Bian, Tie-Yan Liu, Conference on Empirical Methods in Natural Language Processing, 2016 (EMNLP-16). [arXiv] [data]
 Preprints
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang. [arXiv]
LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking
Yifan Zeng, Ojas Tendolkar, Raymond Baartmans, Qingyun Wu, Lizhong Chen, Huazheng Wang. [arXiv] [code]
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu. [arXiv] [code]
Embodied LLM Agents Learn to Cooperate in Organized Teams
Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang. [arXiv] [code]
FCOM: A Federated Collaborative Online Monitoring Framework via Representation Learning
Tanapol Kosolwattana, Huazheng Wang, Raed Al Kontar, Ying Lin. [arXiv]
Multi-Agent Join
Vahid Ghadakchi, Mian Xie, Arash Termehchy, Bakhtiyar Doskenov, Bharghav Srikhakollu, Summit Haque, Huazheng Wang. [arXiv]
Online Modeling and Monitoring of Dependent Processes under Resource Constraints
Tanapol Kosolwattana, Huazheng Wang, Ying Lin. [arXiv]
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang. [arXiv]
Machine Learning for Synthetic Data Generation: A Review
Yingzhou Lu, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei. [arXiv]
Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization
Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang. [arXiv]
 Tutorials
Service