I am an incoming assistant professor and postdoctoral researcher at the Department of Electronic Engineering, Tsinghua University, collaborating with Prof. Bowen Zhou.
I received my Ph.D. from the Department of Computer Science and Technology, Tsinghua University, in 2023, advised by Prof. Hai-Tao Zheng and co-advised by Prof. Zhiyuan Liu.
Research
My research spans the areas of natural language processing and machine learning.
Currently, I am working on theories and scalable methods for developing reasoning intelligence that balances exploration and learning. I am also interested in how specialized and general reasoners can facilitate scientific innovation.
Ph.D. Student Recruitment (2026 Fall)
Our group is looking for self-motivated Ph.D. students (as well as postdocs and interns) to join us in Fall 2026. Research topics include, but are not limited to, scalable reinforcement learning, fundamental theories, and scientific applications of reasoning language models.
If you are interested, please send an email to ningding.cs@gmail.com or dn97@mail.tsinghua.edu.cn with your CV and research interests.
New Research on Exploration-based Method for Advanced Reasoning Released
Dense process rewards have proven to be a more effective alternative to sparse outcome-level rewards in the inference-time scaling of large language models (LLMs), particularly in tasks requiring complex multi-step reasoning. While dense rewards also offer an appealing choice for the reinforcement learning (RL) of LLMs, since their fine-grained rewards have the potential to address inherent issues of outcome rewards such as training efficiency and credit assignment, this potential remains largely unrealized. This can be primarily attributed to the challenges of training process reward models (PRMs) online, where collecting high-quality process labels is prohibitively expensive, making them particularly vulnerable to reward hacking. To address these challenges, we propose PRIME (Process Reinforcement through IMplicit rEwards), which enables online PRM updates using only policy rollouts and outcome labels through implicit process rewards. PRIME combines well with various advantage functions and forgoes the dedicated reward model training phase that existing approaches require, substantially reducing the development overhead. We demonstrate PRIME's effectiveness on competition-level math and coding. Starting from Qwen2.5-Math-7B-Base, PRIME achieves a 15.1% average improvement across several key reasoning benchmarks over the SFT model. Notably, our resulting model, Eurus-2-7B-PRIME, surpasses Qwen2.5-Math-7B-Instruct on seven reasoning benchmarks with 10% of its training data.
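The sketch below is a rough, hypothetical illustration of the core idea, not the PRIME implementation: derive dense per-token rewards from log-likelihood ratios and fold the sparse outcome reward into a per-token return. Function names and the value of beta are assumptions.

```python
# Rough, hypothetical sketch (not the PRIME implementation): dense per-token
# rewards from log-likelihood ratios, combined with the sparse outcome reward
# into a per-token return. Names and the value of `beta` are assumptions.
import torch

def implicit_step_rewards(logp_prm, logp_ref, beta=0.05):
    """Per-token implicit rewards: beta * (log pi_prm - log pi_ref), where the
    inputs are [T] log-probs of the sampled tokens under the online implicit
    PRM and a frozen reference model."""
    return beta * (logp_prm - logp_ref)

def token_returns(step_rewards, outcome_reward, gamma=1.0):
    """Discounted sum of future dense rewards, with the sparse outcome reward
    added at the final token of the rollout."""
    rewards = step_rewards.clone()
    rewards[-1] = rewards[-1] + outcome_reward
    returns = torch.zeros_like(rewards)
    running = torch.zeros(())
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

In an RL loop, per-token returns of this kind would replace a single outcome-level reward when estimating advantages, which is where the finer credit assignment comes from.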
@article{preprint:prime,
  title={Process Reinforcement through Implicit Rewards},
  author={Cui, Ganqu and Yuan, Lifan and Wang, Zefan and Wang, Hanbin and Li, Wendi and He, Bingxiang and Fan, Yuchen and Yu, Tianyu and Xu, Qixin and Chen, Weize and Yuan, Jiarui and Chen, Huayu and Zhang, Kaiyan and Lv, Xingtai and Wang, Shuo and Yao, Yuan and Han, Xu and Peng, Hao and Cheng, Yu and Liu, Zhiyuan and Sun, Maosong and Zhou, Bowen and Ding, Ning},
  journal={arXiv preprint},
  url={https://arxiv.org/abs/2502.01456},
  code={https://github.com/PRIME-RL/PRIME},
  huggingface={https://huggingface.co/PRIME-RL},
  year={2025}
}
Free Process Rewards without Process Labels
Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, and Hao Peng
Different from its counterpart, the outcome reward model (ORM), which evaluates entire responses, a process reward model (PRM) scores a reasoning trajectory step by step, providing denser and more fine-grained rewards. However, training a PRM requires labels annotated at every intermediate step, presenting significant challenges for both manual and automatic data collection. This paper aims to address this challenge. Both theoretically and empirically, we show that an implicit PRM can be obtained at no additional cost, by simply training an ORM on the cheaper response-level labels. The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives. In experiments, we instantiate our implicit PRMs with various objectives and evaluate their performance on MATH. We show that our implicit PRM outperforms a strong MCTS-based baseline à la Math-Shepherd using less than 1/38 of the training data. Its performance can be further improved with majority voting. We further find that scaling up instructions and responses benefits our implicit PRM, and the latter brings a larger gain. In particular, our implicit PRM, when instantiated with the cross-entropy (CE) loss, is more data-efficient and can keep improving generation models even when trained with only one response per instruction, a setup that suffers from extreme data scarcity and imbalance. Further, instructions should be relevant to downstream tasks, while the diversity of responses does not bring gains. Surprisingly, training on extra Math-Shepherd step labels brings no further improvements to our implicit PRM trained on only outcome data. We hope that our work will encourage a rethinking of PRM training approaches and contribute to making training PRMs more accessible.
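As a reading aid for the parameterization above, here is an assumption-laden sketch (not the released code): the outcome reward is written as a summed log-likelihood ratio between the trained model and a frozen reference, so per-step scores fall out as partial sums of the same ratio, without any process labels. Helper names are hypothetical.

```python
# Assumption-laden sketch of the parameterization described above (not the
# released code): the outcome reward is a summed log-likelihood ratio, and
# per-step scores are partial sums of that ratio. Helper names are hypothetical.
import torch

def outcome_reward(logp_policy, logp_ref, beta=0.05):
    """r(x, y) = beta * sum_t [log pi(y_t | x, y_<t) - log pi_ref(y_t | x, y_<t)]."""
    return beta * (logp_policy - logp_ref).sum()

def process_scores(logp_policy, logp_ref, step_ends, beta=0.05):
    """Score each reasoning step by the cumulative log-ratio up to its final
    token; `step_ends` lists those token indices."""
    cumulative = beta * torch.cumsum(logp_policy - logp_ref, dim=0)
    return cumulative[step_ends]
```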
@article{preprint:implicitPRM,
  title={Free Process Rewards without Process Labels},
  author={Yuan, Lifan and Li, Wendi and Chen, Huayu and Cui, Ganqu and Ding, Ning and Zhang, Kaiyan and Zhou, Bowen and Liu, Zhiyuan and Peng, Hao},
  journal={arXiv preprint},
  url={https://arxiv.org/abs/2412.01981},
  code={https://github.com/lifan-yuan/ImplicitPRM},
  huggingface={https://huggingface.co/PRIME-RL},
  year={2024}
}
Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models
Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, and Maosong Sun
Nature Machine Intelligence
2023
Cover Article of Nature Machine Intelligence's March Issue
World Artificial Intelligence Conference Youth Outstanding Paper Award
As pre-trained language models (PLMs) have become the fundamental infrastructure for various NLP tasks and researchers have readily enjoyed themselves in the pretraining-finetuning paradigm, evidence from emerging research has continuously proven that larger models tend to yield better performance. However, despite the welcome outcome, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs. In order to unleash the imagination of the possible advantages of such methods, not limited to parameter efficiency, we coined a new term, delta tuning, from a morphological point of view to refer to the original "parameter-efficient tuning". In contrast with standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divides existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspectives of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of combinatorial, scaling, and transferable properties of delta tuning. To facilitate the research of delta tuning, we are also developing an open-source toolkit, OpenDelta, which enables practitioners to efficiently and flexibly implement delta tuning on PLMs. Finally, we discuss a series of real-world applications of delta tuning.
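For illustration, here is a minimal addition-based delta-tuning sketch in plain PyTorch (not the OpenDelta API; module and function names, the rank, and the initialization are assumptions): freeze every backbone parameter and train only a small low-rank module injected around each linear layer.

```python
# Illustrative sketch of addition-based delta tuning (not the OpenDelta API):
# freeze the backbone and train only small low-rank modules injected around
# existing nn.Linear layers.
import torch.nn as nn

class LowRankDelta(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.linear = linear                      # frozen pre-trained layer
        self.down = nn.Linear(linear.in_features, rank, bias=False)
        self.up = nn.Linear(rank, linear.out_features, bias=False)
        nn.init.zeros_(self.up.weight)            # start as a zero delta

    def forward(self, x):
        return self.linear(x) + self.up(self.down(x))

def apply_delta(model: nn.Module, rank: int = 8) -> nn.Module:
    """Freeze all backbone parameters, then wrap each nn.Linear so that only
    the injected delta parameters receive gradients."""
    for p in model.parameters():
        p.requires_grad = False
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Linear):
            setattr(model, name, LowRankDelta(child, rank))
        else:
            apply_delta(child, rank)
    return model
```

Only the injected parameters are updated, so separate task instances reduce to storing the small deltas rather than full model copies.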
@article{2023:delta,
  title={Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models},
  author={Ding, Ning and Qin, Yujia and Yang, Guang and Wei, Fuchao and Yang, Zonghan and Su, Yusheng and Hu, Shengding and Chen, Yulin and Chan, Chi-Min and Chen, Weize and Yi, Jing and Zhao, Weilin and Liu, Zhiyuan and Zheng, Hai-Tao and Chen, Jianfei and Liu, Yang and Tang, Jie and Li, Juanzi and Sun, Maosong},
  journal={Nature Machine Intelligence},
  year={2023},
  url={https://www.nature.com/articles/s42256-023-00626-4},
  arxiv={2203.06904},
  code={https://github.com/thunlp/OpenDelta},
  note={arXiv version titled "Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models". Cover Article of Nature Machine Intelligence's March Issue; World Artificial Intelligence Conference Youth Outstanding Paper Award.}
}
OpenPrompt: An Open-source Framework for Prompt-learning
Ning Ding, Shengding Hu, Weilin Zhao, Yulin Chen, Zhiyuan Liu, Hai-Tao Zheng, and Maosong Sun
ACL System Demonstration
2022
Best Demo Paper Award
Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence-to-sequence generation, resulting in promising performance on various tasks. However, no standard implementation framework of prompt-learning has been proposed yet, and most existing prompt-learning codebases, often unregulated, only provide limited implementations for specific scenarios. Since many details, such as templating, initializing, and verbalizing strategies, need to be considered in prompt-learning, practitioners face impediments to quickly adapting the desired prompt-learning methods to their applications. In this paper, we present OpenPrompt, a unified, easy-to-use toolkit to conduct prompt-learning over PLMs. OpenPrompt is a research-friendly framework that is equipped with efficiency, modularity, and extensibility, and its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm. Users can expediently deploy prompt-learning frameworks and evaluate their generalization on different NLP tasks without constraints.
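A minimal usage sketch, written from the patterns in the OpenPrompt README (sentiment classification with a manual template and verbalizer); class names and signatures may differ across library versions and should be checked against the repository.

```python
# Prompt-learning sketch following the OpenPrompt README; signatures are taken
# from the public README and may differ slightly across versions.
import torch
from openprompt.data_utils import InputExample
from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptForClassification, PromptDataLoader

classes = ["negative", "positive"]
dataset = [
    InputExample(guid=0, text_a="Albert Einstein was one of the greatest intellects of his time."),
    InputExample(guid=1, text_a="The film was badly made."),
]

# Load a PLM plus the tokenizer wrapper that handles prompt-specific tokenization.
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

# Template: where the input text goes and where the mask token is predicted.
template = ManualTemplate(text='{"placeholder":"text_a"} It was {"mask"}', tokenizer=tokenizer)

# Verbalizer: map label words in the vocabulary to the task classes.
verbalizer = ManualVerbalizer(
    classes=classes,
    label_words={"negative": ["bad"], "positive": ["good", "wonderful", "great"]},
    tokenizer=tokenizer,
)

prompt_model = PromptForClassification(template=template, plm=plm, verbalizer=verbalizer)
data_loader = PromptDataLoader(
    dataset=dataset, tokenizer=tokenizer, template=template, tokenizer_wrapper_class=WrapperClass
)

prompt_model.eval()
with torch.no_grad():
    for batch in data_loader:
        logits = prompt_model(batch)
        print(classes[torch.argmax(logits, dim=-1).item()])
```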
@inproceedings{ding2021openprompt,
  title={OpenPrompt: An Open-source Framework for Prompt-learning},
  author={Ding, Ning and Hu, Shengding and Zhao, Weilin and Chen, Yulin and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong},
  booktitle={ACL System Demonstration},
  year={2022},
  url={https://arxiv.org/abs/2111.01998},
  code={https://github.com/thunlp/OpenPrompt},
  note={Best Demo Paper Award}
}
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou
EMNLP
2023
@inproceedings{preprint:ultra,
  title={Enhancing Chat Language Models by Scaling High-quality Instructional Conversations},
  author={Ding, Ning and Chen, Yulin and Xu, Bokai and Qin, Yujia and Hu, Shengding and Liu, Zhiyuan and Sun, Maosong and Zhou, Bowen},
  booktitle={EMNLP},
  year={2023},
  url={https://arxiv.org/abs/2305.14233},
  code={https://github.com/thunlp/UltraChat},
  note={The Ultra series also includes UltraFeedback (ICML 2024, https://arxiv.org/abs/2310.01377), UltraInteract (ICLR 2024, https://arxiv.org/abs/2404.02078), and UltraMedical (NeurIPS 2024, https://arxiv.org/abs/2406.03949).}
}
Sparse Low-rank Adaptation of Pre-trained Language Models
Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou
EMNLP
2023
Prototypical Representation Learning for Relation Extraction
Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, and Rui Zhang
International Conference on Learning Representations, ICLR
2021
Recognizing relations between entities is a pivotal task of relational learning. Learning relation representations from distantly-labeled datasets is difficult because of the abundant label noise and complicated expressions in human language. This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data that are effective in different settings, including supervised, distantly supervised, and few-shot learning. Instead of solely relying on the supervision from noisy labels, we propose to learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Prototypes are representations in the feature space abstracting the essential semantics of relations between entities in sentences. We learn prototypes based on objectives with clear geometric interpretation, where the prototypes are unit vectors uniformly dispersed in a unit ball, and statement embeddings are centered at the end of their corresponding prototype vectors on the surface of the ball. This approach allows us to learn meaningful, interpretable prototypes for the final classification. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art models. We further demonstrate the robustness of the encoder and the interpretability of prototypes with extensive experiments.
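The sketch below paraphrases the geometric idea above in hypothetical PyTorch (not the paper's exact objective): unit-norm prototypes that attract the embeddings of their statements and repel one another on the unit sphere. All names and the dispersion weight are assumptions.

```python
# Rough sketch of the geometric idea (not the paper's exact objective): one
# unit-norm prototype per relation; statement embeddings are pulled toward
# their relation's prototype direction, and prototypes are pushed apart.
import torch
import torch.nn.functional as F

def prototype_loss(embeddings, labels, prototypes, dispersion_weight=0.1):
    """embeddings: [N, d] statement embeddings; labels: [N] relation ids;
    prototypes: [K, d] learnable vectors, normalized to the unit sphere."""
    protos = F.normalize(prototypes, dim=-1)
    emb = F.normalize(embeddings, dim=-1)
    # Alignment: maximize cosine similarity between each embedding and its prototype.
    align = 1.0 - (emb * protos[labels]).sum(dim=-1).mean()
    # Dispersion: penalize pairwise similarity between distinct prototypes.
    sim = protos @ protos.t()
    off_diag = sim - torch.eye(len(protos), device=sim.device)
    dispersion = off_diag.clamp(min=0).mean()
    return align + dispersion_weight * dispersion
```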
@inproceedings{ding2021prototypical,
  title={Prototypical Representation Learning for Relation Extraction},
  author={Ding, Ning and Wang, Xiaobin and Fu, Yao and Xu, Guangwei and Wang, Rui and Xie, Pengjun and Shen, Ying and Huang, Fei and Zheng, Hai-Tao and Zhang, Rui},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021},
  url={https://openreview.net/forum?id=aCgLmfhIy_f},
  code={https://github.com/Alibaba-NLP/ProtoRE}
}
Awards
Yunfan Award of WAIC, 2024.
Young Elite Scientists Sponsorship Program by CAST, 2023.
World Artificial Intelligence Conference Youth Outstanding Paper Award, 2023.
Shuimu Tsinghua Scholar Program, 2023.
Zhang Keqian Scholar Program, 2023.
Outstanding Doctoral Dissertation of Tsinghua University, 2023.
Outstanding Graduate of DCST, Tsinghua University, 2023.