I am an incoming assistant professor and postdoctoral researcher at the Department of Electronic Engineering, Tsinghua University, collaborating with Prof. Bowen Zhou.
I received my Ph.D. from the Department of Computer Science and Technology, Tsinghua University, in 2023, advised by Prof. Hai-Tao Zheng and co-advised by Prof. Zhiyuan Liu.
Research
My research spans the areas of natural language processing and machine learning.
Currently, I am working on theories and scalable methods for developing reasoning intelligence that balances exploration and learning. I am also interested in how specialized and general reasoners can facilitate scientific innovation.
Ph.D. Student Recruitment (2026 Fall)
Our group is looking for self-motivated Ph.D. students (as well as postdocs and interns) to join us in Fall 2026. Research topics include, but are not limited to, scalable reinforcement learning, fundamental theories, and scientific applications of reasoning language models.
If you are interested, please send an email to ningding.cs@gmail.com or dn97@mail.tsinghua.edu.cn with your CV and research interests.
New Research on Exploration-based Method for Advanced Reasoning Released
Dense process rewards have proven to be a more effective alternative to sparse outcome-level rewards in the inference-time scaling of large language models (LLMs), particularly in tasks requiring complex multi-step reasoning. While dense rewards also offer an appealing choice for the reinforcement learning (RL) of LLMs, since their fine-grained rewards have the potential to address inherent issues of outcome rewards such as training efficiency and credit assignment, this potential remains largely unrealized. This can be primarily attributed to the challenges of training process reward models (PRMs) online, where collecting high-quality process labels is prohibitively expensive, making them particularly vulnerable to reward hacking. To address these challenges, we propose PRIME (Process Reinforcement through IMplicit rEwards), which enables online PRM updates using only policy rollouts and outcome labels through implicit process rewards. PRIME combines well with various advantage functions and forgoes the dedicated reward model training phase that existing approaches require, substantially reducing the development overhead. We demonstrate PRIME's effectiveness on competition-level math and coding. Starting from Qwen2.5-Math-7B-Base, PRIME achieves a 15.1% average improvement across several key reasoning benchmarks over the SFT model. Notably, our resulting model, Eurus-2-7B-PRIME, surpasses Qwen2.5-Math-7B-Instruct on seven reasoning benchmarks with 10% of its training data.
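The sketch below is a rough, hypothetical illustration of the core idea, not the PRIME implementation: derive dense per-token rewards from log-likelihood ratios and fold the sparse outcome reward into a per-token return. Function names and the value of beta are assumptions.

```python
# Rough, hypothetical sketch (not the PRIME implementation): dense per-token
# rewards from log-likelihood ratios, combined with the sparse outcome reward
# into a per-token return. Names and the value of `beta` are assumptions.
import torch

def implicit_step_rewards(logp_prm, logp_ref, beta=0.05):
    """Per-token implicit rewards: beta * (log pi_prm - log pi_ref), where the
    inputs are [T] log-probs of the sampled tokens under the online implicit
    PRM and a frozen reference model."""
    return beta * (logp_prm - logp_ref)

def token_returns(step_rewards, outcome_reward, gamma=1.0):
    """Discounted sum of future dense rewards, with the sparse outcome reward
    added at the final token of the rollout."""
    rewards = step_rewards.clone()
    rewards[-1] = rewards[-1] + outcome_reward
    returns = torch.zeros_like(rewards)
    running = torch.zeros(())
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

In an RL loop, per-token returns of this kind would replace a single outcome-level reward when estimating advantages, which is where the finer credit assignment comes from.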
@article{preprint:prime,
  title={Process Reinforcement through Implicit Rewards},
  author={Cui, Ganqu and Yuan, Lifan and Wang, Zefan and Wang, Hanbin and Li, Wendi and He, Bingxiang and Fan, Yuchen and Yu, Tianyu and Xu, Qixin and Chen, Weize and Yuan, Jiarui and Chen, Huayu and Zhang, Kaiyan and Lv, Xingtai and Wang, Shuo and Yao, Yuan and Han, Xu and Peng, Hao and Cheng, Yu and Liu, Zhiyuan and Sun, Maosong and Zhou, Bowen and Ding, Ning},
  journal={arXiv preprint},
  url={https://arxiv.org/abs/2502.01456},
  code={https://github.com/PRIME-RL/PRIME},
  huggingface={https://huggingface.co/PRIME-RL},
  year={2025}
}
Free Process Rewards without Process Labels
Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, and Hao Peng
Different from its counterpart, the outcome reward model (ORM), which evaluates entire responses, a process reward model (PRM) scores a reasoning trajectory step by step, providing denser and more fine-grained rewards. However, training a PRM requires labels annotated at every intermediate step, presenting significant challenges for both manual and automatic data collection. This paper aims to address this challenge. Both theoretically and empirically, we show that an implicit PRM can be obtained at no additional cost, by simply training an ORM on the cheaper response-level labels. The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives. In experiments, we instantiate our implicit PRMs with various objectives and evaluate their performance on MATH. We show that our implicit PRM outperforms a strong MCTS-based baseline à la Math-Shepherd using less than 1/38 of the training data. Its performance can be further improved with majority voting. We further find that scaling up instructions and responses benefits our implicit PRM, and the latter brings a larger gain. In particular, our implicit PRM, when instantiated with the cross-entropy (CE) loss, is more data-efficient and can keep improving generation models even when trained with only one response per instruction, a setup that suffers from extreme data scarcity and imbalance. Further, instructions should be relevant to downstream tasks, while the diversity of responses does not bring gains. Surprisingly, training on extra Math-Shepherd step labels brings no further improvements to our implicit PRM trained on only outcome data. We hope that our work will encourage a rethinking of PRM training approaches and contribute to making training PRMs more accessible.
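As a reading aid for the parameterization above, here is an assumption-laden sketch (not the released code): the outcome reward is written as a summed log-likelihood ratio between the trained model and a frozen reference, so per-step scores fall out as partial sums of the same ratio, without any process labels. Helper names are hypothetical.

```python
# Assumption-laden sketch of the parameterization described above (not the
# released code): the outcome reward is a summed log-likelihood ratio, and
# per-step scores are partial sums of that ratio. Helper names are hypothetical.
import torch

def outcome_reward(logp_policy, logp_ref, beta=0.05):
    """r(x, y) = beta * sum_t [log pi(y_t | x, y_<t) - log pi_ref(y_t | x, y_<t)]."""
    return beta * (logp_policy - logp_ref).sum()

def process_scores(logp_policy, logp_ref, step_ends, beta=0.05):
    """Score each reasoning step by the cumulative log-ratio up to its final
    token; `step_ends` lists those token indices."""
    cumulative = beta * torch.cumsum(logp_policy - logp_ref, dim=0)
    return cumulative[step_ends]
```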
@article{preprint:implicitPRM,
  title={Free Process Rewards without Process Labels},
  author={Yuan, Lifan and Li, Wendi and Chen, Huayu and Cui, Ganqu and Ding, Ning and Zhang, Kaiyan and Zhou, Bowen and Liu, Zhiyuan and Peng, Hao},
  journal={arXiv preprint},
  url={https://arxiv.org/abs/2412.01981},
  code={https://github.com/lifan-yuan/ImplicitPRM},
  huggingface={https://huggingface.co/PRIME-RL},
  year={2024}
}
Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models
Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, and Maosong Sun
Nature Machine Intelligence
2023
Cover Article of Nature Machine Intelligence's March Issue
World Artificial Intelligence Conference Youth Outstanding Paper Award
As pre-trained language models (PLMs) have become the fundamental infrastructure for various NLP tasks and researchers have readily enjoyed themselves in the pretraining-finetuning paradigm, evidence from emerging research has continuously proven that larger models tend to yield better performance. However, despite the welcome outcome, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs. In order to unleash the imagination of the possible advantages of such methods, not limited to parameter efficiency, we coined a new term, delta tuning, from a morphological point of view to refer to the original "parameter-efficient tuning". In contrast with standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divides existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspectives of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of combinatorial, scaling, and transferable properties of delta tuning. To facilitate the research of delta tuning, we are also developing an open-source toolkit, OpenDelta, which enables practitioners to efficiently and flexibly implement delta tuning on PLMs. Finally, we discuss a series of real-world applications of delta tuning.
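For illustration, here is a minimal addition-based delta-tuning sketch in plain PyTorch (not the OpenDelta API; module and function names, the rank, and the initialization are assumptions): freeze every backbone parameter and train only a small low-rank module injected around each linear layer.

```python
# Illustrative sketch of addition-based delta tuning (not the OpenDelta API):
# freeze the backbone and train only small low-rank modules injected around
# existing nn.Linear layers.
import torch.nn as nn

class LowRankDelta(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.linear = linear                      # frozen pre-trained layer
        self.down = nn.Linear(linear.in_features, rank, bias=False)
        self.up = nn.Linear(rank, linear.out_features, bias=False)
        nn.init.zeros_(self.up.weight)            # start as a zero delta

    def forward(self, x):
        return self.linear(x) + self.up(self.down(x))

def apply_delta(model: nn.Module, rank: int = 8) -> nn.Module:
    """Freeze all backbone parameters, then wrap each nn.Linear so that only
    the injected delta parameters receive gradients."""
    for p in model.parameters():
        p.requires_grad = False
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Linear):
            setattr(model, name, LowRankDelta(child, rank))
        else:
            apply_delta(child, rank)
    return model
```

Only the injected parameters are updated, so separate task instances reduce to storing the small deltas rather than full model copies.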
@article{2023:delta,
  title={Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models},
  author={Ding, Ning and Qin, Yujia and Yang, Guang and Wei, Fuchao and Yang, Zonghan and Su, Yusheng and Hu, Shengding and Chen, Yulin and Chan, Chi-Min and Chen, Weize and Yi, Jing and Zhao, Weilin and Liu, Zhiyuan and Zheng, Hai-Tao and Chen, Jianfei and Liu, Yang and Tang, Jie and Li, Juanzi and Sun, Maosong},
  journal={Nature Machine Intelligence},
  year={2023},
  url={https://www.nature.com/articles/s42256-023-00626-4},
  arxiv={2203.06904},
  code={https://github.com/thunlp/OpenDelta},
  note={arXiv version titled "Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models". Cover Article of Nature Machine Intelligence's March Issue; World Artificial Intelligence Conference Youth Outstanding Paper Award.}
}
OpenPrompt: An Open-source Framework for Prompt-learning
Ning Ding, Shengding Hu, Weilin Zhao, Yulin Chen, Zhiyuan Liu, Hai-Tao Zheng, and Maosong Sun
ACL System Demonstration
2022
Best Demo Paper Award
Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence-to-sequence generation, resulting in promising performance on various tasks. However, no standard implementation framework of prompt-learning has been proposed yet, and most existing prompt-learning codebases, often unregulated, only provide limited implementations for specific scenarios. Since many details, such as templating, initializing, and verbalizing strategies, need to be considered in prompt-learning, practitioners face impediments to quickly adapting the desired prompt-learning methods to their applications. In this paper, we present OpenPrompt, a unified, easy-to-use toolkit to conduct prompt-learning over PLMs. OpenPrompt is a research-friendly framework that is equipped with efficiency, modularity, and extensibility, and its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm. Users can expediently deploy prompt-learning frameworks and evaluate their generalization on different NLP tasks without constraints.
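A minimal usage sketch, written from the patterns in the OpenPrompt README (sentiment classification with a manual template and verbalizer); class names and signatures may differ across library versions and should be checked against the repository.

```python
# Prompt-learning sketch following the OpenPrompt README; signatures are taken
# from the public README and may differ slightly across versions.
import torch
from openprompt.data_utils import InputExample
from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptForClassification, PromptDataLoader

classes = ["negative", "positive"]
dataset = [
    InputExample(guid=0, text_a="Albert Einstein was one of the greatest intellects of his time."),
    InputExample(guid=1, text_a="The film was badly made."),
]

# Load a PLM plus the tokenizer wrapper that handles prompt-specific tokenization.
plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

# Template: where the input text goes and where the mask token is predicted.
template = ManualTemplate(text='{"placeholder":"text_a"} It was {"mask"}', tokenizer=tokenizer)

# Verbalizer: map label words in the vocabulary to the task classes.
verbalizer = ManualVerbalizer(
    classes=classes,
    label_words={"negative": ["bad"], "positive": ["good", "wonderful", "great"]},
    tokenizer=tokenizer,
)

prompt_model = PromptForClassification(template=template, plm=plm, verbalizer=verbalizer)
data_loader = PromptDataLoader(
    dataset=dataset, tokenizer=tokenizer, template=template, tokenizer_wrapper_class=WrapperClass
)

prompt_model.eval()
with torch.no_grad():
    for batch in data_loader:
        logits = prompt_model(batch)
        print(classes[torch.argmax(logits, dim=-1).item()])
```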
@inproceedings{ding2021openprompt,
  title={OpenPrompt: An Open-source Framework for Prompt-learning},
  author={Ding, Ning and Hu, Shengding and Zhao, Weilin and Chen, Yulin and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong},
  booktitle={ACL System Demonstration},
  year={2022},
  url={https://arxiv.org/abs/2111.01998},
  code={https://github.com/thunlp/OpenPrompt},
  note={Best Demo Paper Award}
}
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou
EMNLP
2023
@inproceedings{preprint:ultra,
  title={Enhancing Chat Language Models by Scaling High-quality Instructional Conversations},
  author={Ding, Ning and Chen, Yulin and Xu, Bokai and Qin, Yujia and Hu, Shengding and Liu, Zhiyuan and Sun, Maosong and Zhou, Bowen},
  booktitle={EMNLP},
  year={2023},
  url={https://arxiv.org/abs/2305.14233},
  code={https://github.com/thunlp/UltraChat},
  note={The Ultra series also includes UltraFeedback (ICML 2024, https://arxiv.org/abs/2310.01377), UltraInteract (ICLR 2024, https://arxiv.org/abs/2404.02078), and UltraMedical (NeurIPS 2024, https://arxiv.org/abs/2406.03949).}
}
Sparse Low-rank Adaptation of Pre-trained Language Models
Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou
EMNLP
2023
Prototypical Representation Learning for Relation Extraction
Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, and Rui Zhang
International Conference on Learning Representations, ICLR
2021
Recognizing relations between entities is a pivotal task of relational learning. Learning relation representations from distantly-labeled datasets is difficult because of the abundant label noise and complicated expressions in human language. This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data that are effective in different settings, including supervised, distantly supervised, and few-shot learning. Instead of solely relying on the supervision from noisy labels, we propose to learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Prototypes are representations in the feature space abstracting the essential semantics of relations between entities in sentences. We learn prototypes based on objectives with clear geometric interpretation, where the prototypes are unit vectors uniformly dispersed in a unit ball, and statement embeddings are centered at the end of their corresponding prototype vectors on the surface of the ball. This approach allows us to learn meaningful, interpretable prototypes for the final classification. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art models. We further demonstrate the robustness of the encoder and the interpretability of prototypes with extensive experiments.
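The sketch below paraphrases the geometric idea above in hypothetical PyTorch (not the paper's exact objective): unit-norm prototypes that attract the embeddings of their statements and repel one another on the unit sphere. All names and the dispersion weight are assumptions.

```python
# Rough sketch of the geometric idea (not the paper's exact objective): one
# unit-norm prototype per relation; statement embeddings are pulled toward
# their relation's prototype direction, and prototypes are pushed apart.
import torch
import torch.nn.functional as F

def prototype_loss(embeddings, labels, prototypes, dispersion_weight=0.1):
    """embeddings: [N, d] statement embeddings; labels: [N] relation ids;
    prototypes: [K, d] learnable vectors, normalized to the unit sphere."""
    protos = F.normalize(prototypes, dim=-1)
    emb = F.normalize(embeddings, dim=-1)
    # Alignment: maximize cosine similarity between each embedding and its prototype.
    align = 1.0 - (emb * protos[labels]).sum(dim=-1).mean()
    # Dispersion: penalize pairwise similarity between distinct prototypes.
    sim = protos @ protos.t()
    off_diag = sim - torch.eye(len(protos), device=sim.device)
    dispersion = off_diag.clamp(min=0).mean()
    return align + dispersion_weight * dispersion
```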
@inproceedings{ding2021prototypical,
  title={Prototypical Representation Learning for Relation Extraction},
  author={Ding, Ning and Wang, Xiaobin and Fu, Yao and Xu, Guangwei and Wang, Rui and Xie, Pengjun and Shen, Ying and Huang, Fei and Zheng, Hai-Tao and Zhang, Rui},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021},
  url={https://openreview.net/forum?id=aCgLmfhIy_f},
  code={https://github.com/Alibaba-NLP/ProtoRE}
}
Awards
Yunfan Award of WAIC, 2024.
Young Elite Scientists Sponsorship Program by CAST, 2023.
World Artificial Intelligence Conference Youth Outstanding Paper Award, 2023.
Shuimu Tsinghua Scholar Program, 2023.
Zhang Keqian Scholar Program, 2023.
Outstanding Doctoral Dissertation of Tsinghua University, 2023.
Outstanding Graduate of DCST, Tsinghua University, 2023.