Search Results for author: Jingwen Lu

Found 5 papers, 4 papers with code

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

1 code implementation • 13 May 2024 • Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, Harsha Vardhan Simhadri, Manik Varma, Yujing Wang, Linjun Yang, Mao Yang, Ce Zhang

Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals.

Information Retrieval, Retrieval

LEAD: Liberal Feature-based Distillation for Dense Retrieval

1 code implementation • 10 Dec 2022 • Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model.

Document Ranking, Knowledge Distillation, +2

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

1 code implementation • 21 Oct 2022 • Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen

Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.

Retrieval, Text Retrieval

PROD: Progressive Distillation for Dense Retrieval

1 code implementation • 27 Sep 2022 • Zhenghao Lin, Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, Anlei Dong, Jian Jiao, Jingwen Lu, Daxin Jiang, Rangan Majumder, Nan Duan

It is common that a better teacher model results in a bad student via distillation due to the nonnegligible gap between teacher and student.

Knowledge Distillation, Natural Questions, +1

Aligning the Pretraining and Finetuning Objectives of Language Models

no code implementations • 5 Feb 2020 • Nuo Wang Pierse, Jingwen Lu

We found that, with objective alignment, our 768 by 3 and 512 by 3 transformer language models can reach accuracy of 83.9%/82.5% for concept-of-interest tagging and 73.8%/70.2% for acronym detection using only 200 finetuning examples per task, outperforming the 768 by 3 model pretrained without objective alignment by +4.8%/+3.4% and +9.9%/+6.3%.

Language Modelling
