1 code implementation • EMNLP 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk
Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.
1 code implementation • 1 Jul 2023 • Wenzheng Zhang, Chenyan Xiong, Karl Stratos, Arnold Overwijk
In multitask retrieval, a single retriever is trained to retrieve relevant contexts for multiple tasks.
no code implementations • 7 Feb 2023 • Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett
In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time.
no code implementations • 29 Nov 2022 • Arnold Overwijk, Chenyan Xiong, Xiao Liu, Cameron VandenBerg, Jamie Callan
ClueWeb22, the newest iteration of the ClueWeb line of datasets, provides 10 billion web pages accompanied by rich information.
1 code implementation • 31 Oct 2022 • Si Sun, Chenyan Xiong, Yue Yu, Arnold Overwijk, Zhiyuan Liu, Jie Bao
In this paper, we investigate the instability in standard dense retrieval training, which iterates between model training and hard negative selection using the model being trained.
1 code implementation • 27 Oct 2022 • Yue Yu, Chenyan Xiong, Si Sun, Chao Zhang, Arnold Overwijk
We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios.
Ranked #1 on Zero-shot Text Search on CQADupStack
1 code implementation • 18 Feb 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk
Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.
5 code implementations • ICLR 2021 • Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk
In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing.
Ranked #7 on Passage Retrieval on Natural Questions
2 code implementations • IJCNLP 2019 • Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, Arnold Overwijk
This paper studies keyphrase extraction in real-world scenarios where documents come from diverse domains and vary in content quality.