Search Results for author: Arnold Overwijk

Found 9 papers, 7 papers with code

Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder

1 code implementation • EMNLP 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.

Decoder • Language Modelling • +5

Improving Multitask Retrieval by Promoting Task Specialization

1 code implementation • 1 Jul 2023 • Wenzheng Zhang, Chenyan Xiong, Karl Stratos, Arnold Overwijk

In multitask retrieval, a single retriever is trained to retrieve relevant contexts for multiple tasks.

Retrieval

Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories

no code implementations • 7 Feb 2023 • Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett

In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time.

Retrieval • Zero-shot Generalization

ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information

no code implementations • 29 Nov 2022 • Arnold Overwijk, Chenyan Xiong, Xiao Liu, Cameron VandenBerg, Jamie Callan

ClueWeb22, the newest iteration of the ClueWeb line of datasets, provides 10 billion web pages affiliated with rich information.

Document Understanding • Retrieval

Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Negatives

1 code implementation • 31 Oct 2022 • Si Sun, Chenyan Xiong, Yue Yu, Arnold Overwijk, Zhiyuan Liu, Jie Bao

In this paper, we investigate the instability in the standard dense retrieval training, which iterates between model training and hard negative selection using the being-trained model.

Retrieval

COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

1 code implementation • 27 Oct 2022 • Yue Yu, Chenyan Xiong, Si Sun, Chao Zhang, Arnold Overwijk

We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios.

Ranked #1 on Zero-shot Text Search on CQADupStack

Language Modelling • Retrieval • +2

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

1 code implementation • 18 Feb 2021 • Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tie-Yan Liu, Arnold Overwijk

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space.

Decoder • Language Modelling • +4

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

5 code implementations • ICLR 2021 • Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk

In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing.

Ranked #7 on Passage Retrieval on Natural Questions

Contrastive Learning • Passage Retrieval • +3

Open Domain Web Keyphrase Extraction Beyond Language Modeling

2 code implementations • IJCNLP 2019 • Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, Arnold Overwijk

This paper studies keyphrase extraction in real-world scenarios where documents are from diverse domains and have variant content quality.

Keyphrase Extraction • Language Modelling
