no code implementations • ACL (NLP4Prog) 2021 • Xinyu Zhang, Ji Xin, Andrew Yates, Jimmy Lin
The task of semantic code search is to retrieve code snippets from a source code corpus based on an information need expressed in natural language.
no code implementations • EMNLP 2020 • Anna Tigunova, Andrew Yates, Paramita Mirza, Gerhard Weikum
Personal knowledge about users' professions, hobbies, favorite food, and travel preferences, among others, is a valuable asset for individualized AI, such as recommenders or chatbots.
no code implementations • EMNLP (sustainlp) 2020 • Xinyu Zhang, Andrew Yates, Jimmy Lin
Researchers have proposed simple yet effective techniques for the retrieval problem based on using BERT as a relevance classifier to rerank initial candidates from keyword search.
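The two-stage pipeline described above can be sketched in a few lines. This is only an illustrative toy: `keyword_candidates` stands in for a keyword search engine such as BM25, and `toy_relevance_score` is a hypothetical placeholder for the BERT relevance classifier; both function names are invented for this sketch.

```python
def keyword_candidates(query, corpus, k=3):
    """First stage: rank documents by simple query-term overlap."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc_id)
              for doc_id, doc in corpus.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

def toy_relevance_score(query, doc):
    """Second stage: stand-in for a BERT relevance classifier."""
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    # reward query terms that appear early in the document
    return sum(1.0 / (d_terms.index(t) + 1) for t in q_terms if t in d_terms)

def rerank(query, corpus, k=3):
    """Retrieve candidates cheaply, then rescore them with the classifier."""
    candidates = keyword_candidates(query, corpus, k)
    return sorted(candidates,
                  key=lambda d: toy_relevance_score(query, corpus[d]),
                  reverse=True)
```

The design point is that the expensive scorer only ever sees the small candidate set, which is what makes BERT-based reranking tractable at query time.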
no code implementations • EMNLP 2021 • Anna Tigunova, Paramita Mirza, Andrew Yates, Gerhard Weikum
Automatically extracting interpersonal relationships of conversation interlocutors can enrich personal knowledge bases to enhance personalized search, recommenders and chatbots.
no code implementations • 2 May 2024 • Ming Li, Yuanna Liu, Sami Jullien, Mozhdeh Ariannezhad, Mohammad Aliannejadi, Andrew Yates, Maarten de Rijke
So far, most NBR studies have focused on optimizing the accuracy of the recommendation, whereas optimizing for beyond-accuracy metrics, e.g., item fairness and diversity, remains largely unexplored.
1 code implementation • 28 Feb 2024 • Yibin Lei, Yu Cao, Tianyi Zhou, Tao Shen, Andrew Yates
Recent studies demonstrate that query expansions generated by large language models (LLMs) can considerably enhance information retrieval systems by generating hypothetical documents that answer the queries as expansions.
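A minimal sketch of this hypothetical-document style of expansion follows. In a real system the pseudo-document would come from prompting an LLM (e.g., "Write a passage answering: {query}"); here `generate_pseudo_doc` is a hypothetical stand-in with a canned answer, and both function names are invented for illustration.

```python
def generate_pseudo_doc(query):
    """Stand-in for an LLM call that drafts a passage answering the query."""
    canned = {
        "capital of france": "Paris is the capital and largest city of France.",
    }
    return canned.get(query.lower(), "")

def expand_query(query, weight=1):
    """Concatenate the query with its hypothetical answer document.

    The combined text is used as the retrieval query, so documents that
    share vocabulary with the likely answer are ranked higher.
    """
    pseudo = generate_pseudo_doc(query)
    # repeat the original query `weight` times to keep it dominant
    return " ".join([query] * weight + [pseudo]).strip()
```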
no code implementations • 28 Feb 2024 • Yibin Lei, Di Wu, Tianyi Zhou, Tao Shen, Yu Cao, Chongyang Tao, Andrew Yates
In this work, we introduce a new unsupervised embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), for generating high-quality sentence embeddings from Large Language Models (LLMs) without the need for model fine-tuning or task-specific engineering.
1 code implementation • 27 Feb 2024 • Thong Nguyen, Mariya Hendriksen, Andrew Yates, Maarten de Rijke
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
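The transformation described above can be sketched as projecting a frozen dense vector onto a vocabulary, applying a ReLU, and keeping the top-k terms. Everything below is a toy illustration: the projection matrix is hypothetical (in the paper it would be learned while the dense encoder stays frozen), and the function name is invented.

```python
def dense_to_sparse(dense_vec, projection, vocab, k=2):
    """Map a dense vector to {term: weight} over a vocabulary.

    `projection` has one row per vocabulary term; each row is scored
    against the dense vector, negative scores are zeroed (ReLU), and
    only the k strongest positive terms are kept, yielding a sparse
    lexical representation usable with an inverted index.
    """
    scores = []
    for term, row in zip(vocab, projection):
        s = sum(d * w for d, w in zip(dense_vec, row))
        scores.append((max(0.0, s), term))  # ReLU: negatives drop to 0
    scores.sort(reverse=True)
    return {term: s for s, term in scores[:k] if s > 0}
```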
1 code implementation • 27 Feb 2024 • Maurits Bleeker, Mariya Hendriksen, Andrew Yates, Maarten de Rijke
Hence, contrastive losses are not sufficient to learn task-optimal representations, i.e., representations that contain all task-relevant information shared between the image and associated captions.
no code implementations • 12 Feb 2024 • Thong Nguyen, Mariya Hendriksen, Andrew Yates
Motivated by this, we explore the application of LSR in the multi-modal domain, i.e., we focus on Multi-Modal Learned Sparse Retrieval (MLSR).
no code implementations • 2 Nov 2023 • Ghazaleh Haratinezhad Torbati, Anna Tigunova, Andrew Yates, Gerhard Weikum
Recommender systems are most successful for popular items and users with ample interactions (likes, ratings, etc.).
no code implementations • 2 Oct 2023 • Andrew Yates, Michael Unterkalmsteiner
We replicate prior work on ranking domain-specific synonyms in the consumer health domain by applying the approach to a new language and domain: identifying Swedish language synonyms in the building construction domain.
1 code implementation • 2 Aug 2023 • Ming Li, Mozhdeh Ariannezhad, Andrew Yates, Maarten de Rijke
In next basket recommendation (NBR), it is useful to distinguish between repeat items, i.e., items that a user has consumed before, and explore items, i.e., items that a user has not consumed before.
no code implementations • 20 Jun 2023 • Thong Nguyen, Andrew Yates
Generative retrieval is a promising new neural retrieval paradigm that aims to optimize the retrieval pipeline by performing both indexing and retrieval with a single transformer model.
1 code implementation • 5 Jun 2023 • Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, Dacheng Tao
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios.
1 code implementation • 29 May 2023 • Thong Nguyen, Sean MacAvaney, Andrew Yates
We investigate existing aggregation approaches for adapting LSR to longer documents and find that proximal scoring is crucial for LSR to handle long documents.
1 code implementation • 22 May 2023 • Vaishali Pal, Andrew Yates, Evangelos Kanoulas, Maarten de Rijke
Recent advances in tabular question answering (QA) with large language models are constrained in their coverage and only answer questions over a single table.
1 code implementation • 23 Mar 2023 • Thong Nguyen, Sean MacAvaney, Andrew Yates
We then reproduce all prominent methods using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect effectiveness and efficiency.
1 code implementation • 28 Apr 2022 • Maurits Bleeker, Andrew Yates, Maarten de Rijke
We add an additional decoder to the contrastive ICR framework, to reconstruct the input caption in a latent space of a general-purpose sentence encoder, which prevents the image and caption encoder from suppressing predictive features.
1 code implementation • 22 Apr 2022 • Antonios Minas Krasakis, Andrew Yates, Evangelos Kanoulas
Current conversational passage retrieval systems cast conversational search into ad-hoc search by using an intermediate query resolution step that places the user's question in the context of the conversation.
1 code implementation • ACL 2022 • Thong Nguyen, Andrew Yates, Ayah Zirikly, Bart Desmet, Arman Cohan
In dataset-transfer experiments on three social media datasets, we find that grounding the model in the PHQ-9's symptoms substantially improves its ability to generalize to out-of-distribution data compared to a standard BERT-based approach.
no code implementations • 10 Oct 2021 • Simon Razniewski, Andrew Yates, Nora Kassner, Gerhard Weikum
Pre-trained language models (LMs) have recently gained attention for their potential as an alternative to (or proxy for) explicit knowledge bases (KBs).
no code implementations • 10 Sep 2021 • Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum
Prior work on personalizing web search results has focused on considering query-and-click logs to capture users' individual interests.
no code implementations • 10 Sep 2021 • Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum
The paper develops an expressive model and effective methods for personalizing search-based entity recommendations.
1 code implementation • 17 May 2021 • Iain Mackie, Jeffrey Dalton, Andrew Yates
Deep Learning Hard (DL-HARD) is a new annotated dataset designed to more effectively evaluate neural ranking models on complex topics.
no code implementations • 9 Mar 2021 • Shahrzad Naseri, Jeffrey Dalton, Andrew Yates, James Allan
We find that CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning in average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models.
1 code implementation • 3 Mar 2021 • Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian
Managing the data for Information Retrieval (IR) experiments can be challenging.
1 code implementation • NAACL 2021 • Jimmy Lin, Rodrigo Nogueira, Andrew Yates
There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zhi Zheng, Kai Hui, Ben He, Xianpei Han, Le Sun, Andrew Yates
Query expansion aims to mitigate the mismatch between the language used in a query and in a document.
1 code implementation • 20 Aug 2020 • Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, Yingfei Sun
In this work, we explore strategies for aggregating relevance signals from a document's passages into a final ranking score.
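The aggregation strategies explored in this line of work can be sketched as simple reducers over per-passage relevance scores. In practice the scores would come from a neural ranker such as BERT; here they are given as inputs, and the strategy names are illustrative rather than the paper's exact taxonomy.

```python
def aggregate(passage_scores, strategy="max"):
    """Collapse a document's per-passage scores into one document score."""
    if strategy == "max":        # the single best passage decides
        return max(passage_scores)
    if strategy == "mean":       # average evidence across all passages
        return sum(passage_scores) / len(passage_scores)
    if strategy == "first":      # lead passage only (title/intro bias)
        return passage_scores[0]
    raise ValueError(f"unknown strategy: {strategy}")
```

The choice matters: "max" is robust for long documents where relevance is concentrated in one passage, while "mean" rewards documents that are relevant throughout.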
Ranked #2 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • LREC 2020 • Anna Tigunova, Paramita Mirza, Andrew Yates, Gerhard Weikum
To the best of our knowledge, RedDust is the first annotated language resource about Reddit users at large scale.
1 code implementation • IJCNLP 2019 • Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum
Controversial claims are abundant in online media and discussion forums.
1 code implementation • 24 Apr 2019 • Anna Tigunova, Andrew Yates, Paramita Mirza, Gerhard Weikum
Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation.
7 code implementations • 15 Apr 2019 • Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian
We call this joint approach CEDR (Contextualized Embeddings for Document Ranking).
Ranked #3 on Ad-Hoc Information Retrieval on TREC Robust04
no code implementations • 11 Apr 2019 • Siddhant Arora, Andrew Yates
We consider algorithm selection in the context of ad-hoc information retrieval.
1 code implementation • WS 2019 • Michael A. Hedderich, Andrew Yates, Dietrich Klakow, Gerard de Melo
However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word.
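The selection problem described above can be sketched as follows: each word has several sense vectors, and a consumer must pick one per occurrence. A common heuristic, used here as an assumption rather than this paper's exact method, is to choose the sense closest to the average of the context word vectors; all vectors and function names below are toy illustrations.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def select_sense(sense_vecs, context_vecs):
    """Return the index of the sense vector closest to the mean context vector."""
    dim = len(context_vecs[0])
    mean_ctx = [sum(v[i] for v in context_vecs) / len(context_vecs)
                for i in range(dim)]
    scores = [dot(s, mean_ctx) for s in sense_vecs]
    return scores.index(max(scores))
```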
1 code implementation • EMNLP 2018 • Canjia Li, Yingfei Sun, Ben He, Le Wang, Kai Hui, Andrew Yates, Le Sun, Jungang Xu
Pseudo-relevance feedback (PRF) is commonly used to boost the performance of traditional information retrieval (IR) models by using top-ranked documents to identify and weight new query terms, thereby reducing the effect of query-document vocabulary mismatches.
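PRF as just described can be sketched in a minimal, unweighted form: take the top-ranked documents, count their terms, and append the most frequent new terms to the query. Real PRF models (e.g., RM3) estimate weighted term distributions; the raw count below is only an illustrative sketch, and `prf_expand` is an invented name.

```python
from collections import Counter

def prf_expand(query, top_docs, n_terms=2):
    """Expand `query` with frequent terms from pseudo-relevant documents."""
    q_terms = set(query.lower().split())
    counts = Counter(t for doc in top_docs
                       for t in doc.lower().split()
                       if t not in q_terms)           # skip existing terms
    expansion = [t for t, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(expansion)
```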
Ranked #9 on Ad-Hoc Information Retrieval on TREC Robust04
2 code implementations • EMNLP 2018 • Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum
Misinformation, such as fake news, is one of the major challenges facing our society.
no code implementations • WS 2018 • Sean MacAvaney, Bart Desmet, Arman Cohan, Luca Soldaini, Andrew Yates, Ayah Zirikly, Nazli Goharian
Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media.
no code implementations • COLING 2018 • Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney, Nazli Goharian
Mental health is a significant and growing public health concern.
no code implementations • EMNLP 2017 • Andrew Yates, Arman Cohan, Nazli Goharian
We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts.
1 code implementation • 1 Jul 2017 • Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder
One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training.
3 code implementations • 30 Jun 2017 • Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo
Neural IR models, such as DRMM and PACRR, have achieved strong results by successfully capturing relevance matching signals.
no code implementations • 27 Jun 2017 • Andrew Yates, Kai Hui
Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval.
3 code implementations • EMNLP 2017 • Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo
In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query.
no code implementations • 22 Feb 2017 • Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian
Our analysis of the interaction between the moderators and the users further indicates that, without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely responses to users in need.
no code implementations • LREC 2016 • Andrew Yates, Alek Kolcz, Nazli Goharian, Ophir Frieder
In this work we use a larger feed to investigate the effects of sampling on Twitter trend detection.
no code implementations • LREC 2014 • Andrew Yates, Jon Parker, Nazli Goharian, Ophir Frieder
With the rapid growth of social media, there is increasing potential to augment traditional public health surveillance methods with data from social media.