no code implementations • 7 May 2024 • Akhil Arora, Lars Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West
Large language models (LLMs) have significantly evolved, moving from simple output generation to complex reasoning and from stand-alone usage to being embedded into broader frameworks.
1 code implementation • 4 Apr 2024 • Marija Šakota, Isaac Johnson, Guosheng Feng, Robert West
To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by a language model trained to produce good edit summaries given the representation of an edit diff.
no code implementations • 31 Mar 2024 • Paula Rescala, Manoel Horta Ribeiro, Tiancheng Hu, Robert West
The remarkable and ever-increasing capabilities of Large Language Models (LLMs) have raised concerns about their potential misuse for creating personalized, convincing misinformation and propaganda.
no code implementations • 21 Mar 2024 • Maxime Peyrard, Martin Josifoski, Robert West
We refer to these orchestrated interactions among semantic processors, optimizing and searching in semantic space, as semantic decoding algorithms.
no code implementations • 6 Mar 2024 • Lars Henning Klein, Roland Aydin, Robert West
Emoji have become ubiquitous in written communication, on the Web and beyond.
no code implementations • 21 Feb 2024 • Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings
In this paper, we perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps generated by the LLM influence the final outcome and find that LLMs do not reliably use their intermediate reasoning steps when generating an answer.
1 code implementation • 16 Feb 2024 • Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West
Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space.
no code implementations • 16 Feb 2024 • Mohammad Hossein Amani, Nicolas Mario Baldwin, Amin Mansouri, Martin Josifoski, Maxime Peyrard, Robert West
Traditional language models, adept at next-token prediction in text sequences, often struggle with transduction tasks between distinct symbolic systems, particularly when parallel data is scarce.
no code implementations • 18 Jan 2024 • Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West
This paper introduces sketch-guided constrained decoding (SGCD), a novel approach to constrained decoding for blackbox LLMs, which operates without access to the logits of the blackbox LLM.
1 code implementation • 9 Jan 2024 • Tim R. Davidson, Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West
We introduce an approach to evaluate language model (LM) agency using negotiation games.
1 code implementation • 4 Dec 2023 • Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kiciman, Hamid Palangi, Barun Patra, Robert West
Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling.
no code implementations • 24 Oct 2023 • Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West
We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use.
no code implementations • 24 Oct 2023 • Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West
A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data.
no code implementations • 18 Oct 2023 • Giuseppe Russo, Manoel Horta Ribeiro, Robert West
Overall, our findings suggest that curtailing fringe-interactions may reduce the growth of fringe communities on mainstream platforms.
no code implementations • 31 Aug 2023 • Kristina Gligoric, Tiziano Piccardi, Jake Hofman, Robert West
Overall, we demonstrate that incorporating replication tasks into a large data science class can increase the reproducibility of scientific work as a by-product of data science instruction, thus benefiting both science and students.
no code implementations • 23 Aug 2023 • Okyaz Eminaga, Mahmoud Abbas, Christian Kunder, Yuri Tolkach, Ryan Han, James D. Brooks, Rosalie Nolley, Axel Semjonow, Martin Boegemann, Robert West, Jin Long, Richard Fan, Olaf Bettendorf
Adjusting the decision threshold for the secondary Gleason pattern from 5% to 10% improved the concordance level between pathologists and vPatho for tumor grading on prostatectomy specimens (kappa from 0.44 to 0.64).
2 code implementations • 11 Aug 2023 • Marija Šakota, Maxime Peyrard, Robert West
For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted.
2 code implementations • 2 Aug 2023 • Martin Josifoski, Lars Klein, Maxime Peyrard, Nicolas Baldwin, Yifei Li, Saibo Geng, Julian Paul Schnitzler, Yuxing Yao, Jiheng Wei, Debjit Paul, Robert West
To support rapid and rigorous research, we introduce the aiFlows library embodying Flows.
1 code implementation • 13 Jun 2023 • Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West
With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results.
no code implementations • 24 May 2023 • Veniamin Veselovsky, Manoel Horta Ribeiro, Akhil Arora, Martin Josifoski, Ashton Anderson, Robert West
Large Language Models (LLMs) have democratized synthetic data generation, which in turn has the potential to simplify and broaden a wide gamut of NLP tasks.
2 code implementations • 23 May 2023 • Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West
In this work, we demonstrate that formal grammars can describe the output space for a much wider range of tasks and argue that grammar-constrained decoding (GCD) can serve as a unified framework for structured NLP tasks in general.
1 code implementation • 4 Apr 2023 • Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings
Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting.
1 code implementation • 7 Mar 2023 • Martin Josifoski, Marija Sakota, Maxime Peyrard, Robert West
This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure.
no code implementations • 19 Dec 2022 • Ami Taitelbaum, Robert West, Mauro Mobilia, Michael Assaf
Here, we study population dynamics subject to a fluctuating environment, modeled by a varying carrying capacity that changes continuously in time, driven either by binary random switches or by noise of continuous range.
1 code implementation • 13 Oct 2022 • Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kiciman, Boi Faltings, Robert West
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm.
1 code implementation • 8 Oct 2022 • Niklas Stoehr, Lucas Torroba Hennigen, Josef Valvoda, Robert West, Ryan Cotterell, Aaron Schein
It is based only on the action category ("what") and disregards the subject ("who") and object ("to whom") of an event, as well as contextual information, like associated casualty count, that should contribute to the perception of an event's "intensity".
2 code implementations • 18 Sep 2022 • Valentin Hartmann, Léo Meynent, Maxime Peyrard, Dimitrios Dimitriadis, Shruti Tople, Robert West
We identify three sources of leakage: (1) memorizing specific information about the $\mathbb{E}[Y|X]$ (expected label given the feature values) of interest to the adversary, (2) wrong inductive bias of the model, and (3) finiteness of the training data.
no code implementations • 31 Aug 2022 • Pierre Colombo, Maxime Peyrard, Nathan Noiry, Robert West, Pablo Piantanida
Automatic evaluation metrics capable of replacing human judgments are critical for enabling fast development of new methods.
1 code implementation • 26 Aug 2022 • Martin Glauer, Robert West, Susan Michie, Janna Hastings
We describe a novel approach to explainable prediction of a continuous variable based on learning fuzzy weighted rules.
1 code implementation • 17 Jul 2022 • Jonathan Külz, Andreas Spitz, Ahmad Abu-Akel, Stephan Günnemann, Robert West
There is a widespread belief that the tone of US political language has become more negative recently, in particular when Donald Trump entered politics.
no code implementations • 7 Jul 2022 • Vuk Vuković, Akhil Arora, Huan-Cheng Chang, Andreas Spitz, Robert West
The use of attributed quotes is the most direct and least filtered pathway of information propagation in news.
1 code implementation • NAACL (ACL) 2022 • Marko Čuljak, Andreas Spitz, Robert West, Akhil Arora
Named entity linking (NEL) in news is a challenging endeavour due to the frequency of unseen and emerging entities, which necessitates the use of unsupervised or zero-shot methods.
1 code implementation • LREC 2022 • Alberto García-Durán, Akhil Arora, Robert West
We also propose a lightweight and simple solution based on the construction of indexes whose design is motivated by more complex transfer-learning-based neural approaches.
1 code implementation • 20 May 2022 • Marija Sakota, Maxime Peyrard, Robert West
Wikipedia is one of the richest knowledge sources on the Web today.
1 code implementation • PVLDB 2022 • Manuel Leone, Stefano Huber, Akhil Arora, Alberto García-Durán, Robert West
Our findings shed light on the potential problems resulting from an impulsive application of neural methods as a panacea for all data analytics tasks.
1 code implementation • 17 Jan 2022 • Justyna Czestochowska, Kristina Gligoric, Maxime Peyrard, Yann Mentha, Michal Bien, Andrea Grutter, Anita Auer, Aris Xanthos, Robert West
We find that with 30 annotations per emoji, 16 emojis (1.2%) are completely unambiguous, whereas 55 emojis (4.3%) are so ambiguous that their descriptions are indistinguishable from randomly chosen descriptions.
1 code implementation • 10 Jan 2022 • Sylvain Lugeon, Tiziano Piccardi, Robert West
We make publicly available the curated Curlie dataset aligned across languages, the pre-trained Homepage2Vec model, and libraries.
1 code implementation • NAACL 2022 • Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West
Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema.
1 code implementation • ACL 2021 • Maxime Peyrard, Wei Zhao, Steffen Eger, Robert West
Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances.
1 code implementation • 16 Oct 2021 • Maxime Peyrard, Sarvjeet Singh Ghotra, Martin Josifoski, Vidhan Agarwal, Barun Patra, Dean Carignan, Emre Kiciman, Robert West
In particular, we adapt a game-theoretic formulation of invariant risk minimization (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion.
1 code implementation • EMNLP 2021 • Niklas Stoehr, Lucas Torroba Hennigen, Samin Ahbab, Robert West, Ryan Cotterell
We do this by devising a set of textual and graph-based features which represent each of the causes.
1 code implementation • 19 May 2021 • Maxime Peyrard, Beatriz Borges, Kristina Gligorić, Robert West
We make progress in both respects by training and analyzing transformer-based humor recognition models on a recently introduced dataset consisting of minimal pairs of aligned sentences, one serious, the other humorous.
1 code implementation • EMNLP 2021 • Akhil Arora, Alberto García-Durán, Robert West
We propose a light-weight and scalable entity linking method, Eigenthemes, that relies solely on the availability of entity names and a referent knowledge base.
no code implementations • 17 Apr 2021 • Alberto García-Durán, Robert West
Time series with missing data are signals encountered in important settings for machine learning.
1 code implementation • 16 Apr 2021 • Roland Aydin, Lars Klein, Arnaud Miribel, Robert West
Thus, by seeing words in context, the user can assimilate new vocabulary without much conscious effort.
no code implementations • 25 Feb 2021 • Robin Mamié, Manoel Horta Ribeiro, Robert West
Our results suggest that there is a large overlap between the user bases of the Alt-right and of the Manosphere and that members of the Manosphere are more likely to engage with far-right content than carefully chosen counterparts.
Computers and Society
1 code implementation • 19 Feb 2021 • Thorsten Ruprechter, Manoel Horta Ribeiro, Tiago Santos, Florian Lemmerich, Markus Strohmaier, Robert West, Denis Helic
Wikipedia, the largest encyclopedia ever created, is a global initiative driven by volunteer contributions.
Computers and Society
no code implementations • 17 Feb 2021 • Kristina Gligorić, Ryen W. White, Emre Kiciman, Eric Horvitz, Arnaud Chiolero, Robert West
To estimate causal effects from the passively observed log data, we control confounds in a matched quasi-experimental design: we identify focal users who at first do not have any regular eating partners but then start eating with a fixed partner regularly, and we match focal users into comparison pairs such that paired users are nearly identical with respect to covariates measured before acquiring the partner, where the two focal users' new eating partners diverge in the healthiness of their respective food choice.
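The matched design described above can be illustrated with a simplified greedy 1:1 matching sketch. The standardized Euclidean distance, the thresholds `min_gap` and `max_dist`, the scalar healthiness score, and the function name `match_pairs` are all hypothetical choices for illustration; the excerpt does not specify the paper's exact matching procedure.

```python
import numpy as np


def match_pairs(covariates, partner_health, min_gap, max_dist):
    """Greedy 1:1 matching of focal users (hypothetical sketch).

    covariates: (n, d) pre-partner covariates, one row per focal user.
    partner_health: length-n healthiness score of each user's new partner.
    Returns pairs (i, j) whose covariates are close (distance <= max_dist)
    but whose partners' healthiness differs by at least min_gap.
    """
    X = np.asarray(covariates, dtype=float)
    # Standardize each covariate so no single feature dominates the distance.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    candidates = []
    n = len(Z)
    for i in range(n):  # O(n^2) scan; fine for a toy example
        for j in range(i + 1, n):
            if abs(partner_health[i] - partner_health[j]) >= min_gap:
                dist = float(np.linalg.norm(Z[i] - Z[j]))
                if dist <= max_dist:
                    candidates.append((dist, i, j))

    # Take the closest admissible pairs first; each user is used at most once.
    candidates.sort()
    used, pairs = set(), []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs


# Toy data: users 0/1 and 2/3 are similar, but their partners diverge.
print(match_pairs([[30, 2.0], [30.5, 2.1], [50, 5.0], [51, 5.1]],
                  [0.9, 0.1, 0.8, 0.2], min_gap=0.5, max_dist=1.0))
```

Greedy matching is the simplest option; optimal matching (minimizing total distance over all pairs) would be a natural alternative in a real study.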
1 code implementation • 18 Dec 2020 • Manoel Horta Ribeiro, Robert West
YouTube plays a key role in entertaining and informing people around the globe.
Time Series Analysis, Social and Information Networks, Computers and Society
1 code implementation • 10 Nov 2020 • Eda Bayram, Alberto Garcia-Duran, Robert West
The existing literature on knowledge graph completion mostly focuses on the link prediction task.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Maxime Peyrard, Robert West
The goal of text summarization is to compress documents to the relevant information while excluding background information already known to the receiver.
1 code implementation • 23 Sep 2020 • Tiziano Piccardi, Robert West
We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics.
no code implementations • 16 Sep 2020 • Kristina Gligorić, Ashton Anderson, Robert West
The prevalence of tweets around 140 characters before the switch in a given language is strongly correlated with the prevalence of tweets around 280 characters after the switch in the same language, and very long tweets are vastly more popular on Web clients than on mobile clients.
1 code implementation • 19 Aug 2020 • Kristina Gligorić, Manoel Horta Ribeiro, Martin Müller, Olesia Altunina, Maxime Peyrard, Marcel Salathé, Giovanni Colavizza, Robert West
Timely access to accurate information is crucial during the COVID-19 pandemic.
Social and Information Networks
1 code implementation • 27 Jul 2020 • Robert West
In the offline preprocessing phase, an "anchor bank" is constructed, a set of queries spanning the full spectrum of popularity, all calibrated against a common reference query by carefully chaining together multiple Google Trends requests.
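The chaining step described above can be illustrated with a minimal sketch. It assumes each Google Trends request returns, for every query in it, a peak value on that request's own 0 to 100 scale, and that each request shares at least one query with earlier requests. The function name `calibrate_chain` and the dictionary input format are hypothetical; this is neither the paper's code nor the Google Trends API.

```python
from typing import Dict, List


def calibrate_chain(requests: List[Dict[str, float]], reference: str) -> Dict[str, float]:
    """Express each query's peak value relative to `reference` (reference = 1.0).

    `requests` is a hypothetical list of Trends responses, each mapping
    query -> peak value on that request's own 0-100 scale.
    """
    if requests[0].get(reference, 0) <= 0:
        raise ValueError("reference must appear with a positive value in the first request")
    calibrated: Dict[str, float] = {}
    # Conversion factor from the first request's units to reference units.
    factor = 1.0 / requests[0][reference]
    for i, resp in enumerate(requests):
        if i > 0:
            # A query already calibrated in an earlier request links the chain.
            links = [q for q in resp if q in calibrated and resp[q] > 0]
            if not links:
                raise ValueError(f"request {i} shares no usable query with earlier requests")
            link = links[0]
            # The link's calibrated value is fixed, so this ratio converts the
            # current request's units into reference units.
            factor = calibrated[link] / resp[link]
        for q, v in resp.items():
            # Keep the first calibration of a query; later requests only add new ones.
            calibrated.setdefault(q, v * factor)
    return calibrated


# Toy chain: "a" peaks at twice the reference; "b" peaks at five times "a".
chain = [
    {"reference": 50.0, "a": 100.0},
    {"a": 20.0, "b": 100.0},
]
print(calibrate_chain(chain, "reference"))
```

Because Google Trends reports integers on a 0 to 100 scale, a query far less popular than the top query in the same request is rounded to very few distinct values. This plausibly explains why the bank spans the full popularity spectrum: each link compares queries of similar magnitude.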
1 code implementation • 5 May 2020 • Maxime Peyrard, Robert West
Causal discovery, the task of automatically constructing a causal model from data, is of major significance across the sciences.
1 code implementation • ACL 2020 • Wei Zhao, Goran Glavaš, Maxime Peyrard, Yang Gao, Robert West, Steffen Eger
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
no code implementations • 23 Feb 2020 • Chad Peters, Babak Esfandiari, Mohamad Zalat, Robert West
Learning from Observation (LfO), also known as Behavioral Cloning, is an approach for building software agents by recording the behavior of an expert (human or artificial) and using the recorded data to generate the required behavior.
no code implementations • 9 Feb 2020 • Hristo Paskov, Alex Paskov, Robert West
We provide a methodology for learning sparse statistical models that use as features all possible multiplicative interactions among an underlying atomic set of features.
1 code implementation • 28 Jan 2020 • Blagoj Mitrevski, Tiziano Piccardi, Robert West
Wikipedia is written in the wikitext markup language.
Computers and Society
1 code implementation • 23 Jan 2020 • Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, Robert West
Wikipedia, the free online encyclopedia that anyone can edit, is one of the most visited sites on the Web and a common source of information for many users.
Computers and Society
2 code implementations • 28 Dec 2019 • Ali Sabet, Prakhar Gupta, Jean-Baptiste Cordonnier, Robert West, Martin Jaggi
Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.
Cross-Lingual Document Classification, Cross-Lingual Word Embeddings
no code implementations • 11 Oct 2019 • Okyaz Eminaga, Yuri Tolkach, Christian Kunder, Mahmood Abbas, Ryan Han, Rosalie Nolley, Axel Semjonow, Martin Boegemann, Sebastian Huss, Andreas Loening, Robert West, Geoffrey Sonn, Richard Fan, Olaf Bettendorf, James Brook, Daniel Rubin
As a use case, these models were applied to annotation tasks in clinician-oriented pathology reports for prostatectomy specimens.
no code implementations • 25 Sep 2019 • Alberto Garcia-Duran, Robert West
The most successful prior approaches for modeling such time series are based on recurrent neural networks that learn to impute unobserved values and then treat the imputed values as observed.
1 code implementation • 22 Aug 2019 • Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, Wagner Meira
Non-profits, as well as the media, have hypothesized the existence of a radicalization pipeline on YouTube, claiming that users systematically progress towards more extreme content on the platform.
Computers and Society, Social and Information Networks
1 code implementation • 8 Jul 2019 • Valentin Hartmann, Konark Modi, Josep M. Pujol, Robert West
Second, we implement SecVM's distributed framework for the Cliqz web browser and deploy it for predicting user gender in a large-scale online evaluation with thousands of clients, outperforming baselines by a large margin and thus showcasing that SecVM is suitable for production environments.
1 code implementation • 27 Jun 2019 • Valentin Hartmann, Robert West
For population studies or for the training of complex machine learning models, it is often required to gather data from different actors.
1 code implementation • 8 Apr 2019 • Ramtin Yazdanian, Leila Zia, Jonathan Morgan, Bahodir Mansurov, Robert West
As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions, the so-called cold-start problem.
1 code implementation • 8 Apr 2019 • Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, Robert West
Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.
1 code implementation • 5 Apr 2019 • Kiran Garimella, Robert West
We show that user impact tends to have certain characteristics: First, impact is clustered in time, such that the most impactful tweets of a user appear close to each other.
Social and Information Networks
no code implementations • 23 Mar 2019 • Meryem M'hamdi, Robert West, Andreea Hossmann, Michael Baeriswyl, Claudiu Musat
In particular, we test the hypothesis that embeddings with context are more effective, by multi-tasking the learning of multilingual word embeddings and text classification; we explore neural architectures for CLTC; and we move from bi- to multi-lingual word embeddings.
1 code implementation • 10 Jan 2019 • Robert West, Eric Horvitz
Starting from the observation that satirical news headlines tend to resemble serious news headlines, we build and analyze a corpus of satirical headlines paired with nearly identical but serious headlines.
no code implementations • 13 Dec 2018 • Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury
The common approach to measuring such biases using a corpus is to calculate the similarities between the embedding vector of a word (like nurse) and the vectors of the representative words of the concepts of interest (such as genders).
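The common corpus-based measure mentioned above can be sketched as follows. This illustrates only the baseline approach the excerpt describes, not the method the paper proposes. The toy three-dimensional vectors and the helper names `cosine` and `similarity_gap` are invented for illustration; a real analysis would use trained embeddings and larger word lists per concept.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def similarity_gap(word_vec, concept_a, concept_b) -> float:
    """Mean similarity to concept A's words minus mean similarity to concept B's.

    A positive value means the word sits closer to concept A.
    """
    sim_a = np.mean([cosine(word_vec, v) for v in concept_a])
    sim_b = np.mean([cosine(word_vec, v) for v in concept_b])
    return float(sim_a - sim_b)


# Invented toy embeddings (not from any trained model).
emb = {
    "nurse": np.array([0.9, 0.1, 0.3]),
    "she": np.array([1.0, 0.0, 0.2]),
    "woman": np.array([0.8, 0.1, 0.1]),
    "he": np.array([0.0, 1.0, 0.2]),
    "man": np.array([0.1, 0.9, 0.1]),
}
gap = similarity_gap(emb["nurse"],
                     [emb["she"], emb["woman"]],
                     [emb["he"], emb["man"]])
print(f"female-minus-male similarity gap for 'nurse': {gap:.3f}")
```

The measure yields a single point value. It gives no indication of how large a gap must be before it is meaningful rather than noise in the embedding space.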
1 code implementation • CONLL 2018 • Christian Abbet, Meryem M'hamdi, Athanasios Giannakopoulos, Robert West, Andreea Hossmann, Michael Baeriswyl, Claudiu Musat
To this end, we crowdsource and publish a dataset of churn intent expressions in chatbot interactions in German and English.
1 code implementation • 7 Apr 2018 • Dario Pavllo, Tiziano Piccardi, Robert West
We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora.
2 code implementations • 12 Apr 2016 • Ellery Wulczyn, Robert West, Leila Zia, Jure Leskovec
The system involves identifying missing articles, ranking the missing articles according to their importance, and recommending important missing articles to editors based on their interests.
Social and Information Networks, Digital Libraries
no code implementations • TACL 2014 • Robert West, Hristo S. Paskov, Jure Leskovec, Christopher Potts
Person-to-person evaluations are prevalent in all kinds of discourse and important for establishing reputations, building social bonds, and shaping public opinion.
no code implementations • NeurIPS 2013 • Hristo S. Paskov, Robert West, John C. Mitchell, Trevor Hastie
This paper addresses the problem of unsupervised feature learning for text data.