1 code implementation • EMNLP 2021 • Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell
We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.
no code implementations • ACL 2022 • Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell
When generating natural language from neural probabilistic models, high probability does not always coincide with high quality.
no code implementations • EMNLP 2021 • Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell
Beam search is the default decoding strategy for many sequence generation tasks in NLP.
no code implementations • 25 Mar 2024 • Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell
For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task.
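The $n$-gram assumption holds that each token depends only on the preceding $n-1$ tokens. A minimal bigram sketch with add-one smoothing (illustrative only, not any particular paper's model):

```python
from collections import Counter

def bigram_lm(corpus):
    """Minimal sketch of a bigram language model with add-one smoothing.
    corpus: list of token lists, e.g. [["the", "cat", "sat"]]."""
    vocab = {tok for sent in corpus for tok in sent} | {"<s>", "</s>"}
    pair_counts, prev_counts = Counter(), Counter()
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1   # count of the bigram (prev, cur)
            prev_counts[prev] += 1          # count of the conditioning token
    V = len(vocab)
    def prob(cur, prev):
        # add-one (Laplace) smoothed conditional probability p(cur | prev)
        return (pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + V)
    return prob

prob = bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
```

Smoothing is what keeps unseen bigrams from receiving zero probability, which matters for perplexity-style evaluation.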
no code implementations • 6 Dec 2023 • Tiago Pimentel, Clara Meister, Ethan Gotlieb Wilcox, Kyle Mahowald, Ryan Cotterell
Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio.
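Stated as a formula (our notation, not necessarily the paper's; $S$ denotes a word's surprisal under the language model):

```latex
\mathrm{length}(w) \;\propto\; \mathbb{E}[S] \;+\; \frac{\mathrm{Var}[S]}{\mathbb{E}[S]}
```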
no code implementations • 7 Nov 2023 • Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu, Li Du
Large language models have become one of the most commonly deployed NLP inventions.
no code implementations • 7 Jul 2023 • Ethan Gotlieb Wilcox, Tiago Pimentel, Clara Meister, Ryan Cotterell, Roger P. Levy
We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families.
1 code implementation • 7 Jul 2023 • Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell
While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
1 code implementation • 29 Jun 2023 • Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell
Via submodular functions, we prove that the iterative greedy version is a $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$-approximation of an optimal merge sequence, where ${\sigma(\boldsymbol{\mu}^\star)}$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$.
1 code implementation • 29 Jun 2023 • Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Mrinmaya Sachan, Ryan Cotterell
Subword tokenization is a key part of many NLP pipelines.
1 code implementation • 6 Jun 2023 • Thomas Hikaru Clark, Clara Meister, Tiago Pimentel, Michael Hahn, Ryan Cotterell, Richard Futrell, Roger Levy
Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically.
no code implementations • 20 Dec 2022 • Li Du, Lucas Torroba Hennigen, Tiago Pimentel, Clara Meister, Jason Eisner, Ryan Cotterell
Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings.
no code implementations • 19 Dec 2022 • Clara Meister, Wojciech Stokowiec, Tiago Pimentel, Lei Yu, Laura Rimell, Adhiguna Kuncoro
After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, making it difficult to estimate the probability distribution over next tokens.
1 code implementation • 25 Nov 2022 • Tiago Pimentel, Clara Meister, Ethan G. Wilcox, Roger Levy, Ryan Cotterell
We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking.
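Both predictors are simple functionals of a model's next-word distribution: surprisal scores the word actually read, while contextual entropy measures anticipation over everything that could have come next. A small illustrative sketch (names are ours):

```python
import math

def surprisal_and_entropy(p, observed):
    """Given a next-word distribution p (dict: word -> probability) and the
    word actually read, return its surprisal and the contextual entropy,
    both in bits."""
    s = -math.log2(p[observed])                              # surprisal of the observed word
    h = -sum(q * math.log2(q) for q in p.values() if q > 0)  # contextual entropy
    return s, h

s, h = surprisal_and_entropy({"cat": 0.5, "dog": 0.25, "eel": 0.25}, "dog")
# s = 2.0 bits, h = 1.5 bits
```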
2 code implementations • 24 Oct 2022 • Liam van der Poel, Ryan Cotterell, Clara Meister
Despite significant progress in the quality of language generated from abstractive summarization models, these models still exhibit the tendency to hallucinate, i.e., output content not supported by the source document.
1 code implementation • 31 May 2022 • Tiago Pimentel, Clara Meister, Ryan Cotterell
As we show, however, this is not a tight approximation -- in either theory or practice.
1 code implementation • 14 May 2022 • Afra Amini, Tiago Pimentel, Clara Meister, Ryan Cotterell
Probing has become a go-to methodology for interpreting and analyzing deep neural models in natural language processing.
no code implementations • ACL 2022 • Aryaman Arora, Clara Meister, Ryan Cotterell
Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language.
no code implementations • 31 Mar 2022 • Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell
Specifically, we posit that human-like language should contain an amount of information (quantified as negative log-probability) that is close to the entropy of the distribution over natural strings.
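This criterion can be turned into a truncation rule: rank tokens by how far their information content falls from the conditional entropy and keep the closest ones up to a target probability mass. A minimal NumPy sketch under that reading (function name and the mass parameter tau are illustrative):

```python
import numpy as np

def locally_typical_filter(logits, tau=0.95):
    """Sketch of a locally typical truncation: keep the tokens whose
    information content (negative log-probability) lies closest to the
    conditional entropy, up to cumulative mass tau, then renormalise."""
    logits = np.asarray(logits, dtype=float)
    logp = logits - np.logaddexp.reduce(logits)   # log-softmax
    p = np.exp(logp)
    entropy = -np.sum(p * logp)                   # H of the next-token distribution
    score = np.abs(-logp - entropy)               # distance from the entropy
    order = np.argsort(score)                     # most "typical" tokens first
    cum = np.cumsum(p[order])
    k = int(np.searchsorted(cum, tau)) + 1        # smallest typical set with mass >= tau
    q = np.zeros_like(p)
    q[order[:k]] = p[order[:k]]
    return q / q.sum()
```

Sampling then proceeds from the renormalised distribution rather than the full softmax; note that, unlike top-k, the retained set need not include the single most probable token.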
no code implementations • ACL 2022 • Clara Meister, Tiago Pimentel, Thomas Hikaru Clark, Ryan Cotterell, Roger Levy
Numerous analyses of reading time (RT) data have been implemented -- all in an effort to better understand the cognitive processes driving reading comprehension.
no code implementations • 29 Mar 2022 • Gian Wiher, Clara Meister, Ryan Cotterell
For example, the nature of the diversity-quality trade-off in language generation is very task-specific; the length bias often attributed to beam search is not constant across tasks.
3 code implementations • 1 Feb 2022 • Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell
Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, locally typical sampling offers competitive performance (in both abstractive summarization and story generation) in terms of quality while consistently reducing degenerate repetitions.
1 code implementation • EMNLP 2021 • Tiago Pimentel, Clara Meister, Simone Teufel, Ryan Cotterell
Homophony's widespread presence in natural languages is a controversial topic.
no code implementations • EMNLP 2021 • Clara Meister, Tiago Pimentel, Patrick Haller, Lena Jäger, Ryan Cotterell, Roger Levy
The uniform information density (UID) hypothesis posits a preference among language users for utterances structured such that information is distributed uniformly across a signal.
1 code implementation • 22 Sep 2021 • Clara Meister, Afra Amini, Tim Vieira, Ryan Cotterell
In this work, we propose a new method for turning beam search into a stochastic process: Conditional Poisson stochastic beam search.
1 code implementation • Findings (EMNLP) 2021 • Damian Pascual, Beni Egressy, Clara Meister, Ryan Cotterell, Roger Wattenhofer
Large pre-trained language models have repeatedly shown their ability to produce fluent text.
no code implementations • ACL 2021 • Clara Meister, Martina Forster, Ryan Cotterell
Beam search is a go-to strategy for decoding neural sequence models.
no code implementations • ACL 2021 • Clara Meister, Stefan Lazov, Isabelle Augenstein, Ryan Cotterell
Sparse attention has been claimed to increase model interpretability under the assumption that it highlights influential inputs.
no code implementations • ACL 2021 • Clara Meister, Ryan Cotterell
As concrete examples, text generated under the nucleus sampling scheme adheres more closely to the type--token relationship of natural language than text produced using standard ancestral sampling; text from LSTMs reflects the natural language distributions over length, stopwords, and symbols surprisingly well.
no code implementations • ACL 2021 • Jason Wei, Clara Meister, Ryan Cotterell
The uniform information density (UID) hypothesis, which posits that speakers behaving optimally tend to distribute information uniformly across a linguistic signal, has gained traction in psycholinguistics as an explanation for certain syntactic, morphological, and prosodic choices.
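A common way to operationalize UID, shown here as an illustrative sketch rather than any paper's exact regressor, is to score an utterance by the variance of its per-word surprisals, with lower variance indicating a more uniform spread of information:

```python
def uid_variance(surprisals):
    """Variance of per-word surprisals across an utterance. Lower values
    mean information is spread more uniformly over the signal (one common
    UID operationalization; illustrative only)."""
    n = len(surprisals)
    mean = sum(surprisals) / n
    return sum((s - mean) ** 2 for s in surprisals) / n
```

Under this measure, a sentence whose words are all equally surprising scores 0, while bursty sentences score higher.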
no code implementations • EACL 2021 • Martina Forster, Clara Meister, Ryan Cotterell
Yet, on word-level tasks, exact inference of these models reveals that the empty string is often the global optimum.
1 code implementation • EMNLP 2020 • Clara Meister, Tim Vieira, Ryan Cotterell
This implies that the MAP objective alone does not express the properties we desire in text, which raises the question: if beam search is the answer, what was the question?
1 code implementation • 8 Jul 2020 • Clara Meister, Tim Vieira, Ryan Cotterell
Decoding for many NLP tasks requires an effective heuristic algorithm for approximating exact search, since searching the full output space is often intractable or, in many settings, impractical.
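Beam search is the canonical such heuristic: rather than exploring the full output space, it keeps only the k highest-scoring partial hypotheses at each step. A minimal sketch (the toy model and all names are illustrative):

```python
import math

def beam_search(next_probs, eos, k=2, max_len=10):
    """Minimal beam search sketch. next_probs(prefix) returns a dict
    mapping each candidate next token to its conditional probability;
    eos is the end-of-sequence token. Returns the highest-scoring
    hypothesis as (token list, log-probability)."""
    beams = [([], 0.0)]                      # (prefix, cumulative log-prob)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            for tok, p in next_probs(prefix).items():
                hyp = (prefix + [tok], logp + math.log(p))
                (completed if tok == eos else candidates).append(hyp)
        if not candidates:
            break
        # keep only the k best partial hypotheses -- the "beam"
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:k]
    completed.extend(beams)                  # fall back to best partial if nothing finished
    return max(completed, key=lambda h: h[1])

# toy model: after "a" we may continue or stop; after "b" we must stop
def toy(prefix):
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"</s>": 0.3, "a": 0.7}
    return {"</s>": 1.0}

best_seq, best_logp = beam_search(toy, "</s>", k=2, max_len=5)
```

The toy model also illustrates the pitfall the entry alludes to: the greedy first step ("a") does not begin the globally best sequence, which here is "b </s>".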
no code implementations • WS 2020 • Martina Forster, Clara Meister
This paper presents our system for the SIGMORPHON 2020 Shared Task.
no code implementations • ACL 2020 • Clara Meister, Elizabeth Salesky, Ryan Cotterell
Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e., over-confident) predictions, a common sign of overfitting.
no code implementations • 22 Apr 2020 • Pinjia He, Clara Meister, Zhendong Su
Machine translation software has seen rapid progress in recent years due to the advancement of deep neural networks.
2 code implementations • 19 Jul 2019 • Pinjia He, Clara Meister, Zhendong Su
Despite its apparent importance, validating the robustness of machine translation systems is very difficult and has, therefore, been much under-explored.