1 code implementation • NAACL (DADC) 2022 • Margaret Li, Julian Michael
Adversarial data collection has shown promise as a method for building models which are more robust to the spurious correlations that generally appear in naturalistic data.
1 code implementation • 2 May 2024 • Zhijing Jin, Yuen Chen, Fernando Gonzalez, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Schölkopf, Mona Diab
We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction.
1 code implementation • 8 Mar 2024 • James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin
Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%.
no code implementations • 1 Dec 2023 • Julian Michael
I propose a paradigm for scientific progress in NLP centered around developing scalable, data-driven theories of linguistic structure.
1 code implementation • 20 Nov 2023 • David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
1 code implementation • 15 Nov 2023 • Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman
Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%.
1 code implementation • NeurIPS 2023 • Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman
We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs--e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always "(A)"--which models systematically fail to mention in their explanations.
1 code implementation • 27 Apr 2023 • Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.
no code implementations • 26 Aug 2022 • Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman
We present the results of the NLP Community Metasurvey.
1 code implementation • 27 May 2022 • Weijia Shi, Julian Michael, Suchin Gururangan, Luke Zettlemoyer
Retrieval-augmented language models (LMs) use non-parametric memory to substantially outperform their non-retrieval counterparts on perplexity-based evaluations, but it is an open question whether they achieve similar gains in few- and zero-shot end-task accuracy.
1 code implementation • EMNLP 2021 • Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan
We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage.
no code implementations • Findings (ACL) 2021 • Bhargavi Paranjape, Julian Michael, Marjan Ghazvininejad, Luke Zettlemoyer, Hannaneh Hajishirzi
Many commonsense reasoning NLP tasks involve choosing between one or more possible answers to a question or prompt based on knowledge that is often implicit.
no code implementations • EMNLP 2020 • Julian Michael, Jan A. Botha, Ian Tenney
The success of pretrained contextual encoders, such as ELMo and BERT, has brought a great deal of interest in what these models learn: do they, without explicit supervision, learn to encode meaningful notions of linguistic structure?
2 code implementations • EMNLP 2020 • Sewon Min, Julian Michael, Hannaneh Hajishirzi, Luke Zettlemoyer
Ambiguity is inherent to open-domain question answering; especially when exploring new topics, it can be difficult to ask questions that have a single, unambiguous answer.
1 code implementation • ACL 2020 • Paul Roit, Ayal Klein, Daniela Stepanov, Jonathan Mamou, Julian Michael, Gabriel Stanovsky, Luke Zettlemoyer, Ido Dagan
Question-answer driven Semantic Role Labeling (QA-SRL) was proposed as an attractive open and natural flavour of SRL, potentially attainable from laymen.
6 code implementations • NeurIPS 2019 • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks.
no code implementations • NAACL 2018 • Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, Ido Dagan
We present data and methods that enable a supervised learning approach to Open Information Extraction (Open IE).
3 code implementations • ACL 2018 • Nicholas FitzGerald, Julian Michael, Luheng He, Luke Zettlemoyer
We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser.
11 code implementations • WS 2018 • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset.
1 code implementation • NAACL 2018 • Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, Luke Zettlemoyer
We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs.