Search Results for author: Tongshuang Wu

Found 37 papers, 20 papers with code

It is AI’s Turn to Ask Humans a Question: Question-Answer Pair Generation for Children’s Story Books

no code implementations • ACL 2022 • Bingsheng Yao, Dakuo Wang, Tongshuang Wu, Zheng Zhang, Toby Li, Mo Yu, Ying Xu

Existing question answering (QA) techniques are created mainly to answer questions asked by humans.

Answer Generation Question-Answer-Generation +1

Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension

no code implementations • ACL 2022 • Ying Xu, Dakuo Wang, Mo Yu, Daniel Ritchie, Bingsheng Yao, Tongshuang Wu, Zheng Zhang, Toby Li, Nora Bradford, Branda Sun, Tran Hoang, Yisi Sang, Yufang Hou, Xiaojuan Ma, Diyi Yang, Nanyun Peng, Zhou Yu, Mark Warschauer

Through benchmarking with QG models, we show that the QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions.

Benchmarking Question Answering +2

Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

no code implementations • 4 May 2024 • Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu

In this work, we study whether retrievers can recognize and respond to different perspectives of the queries -- beyond finding relevant documents for a claim, can retrievers distinguish supporting vs. opposing documents?

Better Synthetic Data by Retrieving and Transforming Existing Datasets

1 code implementation • 22 Apr 2024 • Saumya Gandhi, Ritu Gala, Vijay Viswanathan, Tongshuang Wu, Graham Neubig

Recent work has studied prompt-driven synthetic data generation using large language models, but these generated datasets tend to lack complexity and diversity.

Synthetic Data Generation

1,892

Evaluating Mathematical Reasoning Beyond Accuracy

2 code implementations • 8 Apr 2024 • Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, PengFei Liu

To measure reasoning beyond final-answer accuracy, we introduce ReasonEval, a new methodology for evaluating the quality of reasoning steps.

Math Mathematical Reasoning

205

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

no code implementations • 27 Feb 2024 • Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen

For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance.

Common Sense Reasoning Question Answering

Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia

1 code implementation • 21 Feb 2024 • Tzu-Sheng Kuo, Aaron Halfaker, Zirui Cheng, Jiwoo Kim, Meng-Hsin Wu, Tongshuang Wu, Kenneth Holstein, Haiyi Zhu

AI tools are increasingly deployed in community contexts.

Measuring Adversarial Datasets

no code implementations • 6 Nov 2023 • Yuanchen Bai, Raoyi Huang, Vijay Viswanathan, Tzu-Sheng Kuo, Tongshuang Wu

In the era of widespread public use of AI systems across various domains, ensuring adversarial robustness has become increasingly vital to maintain safety and prevent undesirable errors.

Adversarial Robustness

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

1 code implementation • 25 Oct 2023 • Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners.

146

Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs

no code implementations • 14 Oct 2023 • Chenyang Yang, Rishabh Rustogi, Rachel Brower-Sinning, Grace A. Lewis, Christian Kästner, Tongshuang Wu

Current model testing work has mostly focused on creating test cases.

Stance Detection

From Nuisance to News Sense: Augmenting the News with Cross-Document Evidence and Context

1 code implementation • 6 Oct 2023 • Jeremiah Milbauer, Ziqi Ding, Zhijin Wu, Tongshuang Wu

Reading and understanding the stories in the news is increasingly difficult.

Fact Verification Misinformation

Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models

no code implementations • 3 Oct 2023 • Michael Xieyang Liu, Tongshuang Wu, Tianying Chen, Franklin Mingzhe Li, Aniket Kittur, Brad A. Myers

Sensemaking in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria.

Decision Making Navigate

Prompt2Model: Generating Deployable Models from Natural Language Instructions

1 code implementation • 23 Aug 2023 • Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig

In this paper, we propose Prompt2Model, a general-purpose method that takes a natural language task description like the prompts provided to LLMs, and uses it to train a special-purpose model that is conducive to deployment.

Retrieval

1,892

LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

no code implementations • 19 Jul 2023 • Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T. Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang

We reflect on human and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets.

Large Language Models Enable Few-Shot Clustering

1 code implementation • 2 Jul 2023 • Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, Graham Neubig

In this paper, we ask whether a large language model can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering.

Clustering Language Modelling +2

Is AI the better programming partner? Human-Human Pair Programming vs. Human-AI pAIr Programming

no code implementations • 8 Jun 2023 • Qianou Ma, Tongshuang Wu, Kenneth Koedinger

The emergence of large-language models (LLMs) that excel at code generation and commercial products such as GitHub's Copilot has sparked interest in human-AI pair programming (referred to as "pAIr programming") where an AI system collaborates with a human programmer.

Code Generation

Seeing Seeds Beyond Weeds: Green Teaming Generative AI for Beneficial Uses

no code implementations • 30 May 2023 • Logan Stapleton, Jordan Taylor, Sarah Fox, Tongshuang Wu, Haiyi Zhu

Finally, we discuss how our use cases demonstrate green teaming as both a practical design method and a mode of critique, which problematizes and subverts current understandings of harms and values in generative AI.

DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions

1 code implementation • 26 May 2023 • Vijay Viswanathan, Luyu Gao, Tongshuang Wu, PengFei Liu, Graham Neubig

Using this data, we compare various information retrieval algorithms on our test set and present a superior bi-encoder retriever for text-based dataset recommendation.

Information Retrieval Retrieval

BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases

no code implementations • 23 May 2023 • Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap

Toxicity annotators and content moderators often default to mental shortcuts when making decisions.

ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing

1 code implementation • 16 May 2023 • Hua Shen, Chieh-Yang Huang, Tongshuang Wu, Ting-Hao 'Kenneth' Huang

The paper further discusses the practical human usage patterns in interacting with ConvXAI for scientific co-writing.

Explainable Artificial Intelligence (XAI)

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

no code implementations • 1 May 2023 • Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins

Many recent advances in natural language generation have been fueled by training large language models on internet-scale data.

Text Generation

Tool Learning with Foundation Models

3 code implementations • 17 Apr 2023 • Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun

Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools.

4,441

Parachute: Evaluating Interactive Human-LM Co-writing Systems

no code implementations • 11 Mar 2023 • Hua Shen, Tongshuang Wu

A surge of advances in language models (LMs) has led to significant interest in using LMs to build co-writing systems, in which humans and LMs interactively contribute to a shared writing artifact.

ScatterShot: Interactive In-context Example Curation for Text Transformation

1 code implementation • 14 Feb 2023 • Tongshuang Wu, Hua Shen, Daniel S. Weld, Jeffrey Heer, Marco Tulio Ribeiro

ScatterShot iteratively slices unlabeled data into task-specific patterns, samples informative inputs from underexplored or not-yet-saturated slices in an active learning manner, and helps users label more efficiently with the help of an LLM and the current example set.

Active Learning In-Context Learning

Capabilities for Better ML Engineering

no code implementations • 11 Nov 2022 • Chenyang Yang, Rachel Brower-Sinning, Grace A. Lewis, Christian Kästner, Tongshuang Wu

In spite of machine learning's rapid growth, its engineering support is scattered in many forms, and tends to favor certain engineering stages, stakeholders, and evaluation preferences.

Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension

1 code implementation • 26 Mar 2022 • Ying Xu, Dakuo Wang, Mo Yu, Daniel Ritchie, Bingsheng Yao, Tongshuang Wu, Zheng Zhang, Toby Jia-Jun Li, Nora Bradford, Branda Sun, Tran Bao Hoang, Yisi Sang, Yufang Hou, Xiaojuan Ma, Diyi Yang, Nanyun Peng, Zhou Yu, Mark Warschauer

Through benchmarking with QG models, we show that the QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions.

Ranked #1 on Question Generation on FairytaleQA

Benchmarking Question Answering +2

Are Shortest Rationales the Best Explanations for Human Understanding?

1 code implementation • ACL 2022 • Hua Shen, Tongshuang Wu, Wenbo Guo, Ting-Hao 'Kenneth' Huang

Existing self-explaining models typically favor extracting the shortest possible rationales - snippets of an input text "responsible for" corresponding output - to explain the model prediction, with the assumption that shorter rationales are more intuitive to humans.

StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement

1 code implementation • 13 Feb 2022 • Zheng Zhang, Ying Xu, Yanhao Wang, Bingsheng Yao, Daniel Ritchie, Tongshuang Wu, Mo Yu, Dakuo Wang, Toby Jia-Jun Li

Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child due to limited availability or challenges in coming up with appropriate questions.

Chatbot

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation

759

AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts

no code implementations • 4 Oct 2021 • Tongshuang Wu, Michael Terry, Carrie J. Cai

Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks.

Language Modelling Large Language Model

It is AI's Turn to Ask Humans a Question: Question-Answer Pair Generation for Children's Story Books

2 code implementations • 8 Sep 2021 • Bingsheng Yao, Dakuo Wang, Tongshuang Wu, Zheng Zhang, Toby Jia-Jun Li, Mo Yu, Ying Xu

Existing question answering (QA) techniques are created mainly to answer questions asked by humans.

Answer Generation Data Augmentation +3

DeHumor: Visual Analytics for Decomposing Humor

no code implementations • 18 Jul 2021 • Xingbo Wang, Yao Ming, Tongshuang Wu, Haipeng Zeng, Yong Wang, Huamin Qu

Despite being a critical communication skill, grasping humor is challenging -- a successful use of humor requires a mixture of both engaging content build-up and an appropriate vocal delivery (e.g., pause).

Tailor: Generating and Perturbing Text with Semantic Controls

1 code implementation • ACL 2022 • Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner

We craft a set of operations to modify the control codes, which in turn steer generation towards targeted attributes.

Data Augmentation Style Transfer +1

Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models

1 code implementation • ACL 2021 • Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld

While counterfactual examples are useful for analysis and training of NLP models, current generation methods either rely on manual labor to create very few counterfactuals, or only instantiate limited types of perturbations such as paraphrases or word substitutions.

counterfactual Text Generation

Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance

no code implementations • 26 Jun 2020 • Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, Daniel S. Weld

However, prior studies observed improvements from explanations only when the AI, alone, outperformed both the human and the best team.

Decision Making Question Answering +1

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

4 code implementations • ACL 2020 • Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors.

Question Answering Sentiment Analysis

1,986

Errudite: Scalable, Reproducible, and Testable Error Analysis

1 code implementation • ACL 2019 • Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel Weld

Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions.

counterfactual

104
