Search Results for author: Alice Li

Found 9 papers, 5 papers with code

On the Effects of Data Scale on Computer Control Agents

no code implementations · 6 Jun 2024 · Wei Li, William Bishop, Alice Li, Chris Rawles, Folawiyo Campbell-Ajala, Divya Tyamagundlu, Oriana Riva

Moreover, AndroidControl is the most diverse computer control dataset to date, including 15,283 unique tasks over 833 Android apps, thus allowing us to conduct in-depth analysis of model performance both in and out of the domain of the training data.

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

1 code implementation · 23 May 2024 · Christopher Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, Daniel Toyama, Robert Berry, Divya Tyamagundlu, Timothy Lillicrap, Oriana Riva

Finally, we conduct a robustness analysis by testing M3A against a range of task variations on a representative subset of tasks. It demonstrates that variations in task parameters can significantly alter the complexity of a task, and therefore an agent's performance, highlighting the importance of testing agents under diverse conditions.

Benchmarking

Dissociation of Faithful and Unfaithful Reasoning in LLMs

1 code implementation · 23 May 2024 · Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen

We identify factors that shift LLM recovery behavior: LLMs recover more frequently from obvious errors and in contexts that provide more evidence for the correct answer.

Generative AI Search Engines as Arbiters of Public Knowledge: An Audit of Bias and Authority

no code implementations · 22 May 2024 · Alice Li, Luanne Sinnamon

This paper reports on an audit study of generative AI systems (ChatGPT, Bing Chat, and Perplexity) which investigates how these new search engines construct responses and establish authority for topics of public importance.

Sentiment Analysis

Latent State Estimation Helps UI Agents to Reason

no code implementations · 17 May 2024 · William E Bishop, Alice Li, Christopher Rawles, Oriana Riva

In the context of autonomous UI agents, we then show that LLMs used in this manner are more than 76% accurate at inferring various aspects of latent state, such as performed (vs. commanded) actions and task progression.

Android in the Wild: A Large-Scale Dataset for Android Device Control

3 code implementations · 19 Jul 2023 · Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, Timothy Lillicrap

The dataset contains human demonstrations of device interactions, including the screens and actions, and corresponding natural language instructions.

The 7th AI City Challenge

no code implementations · 15 Apr 2023 · Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa

The seventh edition of the AI City Challenge emphasizes two domains at the intersection of computer vision and artificial intelligence that have considerable untapped potential: retail business and Intelligent Traffic Systems (ITS).

Retrieval

Productivity Assessment of Neural Code Completion

1 code implementation · 13 May 2022 · Albert Ziegler, Eirini Kalliamvakou, Shawn Simister, Ganesh Sittampalam, Alice Li, Andrew Rice, Devon Rifkin, Edward Aftandilian

Neural code synthesis has reached a point where snippet generation is accurate enough to be considered for integration into human software development workflows.

Code Completion
