no code implementations • 10 Oct 2023 • Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models.
Ranked #2 on Bug fixing on SWE-bench
1 code implementation • 24 May 2023 • Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan
Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences and has broad applications in fields such as information retrieval and natural language understanding.
1 code implementation • 24 Feb 2023 • Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan
The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies.
1 code implementation • ACL 2022 • Carlos E. Jimenez, Olga Russakovsky, Karthik Narasimhan
We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests.
1 code implementation • 18 Feb 2022 • Vishvak Murahari, Carlos E. Jimenez, Runzhe Yang, Karthik Narasimhan
In this paper, we introduce data multiplexing (DataMUX), a technique that enables deep neural networks to process multiple inputs simultaneously using a single compact representation.