no code implementations • 15 Apr 2024 • Arjun Panickssery, Samuel R. Bowman, Shi Feng
Self-evaluation using large language models (LLMs) has proven valuable not only in benchmarking but also in methods like reward modeling, constitutional AI, and self-refinement.
1 code implementation • 11 Jan 2024 • Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik Kauffman, Hans Gundlach, Irina Gritsevskaya, Joe Cavanagh, Jonathan Chiang, Lydia La Roux, Michelle Hung
We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles.
Ranked #1 on Multimodal Reasoning on REBUS