no code implementations • 15 Apr 2024 • Arjun Panickssery, Samuel R. Bowman, Shi Feng
Self-evaluation using large language models (LLMs) has proven valuable not only in benchmarking but also in methods like reward modeling, constitutional AI, and self-refinement.
1 code implementation • 11 Jan 2024 • Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik Kauffman, Hans Gundlach, Irina Gritsevskaya, Joe Cavanagh, Jonathan Chiang, Lydia La Roux, Michelle Hung
We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles.
Ranked #1 on Multimodal Reasoning on REBUS