1 code implementation • 21 Feb 2024 • Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun
Notably, the best-performing model, GPT-4V, attains an average score of 17. 23% on OlympiadBench, with a mere 11. 28% in physics, highlighting the benchmark rigor and the intricacy of physical reasoning.
1 code implementation • 21 Feb 2024 • Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, JunHao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun
Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction.