Long-Context Understanding
13 papers with code • 2 benchmarks • 0 datasets
Most implemented papers
GLM-130B: An Open Bilingual Pre-trained Model
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.
GPT-4 Technical Report
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.
FABLES: Evaluating faithfulness and content selection in book-length summarization
While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially in detecting unfaithful claims.
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models
The rapid development of Large Language Models (LLMs) has led to great strides in model capabilities like long-context understanding and reasoning.
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long-context understanding, enabling more rigorous evaluation of models' long-context capabilities.
LooGLE: Can Long-Context Language Models Understand Long Contexts?
In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long context understanding.
InternLM2 Technical Report
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).
Long-context LLMs Struggle with Long In-context Learning
Our study reveals that long-context understanding and reasoning remain challenging tasks for existing LLMs.
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents.