Text Compression
9 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
LLMZip: Lossless Text Compression using Large Language Models
We provide new estimates of an asymptotic upper bound on the entropy of English using the large language model LLaMA-7B as a predictor for the next token given a window of past tokens.
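The bound rests on a standard argument: the cross-entropy a next-token predictor achieves on a text upper-bounds the entropy of the source. A minimal sketch of that arithmetic, with made-up probabilities standing in for LLaMA-7B's actual outputs:

```python
import math

def entropy_upper_bound_bits_per_char(token_probs, num_chars):
    """Cross-entropy of a next-token predictor on a text, in bits per
    character. By Shannon's argument this upper-bounds the entropy of
    the source that produced the text.

    token_probs: the probability the model assigned to each token that
    actually occurred next (hypothetical values below, not LLaMA output).
    num_chars: length of the text in characters.
    """
    total_bits = sum(-math.log2(p) for p in token_probs)
    return total_bits / num_chars

# Toy illustration: four tokens spanning a 16-character string.
bound = entropy_upper_bound_bits_per_char([0.5, 0.25, 0.8, 0.1], 16)
```

A sharper predictor assigns higher probability to the tokens that occur, so the bound tightens as the language model improves.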
Syntactically Informed Text Compression with Recurrent Neural Networks
We present a self-contained system for constructing natural language models for use in text compression.
Authorship Verification based on Compression-Models
The only three key components of our method are a compression algorithm, a dissimilarity measure, and a threshold for accepting or rejecting the authorship of the questioned document.
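Those three components can be sketched in a few lines. The following uses zlib as a stand-in compressor and Normalized Compression Distance (NCD) as one common compression-based dissimilarity; the threshold value is hypothetical and would in practice be tuned on held-out data:

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed size in bytes (zlib as a stand-in compressor)."""
    return len(zlib.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for very similar texts,
    near 1 for unrelated ones."""
    cx, cy = clen(x), clen(y)
    return (clen(x + y) - min(cx, cy)) / max(cx, cy)

def same_author(known: bytes, questioned: bytes, threshold: float = 0.5) -> bool:
    # Accept authorship when the questioned document compresses well
    # alongside the known writings. The threshold here is hypothetical.
    return ncd(known, questioned) < threshold
```

The intuition: a compressor exploits regularities shared between the two texts, so concatenating stylistically similar documents compresses better than concatenating unrelated ones.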
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models
Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary.
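NCE sidesteps the full-vocabulary softmax by reframing prediction as a binary task: distinguish the true next word from k noise words, scoring only those k+1 candidates. In the batch variant, the noise samples are simply the other targets in the same batch. A minimal sketch under those assumptions (function names are illustrative, not the paper's API):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def batch_nce_loss(logits, target_idx, log_noise_probs, k):
    """NCE loss for one prediction step.

    logits: model scores for the words appearing as targets in the batch
        (a small set, so no full-vocabulary normalization is needed).
    target_idx: which of those words is the true next word here; the
        others serve as the k in-batch noise samples.
    log_noise_probs: log q(w) for each word under the noise distribution
        (e.g. unigram frequencies).
    """
    loss = 0.0
    for i, (score, log_q) in enumerate(zip(logits, log_noise_probs)):
        # Posterior log-odds that word i came from the data rather than
        # from the noise distribution: log u(w) - log(k * q(w)).
        delta = score - (math.log(k) + log_q)
        if i == target_idx:
            loss -= math.log(sigmoid(delta))        # true next word
        else:
            loss -= math.log(1.0 - sigmoid(delta))  # in-batch noise word
    return loss
```

The cost per step scales with the batch size rather than the vocabulary size, which is the source of the speedup for large-vocabulary NNLMs.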
Data-efficient Neural Text Compression with Interactive Learning
Neural sequence-to-sequence models have been successfully applied to text compression.
Contextualized Semantic Distance between Highly Overlapped Texts
Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
Gzip versus bag-of-words for text classification
The effectiveness of compression-based text classification ('gzip') has recently attracted considerable attention.
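The gzip recipe under discussion is parameter-free: measure Normalized Compression Distance between the test document and each labelled training document, then take the nearest neighbour's label. A minimal sketch (the data and the 1-NN choice are illustrative; the original method uses k-NN):

```python
import gzip

def clen(s: str) -> int:
    """Compressed size in bytes, using gzip."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance between two strings."""
    ca, cb = clen(a), clen(b)
    return (clen(a + " " + b) - min(ca, cb)) / max(ca, cb)

def classify(test_doc: str, train: list) -> str:
    """1-nearest-neighbour under NCD.

    train: list of (text, label) pairs.
    """
    nearest_text, nearest_label = min(train, key=lambda tl: ncd(test_doc, tl[0]))
    return nearest_label
```

Because gzip finds back-references across the concatenated pair, documents sharing vocabulary compress well together and end up close under NCD, with no training or embeddings involved.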
LLMs may Dominate Information Access: Neural Retrievers are Biased Towards LLM-Generated Texts
We refer to this category of biases in neural retrieval models towards LLM-generated text as "source bias".
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective.