2 code implementations • 28 Nov 2023 • Marina Zhang, Owen Vallis, Aysegul Bumin, Tanay Vakharia, Elie Bursztein
This paper introduces RETSim (Resilient and Efficient Text Similarity), a lightweight, multilingual deep learning model trained to produce robust metric embeddings for near-duplicate text retrieval, clustering, and dataset deduplication tasks.
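To make the near-duplicate retrieval use case concrete, the following is a minimal sketch of threshold-based deduplication over unit-length embeddings. It does not use RETSim itself: `toy_embed` is a hypothetical stand-in (hashed character trigram counts), and the names `cosine`, `near_duplicates`, the dimension of 256, and the threshold of 0.8 are illustrative assumptions, not values from the paper. A learned metric embedding would take the place of `toy_embed`.

```python
import math
import zlib
from collections import Counter


def toy_embed(text, dim=256):
    """Hypothetical stand-in for a learned embedding model.

    Hashes character trigram counts into a fixed-size vector and
    L2-normalizes it, so the dot product equals cosine similarity.
    """
    padded = f"  {text.lower()}  "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    vec = [0.0] * dim
    for gram, n in counts.items():
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def near_duplicates(texts, threshold=0.8):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    vecs = [toy_embed(t) for t in texts]
    pairs = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The all-pairs loop is quadratic in the number of texts and is only meant for small inputs; at dataset scale, an approximate nearest-neighbor index over the embeddings would typically replace it.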
1 code implementation • NeurIPS 2023 • Elie Bursztein, Marina Zhang, Owen Vallis, Xinyu Jia, Alexey Kurakin
The RETVec embedding model is pre-trained using pairwise metric learning to be robust against typos and character-level adversarial attacks.