no code implementations • 2 Mar 2024 • Moran Yanuka, Morris Alper, Hadar Averbuch-Elor, Raja Giryes
Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild.
1 code implementation • 6 Dec 2023 • Assaf Ben-Kish, Moran Yanuka, Morris Alper, Raja Giryes, Hadar Averbuch-Elor
We propose a framework for addressing hallucinations in image captioning in the open-vocabulary setting.