1 code implementation • 11 Mar 2024 • Alberto Testoni, Juell Sprott, Sandro Pezzelle
While human speakers use a variety of different expressions when describing the same object in an image, giving rise to a distribution of plausible labels driven by pragmatic constraints, the extent to which current Vision & Language Large Language Models (VLLMs) can mimic this crucial feature of language use is an open question.