15 Mar 2024 • Tian Meng, Yang Tao, Ruilin Lyu, Wuliang Yin
By enabling a vision-language model (VLM) to interact with off-the-shelf vision models as tools, the proposed method can classify and segment target objects using only image-level labels.