no code implementations • ICCV 2023 • Jiashuo Fan, Yaoyuan Liang, Leyao Liu, ShaoLun Huang, Lei Zhang
We evaluate our approach on two datasets and show that our proposed RCA-NOC approach outperforms state-of-the-art methods by a large margin, demonstrating its effectiveness in improving vision-language representation for novel object captioning.
1 code implementation • 26 Oct 2023 • Xiao Liang, Tao Shi, Yaoyuan Liang, Te Tao, Shao-Lun Huang
In this paper, we propose DiffusionVG, a novel framework with diffusion models that formulates video grounding as a conditional generation task, where the target span is generated from Gaussian noise inputs and iteratively refined in the reverse diffusion process.
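The idea of generating a span from Gaussian noise and refining it step by step can be illustrated with a generic deterministic (DDIM-style) reverse loop over a 2-D span vector (center, width). This is a minimal sketch under stated assumptions, not DiffusionVG's implementation: the linear beta schedule, the normalized [0, 1] span coordinates, and the `denoise` callable (standing in for a network conditioned on video and sentence features) are all hypothetical.

```python
import numpy as np

def ddim_sample_span(denoise, T=50, seed=0):
    """Generic DDIM-style reverse process for a (center, width) span.

    `denoise(x_t, t)` is a hypothetical model that predicts the clean span x0.
    Spans are assumed to be normalized to [0, 1] of the video length.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)           # assumed linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)
    x = rng.standard_normal(2)                   # start from pure Gaussian noise
    for t in reversed(range(T)):
        x0_hat = np.clip(denoise(x, t), 0.0, 1.0)
        if t == 0:
            return x0_hat                        # final refined span
        # Noise implied by the current sample and the predicted clean span.
        eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
        # Deterministic DDIM update toward the previous (less noisy) step.
        x = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat
    return x

def span_to_start_end(span):
    center, width = span
    return center - width / 2.0, center + width / 2.0

# Usage with a toy "oracle" denoiser that always predicts the same span.
target = np.array([0.4, 0.2])
span = ddim_sample_span(lambda x, t: target)
print(span_to_start_end(span))
```

With a real model the prediction would change at every step; the toy oracle only shows the mechanics of the loop.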
1 code implementation • 25 Oct 2023 • Tao Shi, Xiao Liang, Yaoyuan Liang, Xinyi Tong, Shao-Lun Huang
To address these challenges, we propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL), which eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions.
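For reference, the Soft-HGR maximal correlation named above is usually written, for zero-mean feature mappings f and g, as E[f(X)^T g(Y)] - 1/2 tr(cov(f(X)) cov(g(Y))). The sketch below computes an empirical estimate of that standard objective on two batches of features; it is not SSLCL's full loss, and how SSLCL pairs sample and label representations is not shown here.

```python
import numpy as np

def soft_hgr(f: np.ndarray, g: np.ndarray) -> float:
    """Empirical Soft-HGR objective for feature batches of shape (n, k).

    E[f^T g] - 0.5 * tr(cov(f) @ cov(g)), with features centered first
    and (n - 1) normalization for the expectation and covariances.
    """
    f = f - f.mean(axis=0, keepdims=True)
    g = g - g.mean(axis=0, keepdims=True)
    n = f.shape[0]
    inner = np.sum(f * g) / (n - 1)        # estimate of E[f(X)^T g(Y)]
    cov_f = f.T @ f / (n - 1)
    cov_g = g.T @ g / (n - 1)
    return float(inner - 0.5 * np.trace(cov_f @ cov_g))

# Usage: identical unit-variance 1-D features reach the maximum of 0.5.
x = np.array([[1.0], [-1.0], [0.0]])
print(soft_hgr(x, x))
```

The trace term acts as a soft whitening penalty, which is why the objective can be estimated per batch without the large negative sets that many contrastive losses rely on.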
1 code implementation • 28 Nov 2022 • Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang
As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.
Ranked #7 on Referring Expression Comprehension on RefCOCO
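Treating phrase extraction as 1D text segmentation means a phrase is a contiguous run of tokens whose mask score is high. The sketch below turns a per-token mask into phrase spans by thresholding and collecting contiguous runs; the function name, the 0.5 threshold, and the example scores are hypothetical illustrations, not DQ-DETR's actual post-processing.

```python
from typing import List, Tuple

def mask_to_phrases(tokens: List[str], mask: List[float],
                    threshold: float = 0.5) -> List[Tuple[int, int, str]]:
    """Convert per-token mask scores into (start, end, text) phrase spans.

    A span is a maximal run of consecutive tokens with score >= threshold;
    `end` is exclusive.
    """
    spans = []
    start = None
    for i in range(len(mask) + 1):
        # Position len(mask) acts as a sentinel that closes a trailing run.
        on = i < len(mask) and mask[i] >= threshold
        if on and start is None:
            start = i
        elif not on and start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    return spans

tokens = ["a", "man", "in", "a", "red", "shirt", "holds", "a", "dog"]
scores = [0.9, 0.95, 0.2, 0.8, 0.9, 0.85, 0.1, 0.7, 0.9]
print(mask_to_phrases(tokens, scores))
```

Here the three runs recovered are "a man", "a red shirt", and "a dog".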