no code implementations • 4 Feb 2024 • Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang
In this work, we propose a novel approach to densely ground visual entities from a long caption.
1 code implementation • 28 Nov 2023 • Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang
Language has emerged as a natural interface for image editing.
no code implementations • 3 Nov 2021 • Yi-Wen Chen, Xiaojie Jin, Xiaohui Shen, Ming-Hsuan Yang
Video salient object detection aims to find the most visually distinctive objects in a video.
1 code implementation • NeurIPS 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang
Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
1 code implementation • 20 Apr 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang
While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.
1 code implementation • 13 Apr 2020 • Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang
With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.
1 code implementation • 10 Oct 2019 • Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang
To this end, we propose an end-to-end trainable comprehension network that consists of the language and visual encoders to extract feature representations from both domains.
Ranked #19 on Referring Expression Segmentation on RefCOCO testB
no code implementations • 8 Jan 2019 • Yi-Wen Chen, Yi-Hsuan Tsai, Chu-Ya Yang, Yen-Yu Lin, Ming-Hsuan Yang
The entire process is decomposed into two tasks: 1) solving a submodular function for selecting object-like segments, and 2) learning a CNN model with a transferable module for adapting seen categories in the source domain to the unseen target video.