2 code implementations • 5 Dec 2023 • Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot
Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles.
no code implementations • 28 Feb 2023 • Chenyu Yi, Siyuan Yang, YuFei Wang, Haoliang Li, Yap-Peng Tan, Alex C. Kot
To exploit information in video through self-supervised learning, TeCo uses global content from video clips and optimizes models by entropy minimization.
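As a rough illustration of the entropy-minimization objective mentioned in this entry, the sketch below computes the mean prediction entropy over a batch of classifier outputs, which is the quantity such an objective would drive down. It is not TeCo's implementation: the function names and the toy logits are hypothetical, and the gradient update that would actually adapt a model is omitted.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(batch_logits):
    """Average prediction entropy over a batch (hypothetical loss value)."""
    return sum(entropy(softmax(l)) for l in batch_logits) / len(batch_logits)

# Toy logits (made up for illustration): confident vs. near-uniform predictions.
confident = [[8.0, 0.0, 0.0], [7.5, 0.1, 0.0]]
uncertain = [[0.1, 0.0, 0.1], [0.0, 0.0, 0.0]]

print(mean_entropy(confident) < mean_entropy(uncertain))
```

Minimizing this value pushes the model toward confident predictions on unlabeled inputs, which is why it can serve as a self-supervised signal when no labels are available.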
1 code implementation • 13 Oct 2021 • Chenyu Yi, Siyuan Yang, Haoliang Li, Yap-Peng Tan, Alex Kot
State-of-the-art deep neural networks are vulnerable to common corruptions (e.g., input data degradations, distortions, and disturbances caused by weather changes, system errors, and processing).