1 code implementation • 22 May 2024 • Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang
We witness the rise of larger and higher-quality instruction datasets, as well as the involvement of larger-sized LLMs.
1 code implementation • 18 May 2024 • Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhao, Fanglong Liu, Yifan Sun, Haocheng Feng, Jingdong Wang
To verify our viewpoint, we present the Automated Multi-level Preference (AMP) framework for MLLMs.
2 code implementations • 27 Nov 2023 • Huanjin Yao, Wenhao Wu, Zhiheng Li
In this paper, we present a novel Spatial-Temporal Side Network for memory-efficient fine-tuning large image models to video understanding, named Side4Video.
Ranked #3 on Action Recognition on Something-Something V1
2 code implementations • 27 Nov 2023 • Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang
Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks: Firstly, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training.