1 code implementation • 16 Apr 2024 • Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun
Diffusion models have exhibited remarkable capabilities in text-to-image generation.
1 code implementation • 28 Mar 2024 • Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries.
1 code implementation • 1 Mar 2024 • Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
Motivated by these two problems, we propose the TempCompass benchmark, which introduces a diversity of temporal aspects and task formats.
1 code implementation • 21 Feb 2024 • Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang
To address this, we introduce Embodied-Instruction-Evolution (EIE), an automatic framework for synthesizing instruction tuning examples in multimodal embodied environments.
1 code implementation • 4 Dec 2023 • Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou
This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.
1 code implementation • 29 Nov 2023 • Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou
The ability to perceive how objects change over time is a crucial ingredient in human intelligence.
1 code implementation • NeurIPS 2023 • Yuanxin Liu, Lei Li, Shuhuai Ren, Rundong Gao, Shicheng Li, Sishuo Chen, Xu Sun, Lu Hou
The multi-aspect categorization of FETV enables fine-grained analysis of the metrics' reliability in different scenarios.
1 code implementation • 29 Oct 2023 • Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou
TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding.
Ranked #1 on Video Retrieval on Condensed Movies (using extra training data)
1 code implementation • 3 Oct 2023 • Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Tianyu Liu, Baobao Chang
In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents.
no code implementations • 7 Jun 2023 • Lei Li, Yuwei Yin, Shicheng Li, Liang Chen, Peiyi Wang, Shuhuai Ren, Mukai Li, Yazheng Yang, Jingjing Xu, Xu Sun, Lingpeng Kong, Qi Liu
To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M³IT) dataset, designed to optimize VLM alignment with human instructions.
1 code implementation • NeurIPS 2023 • Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun
This work proposes POMP, a prompt pre-training method for vision-language models.
1 code implementation • 4 Jun 2022 • Shuhuai Ren, Lei Li, Xuancheng Ren, Guangxiang Zhao, Xu Sun
However, evaluating the openness of CLIP-like models is challenging: in theory the models are open to an arbitrary vocabulary, but in practice their accuracy varies.
no code implementations • 27 Dec 2021 • Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, Fanchao Qi, Junwei Bao, Jinran Nie, Zheni Zeng, Yuxian Gu, Kun Zhou, Xuancheng Huang, Wenhao Li, Shuhuai Ren, Jinliang Lu, Chengqiang Xu, Huadong Wang, Guoyang Zeng, Zile Zhou, Jiajun Zhang, Juanzi Li, Minlie Huang, Rui Yan, Xiaodong He, Xiaojun Wan, Xin Zhao, Xu Sun, Yang Liu, Zhiyuan Liu, Xianpei Han, Erhong Yang, Zhifang Sui, Maosong Sun
We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic.
1 code implementation • EMNLP 2021 • Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
Knowledge distillation (KD) has proven effective for compressing large-scale pre-trained language models.
1 code implementation • EMNLP 2021 • Shuhuai Ren, Jinchao Zhang, Lei Li, Xu Sun, Jie Zhou
Data augmentation aims to enrich training samples to alleviate overfitting in low-resource or class-imbalanced settings.
1 code implementation • ACL 2021 • Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang
To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, but they lack matching between the linguistic relations among words and the visual relations among regions.
Ranked #4 on Image-to-Text Retrieval on MS COCO
1 code implementation • Findings (EMNLP) 2021 • Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
On the other hand, the exiting decisions made by internal classifiers are unreliable, leading to wrongly emitted early predictions.
no code implementations • 7 Nov 2019 • Zhihan Zhang, Zhiyi Yin, Shuhuai Ren, Xinhang Li, Shicheng Li
In this paper, we aim to collect diversified information from video and text for informative comment generation.
1 code implementation • ACL 2019 • Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che
Experiments on three popular datasets using both convolutional and LSTM models show that PWWS reduces classification accuracy the most while keeping a very low word substitution rate.