1 code implementation • 9 Apr 2024 • Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen
Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents.
2 code implementations • 9 Apr 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.
Ranked #12 on Visual Question Answering on MM-Vet
1 code implementation • 29 Mar 2024 • Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
We evaluate 16 leading LVLMs on MMStar to assess their multi-modal capabilities, and on 7 benchmarks with the proposed metrics to investigate their data leakage and actual multi-modal gain.
1 code implementation • 26 Mar 2024 • Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).
Ranked #5 on Long-Context Understanding on Ada-LEval (BestAnswer)
1 code implementation • 29 Jan 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension.
Ranked #17 on Visual Question Answering on MM-Vet
1 code implementation • 20 Oct 2023 • Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen
In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability.
1 code implementation • 26 Sep 2023 • Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.
Ranked #9 on Visual Question Answering (VQA) on InfiMM-Eval
no code implementations • 20 Sep 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in general scenarios.
3 code implementations • 12 Jul 2023 • YuAn Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
In response to these challenges, we propose MMBench, a novel multi-modality benchmark.
Ranked #1 on Visual Question Answering on MMBench
1 code implementation • 17 Feb 2023 • Yujie Zhou, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang
Specifically, we construct a negative-sample-free triplet steam structure that is composed of an anchor stream without any masking, a spatial masking stream with Central Spatial Masking (CSM), and a temporal masking stream with Motion Attention Temporal Masking (MATM).
no code implementations • ICCV 2023 • Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo
It first models the intra-person skeleton dynamics for each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture person interactions that are important for action recognition in the wild.
Ranked #2 on Human Interaction Recognition on NTU RGB+D
3 code implementations • 12 Oct 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin
Graph convolution networks (GCN) have been widely used in skeleton-based action recognition.
Ranked #7 on Skeleton Based Action Recognition on NTU RGB+D
1 code implementation • 20 Sep 2022 • Haodong Duan, Yue Zhao, Kai Chen, Yuanjun Xiong, Dahua Lin
Deep learning models have achieved excellent recognition results on large-scale video benchmarks.
1 code implementation • 19 May 2022 • Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin
The toolbox supports a wide variety of skeleton action recognition algorithms, including approaches based on GCN and CNN.
Ranked #19 on Skeleton Based Action Recognition on NTU RGB+D 120
1 code implementation • CVPR 2022 • Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin
To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation.
1 code implementation • arXiv 2022 • Haodong Duan
A placeholder for the tech report.
1 code implementation • CVPR 2022 • Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, LiMin Wang
Recent works prefer to formulate frame sampling as a sequential decision task by selecting frames one by one according to their importance, while we present a new paradigm of learning instance-specific video condensation policies to select informative frames for representing the entire video only in a single step.
4 code implementations • CVPR 2022 • Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai
In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.
Ranked #1 on Action Recognition on NTU RGB+D 120
3 code implementations • ECCV 2020 • Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin
Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.
Ranked #5 on Action Recognition on UCF101 (using extra training data)
2 code implementations • ICCV 2019 • Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang
In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.
no code implementations • 16 Dec 2017 • Bingzhe Wu, Haodong Duan, Zhichao Liu, Guangyu Sun
In this paper, we build a super resolution perceptual generative adversarial network (SRPGAN) framework for SISR tasks.