1 code implementation • 27 May 2024 • Guan Wang, Zhimin Li, Qingchao Chen, Yang Liu
Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatial-temporal domain of videos.
1 code implementation • 14 May 2024 • Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, Jianchen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Mingtao Chen, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu
For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.
no code implementations • 7 Dec 2023 • Shusen Liu, Haichao Miao, Zhimin Li, Matthew Olson, Valerio Pascucci, Peer-Timo Bremer
With recent advances in multi-modal foundation models, the previously text-only large language models (LLMs) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization.
no code implementations • 25 Oct 2023 • Zhimin Li, Shusen Liu, Kailkhura Bhavya, Timo Bremer, Valerio Pascucci
In a neural network model, non-linear behavior is often caused by the model's non-linear activation units.
no code implementations • 9 Dec 2022 • Jie Jiang, Zhimin Li, Jiangfeng Xiong, Rongwei Quan, Qinglin Lu, Wei Liu
Therefore, TAVS is distinguished from previous temporal segmentation datasets due to its multi-modal information, holistic view of categories, and hierarchical granularities.
no code implementations • 16 Jun 2022 • Zhimin Li, Shusen Liu, Xin Yu, Kailkhura Bhavya, Jie Cao, Diffenderfer James Daniel, Peer-Timo Bremer, Valerio Pascucci
We decomposed and evaluated a set of critical geometric concepts from the commonly adopted classification loss, and used them to design a visualization system to compare and highlight the impact of pruning on model performance and feature representation.
no code implementations • CVPR 2022 • Leizhen Dong, Zhimin Li, Kunlun Xu, Zhijun Zhang, Luxin Yan, Sheng Zhong, Xu Zou
Specifically, the Object Query would be initialized via category priors represented by an external object detection model to yield better performance.
no code implementations • 24 Feb 2022 • Kunlun Xu, Zhimin Li, Zhijun Zhang, Leizhen Dong, Wenhui Xu, Luxin Yan, Sheng Zhong, Xu Zou
Moreover, we use an actor branch to obtain the interaction prediction of the actor and propose a novel composition strategy based on center-point indexing to generate the final HOI prediction.
no code implementations • 14 Dec 2021 • Zhimin Li, Cheng Zou, Yu Zhao, Boxun Li, Sheng Zhong
Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding.
no code implementations • 16 Sep 2021 • Zhenzhi Wang, Liyu Wu, Zhimin Li, Jiangfeng Xiong, Qinglin Lu
Our challenge includes two tasks: video structuring in the temporal dimension and multi-modal video classification.
no code implementations • EMNLP 2018 • Shusen Liu, Tao Li, Zhimin Li, Vivek Srikumar, Valerio Pascucci, Peer-Timo Bremer
Neural network models have gained unprecedented popularity in natural language processing due to their state-of-the-art performance and the flexible end-to-end training scheme.