Search Results for author: Zhimin Li

Found 11 papers, 2 papers with code

OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

1 code implementation • 27 May 2024 • Guan Wang, Zhimin Li, Qingchao Chen, Yang Liu

Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatial-temporal domain of videos.

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

1 code implementation • 14 May 2024 • Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, Jianchen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Mingtao Chen, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu

For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.

Image Generation Language Modelling +2

AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making

no code implementations • 7 Dec 2023 • Shusen Liu, Haichao Miao, Zhimin Li, Matthew Olson, Valerio Pascucci, Peer-Timo Bremer

With recent advances in multi-modal foundation models, the previously text-only large language models (LLMs) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization.

Decision Making

Instance-wise Linearization of Neural Network for Model Interpretation

no code implementations • 25 Oct 2023 • Zhimin Li, Shusen Liu, Kailkhura Bhavya, Timo Bremer, Valerio Pascucci

The non-linear behavior of a neural network model is often caused by its non-linear activation units.

Dimensionality Reduction

Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

no code implementations • 9 Dec 2022 • Jie Jiang, Zhimin Li, Jiangfeng Xiong, Rongwei Quan, Qinglin Lu, Wei Liu

Therefore, TAVS is distinguished from previous temporal segmentation datasets due to its multi-modal information, holistic view of categories, and hierarchical granularities.

Multi-Label Classification Scene Segmentation +3

"Understanding Robustness Lottery": A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches

no code implementations • 16 Jun 2022 • Zhimin Li, Shusen Liu, Xin Yu, Kailkhura Bhavya, Jie Cao, Diffenderfer James Daniel, Peer-Timo Bremer, Valerio Pascucci

We decomposed and evaluated a set of critical geometric concepts from the commonly adopted classification loss, and used them to design a visualization system to compare and highlight the impact of pruning on model performance and feature representation.

Network Pruning

Category-Aware Transformer Network for Better Human-Object Interaction Detection

no code implementations • CVPR 2022 • Leizhen Dong, Zhimin Li, Kunlun Xu, Zhijun Zhang, Luxin Yan, Sheng Zhong, Xu Zou

Specifically, the Object Query would be initialized via category priors represented by an external object detection model to yield better performance.

Human-Object Interaction Detection Object +2

Effective Actor-centric Human-object Interaction Detection

no code implementations • 24 Feb 2022 • Kunlun Xu, Zhimin Li, Zhijun Zhang, Leizhen Dong, Wenhui Xu, Luxin Yan, Sheng Zhong, Xu Zou

Moreover, we use an actor branch to obtain the interaction prediction of the actor and propose a novel composition strategy based on center-point indexing to generate the final HOI prediction.

Human-Object Interaction Detection Object

Improving Human-Object Interaction Detection via Phrase Learning and Label Composition

no code implementations • 14 Dec 2021 • Zhimin Li, Cheng Zou, Yu Zhao, Boxun Li, Sheng Zhong

Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding.

Human-Object Interaction Detection Scene Understanding

Overview of Tencent Multi-modal Ads Video Understanding Challenge

no code implementations • 16 Sep 2021 • Zhenzhi Wang, Liyu Wu, Zhimin Li, Jiangfeng Xiong, Qinglin Lu

Our challenge includes two tasks: video structuring in the temporal dimension and multi-modal video classification.

Multi-Label Classification Video Classification +1

Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension

no code implementations • EMNLP 2018 • Shusen Liu, Tao Li, Zhimin Li, Vivek Srikumar, Valerio Pascucci, Peer-Timo Bremer

Neural network models have gained unprecedented popularity in natural language processing due to their state-of-the-art performance and flexible end-to-end training scheme.

Decision Making Natural Language Inference +1
