no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan
In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.
Ranked #58 on Visual Question Answering on MM-Vet
1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.
1 code implementation • 27 Nov 2023 • Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan
Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.
4 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan
In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.
Ranked #4 on Zero-Shot Video Question Answer on TGIF-QA
4 code implementations • 3 Oct 2023 • Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, Hongfa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan
We thus propose VIDAL-10M with Video, Infrared, Depth, Audio and their corresponding Language, naming as VIDAL-10M.
Ranked #1 on Zero-shot Audio Classification on VGG-Sound (using extra training data)
no code implementations • 24 May 2023 • Dongxu Yue, Qin Guo, Munan Ning, Jiaxi Cui, Yuesheng Zhu, Li Yuan
Despite the successful image reconstruction achieved by diffusion-based methods, there are still challenges in effectively manipulating fine-gained facial attributes with textual instructions. To address these issues and facilitate convenient manipulation of real facial images, we propose a novel approach that conduct text-driven image editing in the semantic latent space of diffusion model.
no code implementations • 23 May 2023 • Haonan Qiu, Zeyin Song, Yanqi Chen, Munan Ning, Wei Fang, Tao Sun, Zhengyu Ma, Li Yuan, Yonghong Tian
However, in this work, we find the method above is not ideal for the SNNs training as it omits the temporal dynamics of SNNs and degrades the performance quickly with the decrease of inference time steps.
no code implementations • 22 May 2023 • Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan
One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story.
1 code implementation • 18 Jan 2023 • Munan Ning, Donghuan Lu, Yujia Xie, Dongdong Chen, Dong Wei, Yefeng Zheng, Yonghong Tian, Shuicheng Yan, Li Yuan
Unsupervised domain adaption has been widely adopted in tasks with scarce annotated data.
1 code implementation • 18 Jul 2022 • Xinyu Shi, Dong Wei, Yu Zhang, Donghuan Lu, Munan Ning, Jiashun Chen, Kai Ma, Yefeng Zheng
A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images.
Ranked #4 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)
1 code implementation • 7 Jul 2022 • Jiashun Chen, Donghuan Lu, Yu Zhang, Dong Wei, Munan Ning, Xinyu Shi, Zhe Xu, Yefeng Zheng
In this study, we propose a novel Deformer module along with a multi-scale framework for the deformable image registration task.
2 code implementations • ICCV 2021 • Munan Ning, Donghuan Lu, Dong Wei, Cheng Bian, Chenglang Yuan, Shuang Yu, Kai Ma, Yefeng Zheng
Unsupervised domain adaption has proven to be an effective approach for alleviating the intensive workload of manual annotation by aligning the synthetic source-domain data and the real-world target-domain samples.
no code implementations • 18 Aug 2021 • Munan Ning, Cheng Bian, Dong Wei, Chenglang Yuan, Yaohua Wang, Yang Guo, Kai Ma, Yefeng Zheng
Domain shift happens in cross-domain scenarios commonly because of the wide gaps between different domains: when applying a deep learning model well-trained in one domain to another target domain, the model usually performs poorly.
no code implementations • 29 Dec 2020 • Munan Ning, Cheng Bian, Chenglang Yuan, Kai Ma, Yefeng Zheng
However, due to the visual and anatomical differences between different modalities, the accurate segmentation of brain structures becomes challenging.
no code implementations • 20 Jul 2020 • Munan Ning, Cheng Bian, Donghuan Lu, Hong-Yu Zhou, Shuang Yu, Chenglang Yuan, Yang Guo, Yaohua Wang, Kai Ma, Yefeng Zheng
Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness among Asian people.