Search Results for author: Munan Ning

Found 16 papers, 9 papers with code

LLMBind: A Unified Modality-Task Integration Framework

no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

Audio Generation Image Segmentation +3

Paper
Add Code

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan

In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.

Ranked #58 on Visual Question Answering on MM-Vet

Hallucination Visual Question Answering

2,450

Paper
Code

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.

3D Generation Image to 3D

252

Paper
Code

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

1 code implementation • 27 Nov 2023 • Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

Paper
Code

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

4 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Ranked #4 on Zero-Shot Video Question Answer on TGIF-QA

Language Modelling Large Language Model +2

2,450

Paper
Code

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

4 code implementations • 3 Oct 2023 • Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, Hongfa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan

We thus propose VIDAL-10M with Video, Infrared, Depth, Audio and their corresponding Language, naming as VIDAL-10M.

Ranked #1 on Zero-shot Audio Classification on VGG-Sound (using extra training data)

Audio Classification Contrastive Learning +11

2,450

Paper
Code

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

no code implementations • 24 May 2023 • Dongxu Yue, Qin Guo, Munan Ning, Jiaxi Cui, Yuesheng Zhu, Li Yuan

Despite the successful image reconstruction achieved by diffusion-based methods, there are still challenges in effectively manipulating fine-gained facial attributes with textual instructions. To address these issues and facilitate convenient manipulation of real facial images, we propose a novel approach that conduct text-driven image editing in the semantic latent space of diffusion model.

Attribute Image Reconstruction

Paper
Add Code

Temporal Contrastive Learning for Spiking Neural Networks

no code implementations • 23 May 2023 • Haonan Qiu, Zeyin Song, Yanqi Chen, Munan Ning, Wei Fang, Tao Sun, Zhengyu Ma, Li Yuan, Yonghong Tian

However, in this work, we find the method above is not ideal for the SNNs training as it omits the temporal dynamics of SNNs and degrades the performance quickly with the decrease of inference time steps.

Contrastive Learning

Paper
Add Code

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

no code implementations • 22 May 2023 • Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story.

Paper
Add Code

MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation

1 code implementation • 18 Jan 2023 • Munan Ning, Donghuan Lu, Yujia Xie, Dongdong Chen, Dong Wei, Yefeng Zheng, Yonghong Tian, Shuicheng Yan, Li Yuan

Unsupervised domain adaption has been widely adopted in tasks with scarce annotated data.

Domain Adaptation Semantic Segmentation +1

Paper
Code

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

1 code implementation • 18 Jul 2022 • Xinyu Shi, Dong Wei, Yu Zhang, Donghuan Lu, Munan Ning, Jiashun Chen, Kai Ma, Yefeng Zheng

A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images.

Ranked #4 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)

Few-Shot Semantic Segmentation Segmentation +1

Paper
Code

Deformer: Towards Displacement Field Learning for Unsupervised Medical Image Registration

1 code implementation • 7 Jul 2022 • Jiashun Chen, Donghuan Lu, Yu Zhang, Dong Wei, Munan Ning, Xinyu Shi, Zhe Xu, Yefeng Zheng

In this study, we propose a novel Deformer module along with a multi-scale framework for the deformable image registration task.

Image Registration Medical Image Registration

Paper
Code

Multi-Anchor Active Domain Adaptation for Semantic Segmentation

2 code implementations • ICCV 2021 • Munan Ning, Donghuan Lu, Dong Wei, Cheng Bian, Chenglang Yuan, Shuang Yu, Kai Ma, Yefeng Zheng

Unsupervised domain adaption has proven to be an effective approach for alleviating the intensive workload of manual annotation by aligning the synthetic source-domain data and the real-world target-domain samples.

Active Learning Domain Adaptation +1

Paper
Code

A New Bidirectional Unsupervised Domain Adaptation Segmentation Framework

no code implementations • 18 Aug 2021 • Munan Ning, Cheng Bian, Dong Wei, Chenglang Yuan, Yaohua Wang, Yang Guo, Kai Ma, Yefeng Zheng

Domain shift happens in cross-domain scenarios commonly because of the wide gaps between different domains: when applying a deep learning model well-trained in one domain to another target domain, the model usually performs poorly.

Representation Learning Unsupervised Domain Adaptation

Paper
Add Code

Ensembled ResUnet for Anatomical Brain Barriers Segmentation

no code implementations • 29 Dec 2020 • Munan Ning, Cheng Bian, Chenglang Yuan, Kai Ma, Yefeng Zheng

However, due to the visual and anatomical differences between different modalities, the accurate segmentation of brain structures becomes challenging.

Decoder Segmentation

Paper
Add Code

A Macro-Micro Weakly-supervised Framework for AS-OCT Tissue Segmentation

no code implementations • 20 Jul 2020 • Munan Ning, Cheng Bian, Donghuan Lu, Hong-Yu Zhou, Shuang Yu, Chenglang Yuan, Yang Guo, Yaohua Wang, Kai Ma, Yefeng Zheng

Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness among Asian people.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.