Search Results for author: Zhengcong Fei

Found 21 papers, 11 papers with code

Dimba: Transformer-Mamba Diffusion Models

no code implementations • 3 Jun 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang

This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements.

Text-to-Image Generation

Paper
Add Code

Music Consistency Models

no code implementations • 20 Apr 2024 • Zhengcong Fei, Mingyuan Fan, Junshi Huang

Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps.

Computational Efficiency Music Generation +1

Paper
Add Code

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

1 code implementation • 6 Apr 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields.

Image Generation Unconditional Image Generation

Paper
Code

Scalable Diffusion Models with State Space Backbone

2 code implementations • 8 Feb 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, functioning on raw patches or latent space.

Conditional Image Generation

135

Paper
Code

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

no code implementations • 22 Dec 2023 • Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang

Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e. g., changing postures) to the main objects in the input image without changing their identity or attributes.

Denoising

Paper
Add Code

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

no code implementations • 27 Nov 2023 • Zhengcong Fei, Mingyuan Fan, Junshi Huang

The target representations of those regions are extracted by the exponential moving average of context encoder, \emph{i. e.}, target encoder, on the whole spectrogram.

Self-Supervised Learning

Paper
Add Code

Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning

no code implementations • 10 Sep 2023 • Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-word application of these systems.

Denoising Image Captioning

Paper
Add Code

DiT: Efficient Vision Transformers with Dynamic Token Routing

1 code implementation • 7 Aug 2023 • Yuchen Ma, Zhengcong Fei, Junshi Huang

The proposed framework generates a data-dependent path per token, adapting to the object scales and visual discrimination of tokens.

Instance Segmentation Object +3

Paper
Code

Gradient-Free Textual Inversion

no code implementations • 12 Apr 2023 • Zhengcong Fei, Mingyuan Fan, Junshi Huang

Recent works on personalized text-to-image generation usually learn to bind a special token with specific subjects or styles of a few given images by tuning its embedding through gradient descent.

Computational Efficiency Dimensionality Reduction +1

Paper
Add Code

Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond

1 code implementation • CVPR 2023 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

In this paper, we introduce a novel Generative Adversarial Networks alike framework, referred to as GAN-MAE, where a generator is used to generate the masked patches according to the remaining visible patches, and a discriminator is employed to predict whether the patch is synthesized by the generator.

Representation Learning

Paper
Code

Uncertainty-Aware Image Captioning

no code implementations • 30 Nov 2022 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

It is well believed that the higher uncertainty in a word of the caption, the more inter-correlated context information is required to determine it.

Caption Generation Image Captioning +1

Paper
Add Code

Meta-Ensemble Parameter Learning

no code implementations • 5 Oct 2022 • Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Knowledge distillation is an approach that allows a single model to efficiently capture the approximate performance of an ensemble while showing poor scalability as demand for re-training when introducing new teacher models.

Knowledge Distillation Meta-Learning

Paper
Add Code

Progressive Text-to-Image Generation

no code implementations • 5 Oct 2022 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang

Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to bottom right in the latent space.

Denoising Text-to-Image Generation

Paper
Add Code

Selecting Stickers in Open-Domain Dialogue through Multitask Learning

1 code implementation • Findings (ACL) 2022 • Zhexin Zhang, Yeshuang Zhu, Zhengcong Fei, Jinchao Zhang, Jie zhou

With the increasing popularity of online chatting, stickers are becoming important in our online communication.

Paper
Code

Efficient Modeling of Future Context for Image Captioning

1 code implementation • 22 Jul 2022 • Zhengcong Fei, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Existing approaches to image captioning usually generate the sentence word-by-word from left to right, with the constraint of conditioned on local context including the given image and history generated words.

Image Captioning Sentence +1

Paper
Code

DeeCap: Dynamic Early Exiting for Efficient Image Captioning

1 code implementation • CVPR 2022 • Zhengcong Fei, Xu Yan, Shuhui Wang, Qi Tian

On one hand, the representation in shallow layers lacks high-level semantic and sufficient cross-modal fusion information for accurate prediction.

Image Captioning Imitation Learning

Paper
Code

DVCFlow: Modeling Information Flow Towards Human-like Video Captioning

no code implementations • 19 Nov 2021 • Xu Yan, Zhengcong Fei, Shuhui Wang, Qingming Huang, Qi Tian

Dense video captioning (DVC) aims to generate multi-sentence descriptions to elucidate the multiple events in the video, which is challenging and demands visual consistency, discoursal coherence, and linguistic diversity.

Dense Video Captioning Sentence

Paper
Add Code

Semi-Autoregressive Image Captioning

1 code implementation • 11 Oct 2021 • Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian

Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in a sentence generation, can achieve comparable performance to the autoregressive counterparts with a considerable acceleration.

Decoder Image Captioning +1

Paper
Code

Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark

1 code implementation • 4 Sep 2021 • Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie zhou

Compared to previous dialogue tasks, MOD is much more challenging since it requires the model to understand the multimodal elements as well as the emotions behind them.

Paper
Code

Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency

1 code implementation • Findings (ACL) 2021 • Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, Jie zhou

Employing human judges to interact with chatbots on purpose to check their capacities is costly and low-efficient, and difficult to get rid of subjective bias.

Chatbot Natural Language Inference

Paper
Code

Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances

1 code implementation • ACL 2021 • Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, Jie zhou

Nowadays, open-domain dialogue models can generate acceptable responses according to the historical context based on the large-scale pre-trained language models.

Dialogue Evaluation Dialogue Generation

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.