Search Results for author: Wenqiao Zhang

Found 27 papers, 7 papers with code

DuetRAG: Collaborative Retrieval-Augmented Generation

no code implementations • 12 May 2024 • Dian Jiao, Li Cai, Jingsheng Huang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks.

Philosophy Retrieval
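
The retrieve-then-augment loop described in the DuetRAG snippet can be illustrated with a toy sketch. It is not DuetRAG's collaborative method. It assumes a plain token-overlap score in place of a real retriever, and `overlap_score`, `retrieve` and `build_prompt` are hypothetical helper names.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a deliberately simple stand-in tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    # Number of shared tokens (counting multiplicity) between query and passage.
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank passages by overlap and keep the top k (sorted() is stable, so ties keep corpus order).
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    # Augment the LLM input by prepending the retrieved passages to the question.
    passages = retrieve(query, corpus, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Mount Fuji is the highest mountain in Japan.",
]
print(build_prompt("What is the capital of France?", corpus))
```

A practical system would replace the overlap score with a sparse or dense retriever. The prompt layout shown is only one possible choice.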

LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation

no code implementations • 21 Apr 2024 • Haoyu Zheng, Wenqiao Zhang, Yaoke Wang, Hao Zhou, Jiang Liu, Juncheng Li, Zheqi Lv, Siliang Tang, Yueting Zhuang

Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit diverse images that convey highly complex visual concepts according to textual guidance.

Image Generation Image Morphing +2

Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales

no code implementations • 17 Apr 2024 • Minghe Gao, Shuang Chen, Liang Pang, Yuan YAO, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

The ability of MLLMs to execute intricate compositional reasoning tasks is also constrained, leading to stagnation in their learning progress.

Hallucination

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

1 code implementation • 20 Mar 2024 • Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang

Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks.

Ranked #77 on Visual Question Answering on MM-Vet

Visual Question Answering

METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection

no code implementations • 28 Dec 2023 • Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Wenqiao Zhang

Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively.

Anomaly Detection Decision Making

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer

no code implementations • 21 Nov 2023 • Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Siliang Tang, Yueting Zhuang

Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target samples to annotate. However, this setting neglects the more practical scenario in which training data are collected from multiple sources.

Domain Adaptation Transfer Learning

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback

no code implementations • 21 Nov 2023 • Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.

Logical Reasoning

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

1 code implementation • 8 Aug 2023 • Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Hanwang Zhang, Yueting Zhuang

This shortcoming causes MLLMs to underperform at comprehending demonstrative instructions, which consist of multiple interleaved multimodal instructions that demonstrate the context required to complete a task.

Caption Generation Image Captioning +1

Denoising Multi-modal Sequential Recommenders with Contrastive Learning

no code implementations • 3 May 2023 • Dong Yao, Shengyu Zhang, Zhou Zhao, Jieming Zhu, Wenqiao Zhang, Rui Zhang, Xiaofei He, Fei Wu

In contrast, modalities that do not cause users' behaviors are potential noise and might mislead the learning of a recommendation model.

Contrastive Learning Denoising +2

Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels

no code implementations • ICCV 2023 • Wenqiao Zhang, Changshuo Liu, Lingze Zeng, Beng Chin Ooi, Siliang Tang, Yueting Zhuang

Conventional multi-label classification (MLC) methods assume that all samples are fully labeled and identically distributed.

Missing Labels Philosophy

Toward Cohort Intelligence: A Universal Cohort Representation Learning Framework for Electronic Health Record Analysis

no code implementations • 10 Apr 2023 • Changshuo Liu, Wenqiao Zhang, Beng Chin Ooi, James Wei Luen Yip, Lingze Zeng, Kaiping Zheng

In this paper, we propose a universal COhort Representation lEarning (CORE) framework to augment EHR utilization by leveraging the fine-grained cohort information among patients.

Representation Learning

CAusal and collaborative proxy-tasKs lEarning for Semi-Supervised Domain Adaptation

no code implementations • 30 Mar 2023 • Wenqiao Zhang, Changshuo Liu, Can Cui, Beng Chin Ooi

In this paper, we analyze the SSDA problem from two previously overlooked perspectives and correspondingly decompose it into two key subproblems: robust domain adaptation (DA) learning and maximal cross-domain data utilization.

Domain Adaptation Semi-supervised Domain Adaptation

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

no code implementations • ICCV 2023 • Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang

Prompt tuning, a recently emerging paradigm, enables powerful vision-language pre-trained models to adapt to downstream tasks in a parameter- and data-efficient way by learning "soft prompts" that condition the frozen pre-trained models.

Domain Generalization Few-Shot Learning +1

IDEAL: Toward High-efficiency Device-Cloud Collaborative and Dynamic Recommendation System

no code implementations • 14 Feb 2023 • Zheqi Lv, Zhengyu Chen, Shengyu Zhang, Kun Kuang, Wenqiao Zhang, Mengze Li, Beng Chin Ooi, Fei Wu

These two trends enable device-cloud collaborative and dynamic recommendation, which deeply exploits the recommendation patterns across cloud and device data and efficiently characterizes instances with different underlying distributions, at the cost of frequent device-cloud communication.

Recommendation Systems Vocal Bursts Intensity Prediction

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

no code implementations • 22 Jan 2023 • Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang

To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning

1 code implementation • CVPR 2023 • Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua

Moreover, we treat the uncertainty scores of the frames in a video as a whole to estimate the difficulty of each video, which can further relieve the burden of video selection.

Active Learning Moment Retrieval +1
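
One possible reading of treating frame uncertainty "as a whole" is to collapse frame-level uncertainty into a single video-level difficulty score and send the hardest videos to annotators first. The sketch below assumes mean binary entropy as the aggregate and a fixed annotation budget. Neither choice is taken from the paper, and `video_difficulty` and `select_videos` are hypothetical names.

```python
import math


def binary_entropy(p: float) -> float:
    # Entropy (bits) of a frame's predicted probability of lying inside the target moment.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def video_difficulty(frame_probs: list[float]) -> float:
    # Assumed aggregate: mean frame entropy, i.e. one score for the whole video.
    return sum(binary_entropy(p) for p in frame_probs) / len(frame_probs)


def select_videos(videos: dict[str, list[float]], budget: int) -> list[str]:
    # Choose the `budget` videos with the highest estimated difficulty for annotation.
    ranked = sorted(videos, key=lambda vid: video_difficulty(videos[vid]), reverse=True)
    return ranked[:budget]


videos = {
    "easy": [0.01, 0.02, 0.98, 0.99],   # confident frame predictions
    "hard": [0.45, 0.50, 0.55, 0.60],   # predictions near 0.5 throughout
    "mixed": [0.05, 0.50, 0.95, 0.50],
}
print(select_videos(videos, budget=2))  # ['hard', 'mixed']
```

Other aggregates (maximum, or a top-k mean over frames) would fit the same interface; the mean is used here only for simplicity.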

WINNER: Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding

no code implementations • CVPR 2023 • Mengze Li, Han Wang, Wenqiao Zhang, Jiaxu Miao, Zhou Zhao, Shengyu Zhang, Wei Ji, Fei Wu

WINNER first builds the language decomposition tree in a bottom-up manner, upon which the structural attention mechanism and top-down feature backtracking jointly build a multi-modal decomposition tree, permitting a hierarchical understanding of unstructured videos.

Contrastive Learning Spatio-Temporal Video Grounding +1

DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization

1 code implementation • 12 Sep 2022 • Zheqi Lv, Wenqiao Zhang, Shengyu Zhang, Kun Kuang, Feng Wang, Yongwei Wang, Zhengyu Chen, Tao Shen, Hongxia Yang, Beng Chin Ooi, Fei Wu

DUET is deployed on a powerful cloud server and requires only the low cost of forward propagation and the low latency of data transmission between the device and the cloud.

Device-Cloud Collaboration Domain Adaptation +3

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

1 code implementation • 3 Aug 2022 • Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang

In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.

Emotion Classification Temporal Action Localization +1

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

no code implementations • 9 Jul 2022 • Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

In this scenario, the input image serves as an intuitive context and background for the search, while the accompanying text explicitly requests new traits, specifying how particular characteristics of the query image should be modified to obtain the intended target image.

Content-Based Image Retrieval counterfactual +2

Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning

no code implementations • 7 Jun 2022 • Jiannan Guo, Yangyang Kang, Yu Duan, Xiaozhong Liu, Siliang Tang, Wenqiao Zhang, Kun Kuang, Changlong Sun, Fei Wu

Motivated by the industry practice of labeling data, we propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate SSL-AL's potential superiority and achieve mutual enhancement of AL and SSL, i.e., SSL propagates label information to unlabeled samples and provides smoothed embeddings for AL, while AL excludes samples with inconsistent predictions and considerable uncertainty for SSL.

Active Learning
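
The division of labour in the IDEAL snippet (SSL pseudo-labels samples whose predictions are consistent, while inconsistent ones are kept out of SSL and handled by AL) can be sketched with a simple disagreement measure. This is a simplified stand-in, not the paper's virtual adversarial algorithm. It assumes class predictions from several perturbed views of each sample are already available, and the threshold value and the names `inconsistency` and `route_samples` are hypothetical.

```python
from collections import Counter


def inconsistency(preds: list[int]) -> float:
    # Fraction of perturbed-view predictions that disagree with the majority label.
    majority_count = Counter(preds).most_common(1)[0][1]
    return 1.0 - majority_count / len(preds)


def route_samples(
    predictions: dict[str, list[int]], threshold: float = 0.25
) -> tuple[dict[str, int], list[str]]:
    # Consistent samples receive a majority-vote pseudo label for SSL;
    # inconsistent ones are withheld from SSL and queued for human annotation (AL).
    pseudo_labels: dict[str, int] = {}
    to_annotate: list[str] = []
    for sample_id, preds in predictions.items():
        if inconsistency(preds) <= threshold:
            pseudo_labels[sample_id] = Counter(preds).most_common(1)[0][0]
        else:
            to_annotate.append(sample_id)
    return pseudo_labels, to_annotate


# Hypothetical predictions of one model on four perturbed views of each sample.
predictions = {
    "a": [1, 1, 1, 1],
    "b": [0, 0, 0, 2],
    "c": [2, 0, 1, 2],
}
pseudo, queue = route_samples(predictions)
print(pseudo, queue)  # {'a': 1, 'b': 0} ['c']
```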

DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes

1 code implementation • 31 May 2022 • Jia-Wei Liu, Yan-Pei Cao, Weijia Mao, Wenqiao Zhang, David Junhao Zhang, Jussi Keppo, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this paper, we present DeVRF, a novel representation to accelerate learning dynamic radiance fields.

Novel View Synthesis

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding

no code implementations • ACL 2022 • Mengze Li, Tianbao Wang, Haoyu Zhang, Shengyu Zhang, Zhou Zhao, Jiaxu Miao, Wenqiao Zhang, Wenming Tan, Jin Wang, Peng Wang, ShiLiang Pu, Fei Wu

To achieve effective grounding under a limited annotation budget, we investigate one-shot video grounding, and learn to ground natural language in all video frames with solely one frame labeled, in an end-to-end manner.

Descriptive Representation Learning +1

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

1 code implementation • CVPR 2022 • Wenqiao Zhang, Lei Zhu, James Hallinan, Andrew Makmur, Shengyu Zhang, Qingpeng Cai, Beng Chin Ooi

In this paper, we propose a novel semi-supervised learning (SSL) framework named BoostMIS that combines adaptive pseudo labeling and informative active annotation to unleash the potential of medical image SSL models: (1) BoostMIS can adaptively leverage the cluster assumption and consistency regularization of the unlabeled data according to the current learning status.

Active Learning
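
Adapting pseudo labeling "according to the current learning status" can be illustrated with a confidence threshold that rises as the model grows more confident. The sketch swaps in an exponential-moving-average threshold in the spirit of self-adaptive thresholding methods. It is not claimed to be BoostMIS's rule, and `AdaptiveThreshold`, `momentum` and the example probabilities are hypothetical.

```python
from typing import Optional


class AdaptiveThreshold:
    """Confidence threshold that tracks the model's learning status.

    Assumption: the threshold is an exponential moving average of the mean
    max-class confidence on unlabeled batches (not taken from BoostMIS).
    """

    def __init__(self, num_classes: int, momentum: float = 0.9) -> None:
        self.momentum = momentum
        # Start from the confidence of a uniform guess, 1 / C.
        self.tau = 1.0 / num_classes

    def update(self, batch_probs: list[list[float]]) -> float:
        # Move the threshold toward the batch's mean max-class confidence.
        mean_conf = sum(max(p) for p in batch_probs) / len(batch_probs)
        self.tau = self.momentum * self.tau + (1 - self.momentum) * mean_conf
        return self.tau

    def pseudo_label(self, batch_probs: list[list[float]]) -> list[Optional[int]]:
        # Keep the argmax label only when its confidence clears the current threshold.
        labels: list[Optional[int]] = []
        for p in batch_probs:
            conf = max(p)
            labels.append(p.index(conf) if conf >= self.tau else None)
        return labels


batch = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.2, 0.7, 0.1]]
thr = AdaptiveThreshold(num_classes=3)
thr.update(batch)
print(thr.pseudo_label(batch))  # [0, 0, 1]
for _ in range(10):
    thr.update(batch)
print(thr.pseudo_label(batch))  # [0, None, 1]
```

Early on, the low threshold admits the 0.4-confidence prediction; once the moving average has risen, that sample is left unlabeled.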

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning

no code implementations • 13 Dec 2021 • Wenqiao Zhang, Haochen Shi, Jiannan Guo, Shengyu Zhang, Qingpeng Cai, Juncheng Li, Sihui Luo, Yueting Zhuang

We propose the Multimodal relAtional Graph adversarIal inferenCe (MAGIC) framework for diverse and unpaired TextCap.

Caption Generation Descriptive +3

Consensus Graph Representation Learning for Better Grounded Image Captioning

no code implementations • 2 Dec 2021 • Wenqiao Zhang, Haochen Shi, Siliang Tang, Jun Xiao, Qiang Yu, Yueting Zhuang

Contemporary visual captioning models frequently hallucinate objects that are not actually in a scene, due to visual misclassification or over-reliance on priors, which results in semantic inconsistency between the visual information and the target lexical words.

Graph Representation Learning Hallucination +1

Relational Graph Learning for Grounded Video Description Generation

no code implementations • 2 Dec 2021 • Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haocheng Shi, Jun Xiao, Yueting Zhuang, William Yang Wang

Such a setting can help explain the decisions of captioning models and prevent the model from hallucinating object words in its description.

Graph Learning Hallucination +2