Search Results for author: Zhen Zhao

Found 35 papers, 16 papers with code

Training-Free Unsupervised Prompt for Vision-Language Models

1 code implementation • 25 Apr 2024 • Sifan Long, Linbin Wang, Zhen Zhao, Zichang Tan, Yiming Wu, Shengsheng Wang, Jingdong Wang

In light of this, we propose Training-Free Unsupervised Prompts (TFUP), which maximally preserves the inherent representation capabilities and enhances them with a residual connection to similarity-based prediction probabilities in a training-free and labeling-free manner.

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

no code implementations • 19 Apr 2024 • Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.

Hallucination • Hallucination Evaluation • +2

SWBT: Similarity Weighted Behavior Transformer with the Imperfect Demonstration for Robotic Manipulation

no code implementations • 17 Jan 2024 • Kun Wu, Ning Liu, Zhen Zhao, Di Qiu, Jinming Li, Zhengping Che, Zhiyuan Xu, Qinru Qiu, Jian Tang

Imitation learning (IL), aiming to learn optimal control policies from expert demonstrations, has been an effective method for robot manipulation tasks.

Imitation Learning • Robot Manipulation

SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects

no code implementations • 17 Jan 2024 • Haowen Wang, Zhen Zhao, Zhao Jin, Zhengping Che, Liang Qiao, Yakun Huang, Zhipeng Fan, Xiuquan Qiao, Jian Tang

Reconstructing real-world objects and estimating their movable joint structures are pivotal technologies within the field of robotics.

Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

3 code implementations • 19 Dec 2023 • Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

While semi-supervised learning (SSL) has yielded promising results, the more realistic SSL scenario remains to be explored, in which the unlabeled data exhibits extremely high recognition difficulty, e.g., fine-grained visual classification in the context of SSL (SS-FGVC).

Fine-Grained Image Classification • Pseudo Label

Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation

1 code implementation • 29 Nov 2023 • Zhen Zhao, Zicheng Wang, Longyue Wang, Yixuan Yuan, Luping Zhou

To mitigate the confirmation bias from the diverse supervision, the core of AD-MT lies in two proposed modules: the Random Periodic Alternate (RPA) Updating Module and the Conflict-Combating Module (CCM).

Data Augmentation • Image Segmentation • +2

Clean Label Disentangling for Medical Image Segmentation with Noisy Labels

1 code implementation • 28 Nov 2023 • Zicheng Wang, Zhen Zhao, Erjian Guo, Luping Zhou

Current methods focusing on medical image segmentation suffer from incorrect annotations, which is known as the noisy label issue.

Disentanglement • Image Segmentation • +2

Progressive Target-Styled Feature Augmentation for Unsupervised Domain Adaptation on Point Clouds

1 code implementation • 27 Nov 2023 • Zicheng Wang, Zhen Zhao, Yiming Wu, Luping Zhou, Dong Xu

Unlike previous works that focus on feature extractor adaptation, our PTSFA approach focuses on classifier adaptation.

Self-Supervised Learning • Unsupervised Domain Adaptation

GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation

no code implementations • 25 Nov 2023 • Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu

While the recent advances in Multimodal Large Language Models (MLLMs) constitute a significant leap forward in the field, these models are predominantly confined to the realm of input-side multimodal comprehension, lacking the capacity for multimodal content generation.

Instruction Following • Language Modelling • +7

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

1 code implementation • 22 Nov 2023 • Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Can Huang, Hao Liu, Xin Tan, Zhizhong Zhang, Yuan Xie

A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios.

In-Context Learning • Scene Text Recognition

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

no code implementations • 2 Oct 2023 • Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao

Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos.

Autonomous Driving • Language Modelling • +2

Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning

1 code implementation • ICCV 2023 • Guan Gui, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

Sample adaptive augmentation (SAA) is proposed for this stated purpose and consists of two modules: 1) sample selection module; 2) sample augmentation module.

Rethinking Data Perturbation and Model Stabilization for Semi-supervised Medical Image Segmentation

1 code implementation • 23 Aug 2023 • Zhen Zhao, Ye Liu, Meng Zhao, Di Yin, Yixuan Yuan, Luping Zhou

Studies on semi-supervised medical image segmentation (SSMIS) have seen fast progress recently.

Image Segmentation • Segmentation • +2

Towards Semi-supervised Learning with Non-random Missing Labels

2 code implementations • ICCV 2023 • Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

Semi-supervised learning (SSL) tackles the label missing problem by enabling the effective usage of unlabeled data.

Missing Labels • Semi-Supervised Image Classification

Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning

1 code implementation • ICCV 2023 • Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao

To mitigate potentially incorrect pseudo labels, recent frameworks mostly set a fixed confidence threshold to discard uncertain samples.

Ranked #1 on Semi-Supervised Image Classification on SVHN, 40 Labels

Semi-Supervised Image Classification

DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field

no code implementations • 4 Aug 2023 • Haowen Wang, Zhipeng Fan, Zhen Zhao, Zhengping Che, Zhiyuan Xu, Dong Liu, Feifei Feng, Yakun Huang, Xiuquan Qiao, Jian Tang

We introduce a pose regression module that shares the deformation features and template codes from the fields to estimate the accurate 6D pose of each object in the scene.

Object Pose Estimation

Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models

no code implementations • ICCV 2023 • Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, JiangJiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang

A contrastive loss is employed to align such augmented text and image representations on downstream tasks.

Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation

1 code implementation • CVPR 2023 • Zicheng Wang, Zhen Zhao, Xiaoxia Xing, Dong Xu, Xiangyu Kong, Luping Zhou

In this work, we propose a new conflict-based cross-view consistency (CCVC) method based on a two-branch co-training framework which aims at enforcing the two sub-nets to learn informative features from irrelevant views.

Semi-Supervised Semantic Segmentation

Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling

no code implementations • CVPR 2023 • Zhen Zhao, Zhizhong Zhang, Xin Tan, Jun Liu, Yanyun Qu, Yuan Xie, Lizhuang Ma

In this paper, we propose a space decoupling (SD) algorithm to decouple the feature space into a pair of complementary subspaces, i.e., the stability space I and the plasticity space R. I is established by conducting space intersection between the historic and current feature space, and thus I contains more task-shared bases.

Continual Learning

Augmentation Matters: A Simple-yet-Effective Approach to Semi-supervised Semantic Segmentation

1 code implementation • CVPR 2023 • Zhen Zhao, Lihe Yang, Sifan Long, Jimin Pi, Luping Zhou, Jingdong Wang

Differently, in this work, we follow a standard teacher-student framework and propose AugSeg, a simple and clean approach that focuses mainly on data perturbations to boost the SSS performance.

Semi-Supervised Semantic Segmentation

Instance-specific and Model-adaptive Supervision for Semi-supervised Semantic Segmentation

1 code implementation • CVPR 2023 • Zhen Zhao, Sifan Long, Jimin Pi, Jingdong Wang, Luping Zhou

Relying on the model's performance, iMAS employs a class-weighted symmetric intersection-over-union to evaluate quantitative hardness of each unlabeled instance and supervises the training on unlabeled data in a model-adaptive manner.

Segmentation • Semi-Supervised Semantic Segmentation

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

1 code implementation • CVPR 2023 • Sifan Long, Zhen Zhao, Jimin Pi, Shengsheng Wang, Jingdong Wang

In this paper, we emphasize the cruciality of diverse global semantics and propose an efficient token decoupling and merging method that can jointly consider the token importance and diversity for token pruning.

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Computational Efficiency • Efficient ViTs

MutexMatch: Semi-Supervised Learning with Mutex-Based Consistency Regularization

3 code implementations • 27 Mar 2022 • Yue Duan, Zhen Zhao, Lei Qi, Lei Wang, Luping Zhou, Yinghuan Shi, Yang Gao

The core issue in semi-supervised learning (SSL) lies in how to effectively leverage unlabeled data, whereas most existing methods tend to put a great emphasis on the utilization of high-confidence samples yet seldom fully explore the usage of low-confidence samples.

Ranked #1 on Semi-Supervised Image Classification on Mini-ImageNet, 1000 Labels

Semi-Supervised Image Classification

The Winning Solution to the iFLYTEK Challenge 2021 Cultivated Land Extraction from High-Resolution Remote Sensing Image

1 code implementation • 22 Feb 2022 • Zhen Zhao, Yuqiu Liu, Gang Zhang, Liang Tang, Xiaolin Hu

This report introduces our solution to the iFLYTEK challenge 2021 cultivated land extraction from high-resolution remote sensing image.

Instance Segmentation • Segmentation • +1

DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning

no code implementations • CVPR 2022 • Zhen Zhao, Luping Zhou, Yue Duan, Lei Wang, Lei Qi, Yinghuan Shi

Consistency-based semi-supervised learning (SSL) has achieved promising performance recently.

Pseudo Label

Bi-Dimensional Feature Alignment for Cross-Domain Object Detection

no code implementations • 14 Nov 2020 • Zhen Zhao, Yuhong Guo, Jieping Ye

Recently the problem of cross-domain object detection has started drawing attention in the computer vision community.

Ranked #2 on Image-to-Image Translation on Cityscapes-to-Foggy Cityscapes

Object • Object Detection • +1

Active Crowd Counting with Limited Supervision

no code implementations • ECCV 2020 • Zhen Zhao, Miaojing Shi, Xiaoxiao Zhao, Li Li

To learn a reliable people counter from crowd images, head center annotations are normally required.

Active Learning • Crowd Counting • +1

Ensemble Model with Batch Spectral Regularization and Data Blending for Cross-Domain Few-Shot Learning with Unlabeled Data

no code implementations • 8 Jun 2020 • Zhen Zhao, Bingyu Liu, Yuhong Guo, Jieping Ye

In this paper, we present our proposed ensemble model with batch spectral regularization and data blending mechanisms for the Track 2 problem of the cross-domain few-shot learning (CD-FSL) challenge.

cross-domain few-shot learning

Feature Transformation Ensemble Model with Batch Spectral Regularization for Cross-Domain Few-Shot Classification

no code implementations • 18 May 2020 • Bingyu Liu, Zhen Zhao, Zhenpeng Li, Jianan Jiang, Yuhong Guo, Jieping Ye

In this paper, we propose a feature transformation ensemble model with batch spectral regularization for the cross-domain few-shot learning (CD-FSL) challenge.

cross-domain few-shot learning • Data Augmentation • +2

Adaptive Object Detection with Dual Multi-Label Prediction

no code implementations • ECCV 2020 • Zhen Zhao, Yuhong Guo, Haifeng Shen, Jieping Ye

In this paper, we propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection by exploiting multi-label object recognition as a dual auxiliary task.

Ranked #3 on Image-to-Image Translation on Cityscapes-to-Foggy Cityscapes

Image-to-Image Translation • Object • +4

Mutual Learning Network for Multi-Source Domain Adaptation

no code implementations • 29 Mar 2020 • Zhenpeng Li, Zhen Zhao, Yuhong Guo, Haifeng Shen, Jieping Ye

However, in practice the labeled data can come from multiple source domains with different distributions.

Unsupervised Domain Adaptation

Fast Inference in Capsule Networks Using Accumulated Routing Coefficients

no code implementations • 15 Apr 2019 • Zhen Zhao, Ashley Kleinhans, Gursharan Sandhu, Ishan Patel, K. P. Unnikrishnan

Afterward, the routing coefficients associated with the training examples are accumulated offline and used to create a set of "master" routing coefficients.

Object • Rotated MNIST

Capsule Networks with Max-Min Normalization

no code implementations • 22 Mar 2019 • Zhen Zhao, Ashley Kleinhans, Gursharan Sandhu, Ishan Patel, K. P. Unnikrishnan

Capsule Networks (CapsNet) use the Softmax function to convert the logits of the routing coefficients into a set of normalized values that signify the assignment probabilities between capsules in adjacent layers.

CT Super-resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE)

no code implementations • 10 Aug 2018 • Chenyu You, Guang Li, Yi Zhang, Xiaoliu Zhang, Hongming Shan, Shenghong Ju, Zhen Zhao, Zhuiyang Zhang, Wenxiang Cong, Michael W. Vannier, Punam K. Saha, Ge Wang

Specifically, with the generative adversarial network (GAN) as the building block, we enforce the cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised and deblurred HR outputs.

Computed Tomography (CT) • Generative Adversarial Network • +2

Structure-sensitive Multi-scale Deep Neural Network for Low-Dose CT Denoising

no code implementations • 2 May 2018 • Chenyu You, Qingsong Yang, Hongming Shan, Lars Gjesteby, Guang Li, Shenghong Ju, Zhuiyang Zhang, Zhen Zhao, Yi Zhang, Wenxiang Cong, Ge Wang

However, the radiation dose reduction compromises the signal-to-noise ratio (SNR), leading to strong noise and artifacts that down-grade CT image quality.

Computed Tomography (CT) • Denoising
