1 code implementation • 24 Apr 2024 • Jiawei Yao, Qi Qian, Juhua Hu
Traditionally, aligning a user's brief keyword of interest with the corresponding vision components was challenging, but the emergence of multi-modal and large language models (LLMs) has begun to bridge this gap.
1 code implementation • 30 Nov 2023 • Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang
In this work, towards a more versatile copilot for academic paper writing, we mainly focus on strengthening the multi-modal diagram analysis ability of Multimodal LLMs.
1 code implementation • ICCV 2023 • Qi Qian
Meanwhile, one-stage methods are developed mainly for representation learning rather than clustering, where various constraints on cluster assignments are designed to explicitly avoid collapsing (a minimal sketch of one such constraint follows this entry).
Ranked #1 on Unsupervised Image Classification on CIFAR-10
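The explicit anti-collapse constraints mentioned in this entry come in several forms; a common one regularizes the batch-averaged cluster assignment toward the uniform distribution. Below is a minimal PyTorch-style sketch of such a regularized assignment loss; the two entropy terms and their weight are illustrative assumptions, not the specific constraint used in this paper.

```python
import torch
import torch.nn.functional as F

def clustering_loss(logits, temperature=0.1, reg_weight=1.0):
    """Cluster-assignment loss with an anti-collapse regularizer (illustrative).

    logits: (batch, num_clusters) similarities between embeddings and cluster centers.
    The first term sharpens per-example assignments; the second term pushes the
    batch-averaged assignment distribution toward uniform so that all clusters
    stay in use (one example of the explicit constraints mentioned above).
    """
    probs = F.softmax(logits / temperature, dim=1)
    # Encourage confident per-example assignments (low entropy per row).
    sharpen = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    # Discourage collapse: the batch-average assignment should have high entropy.
    avg = probs.mean(dim=0)
    collapse = (avg * avg.clamp_min(1e-8).log()).sum()  # negative entropy of the average
    return sharpen + reg_weight * collapse
```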
2 code implementations • 7 Nov 2023 • Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.
Ranked #11 on Visual Question Answering (VQA) on InfiMM-Eval
2 code implementations • 8 Oct 2023 • Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang
Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs.
1 code implementation • 15 Jun 2023 • Yuqi Zhang, Qi Qian, Hongsong Wang, Chong Liu, Weihua Chen, Fan Wang
In particular, the plain GCR is extended for cross-camera retrieval and an improved feature propagation formulation is presented to leverage affinity relationships across different cameras.
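The feature propagation mentioned above can be illustrated by a simplified, single-graph version: smooth each gallery feature with its nearest neighbors on a cosine-affinity graph. The k-NN construction, softmax weighting, and mixing coefficient alpha below are assumptions for illustration and omit the cross-camera formulation of the paper.

```python
import torch
import torch.nn.functional as F

def propagate_features(feats, k=10, alpha=0.5, num_iters=2):
    """Graph-based feature propagation for retrieval re-ranking (simplified sketch).

    feats: (n, d) gallery features.  Each feature is replaced by a convex
    combination of itself and a weighted mean of its k nearest neighbors,
    which smooths features along the affinity graph.
    """
    feats = F.normalize(feats, dim=1)
    for _ in range(num_iters):
        sim = feats @ feats.t()                          # (n, n) cosine affinities
        topk = sim.topk(k=k + 1, dim=1)                  # k neighbors plus self
        weights = torch.zeros_like(sim).scatter_(
            1, topk.indices, F.softmax(topk.values, dim=1))
        feats = F.normalize(alpha * weights @ feats + (1 - alpha) * feats, dim=1)
    return feats
```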
1 code implementation • 7 Jun 2023 • Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang
In addition, to facilitate a comprehensive evaluation of video-language models, we carefully build the largest human-annotated Chinese benchmarks covering three popular video-language tasks of cross-modal retrieval, video captioning, and video category classification.
1 code implementation • 27 Apr 2023 • Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Our code, pre-trained model, instruction-tuned models, and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl.
Ranked #3 on Visual Question Answering (VQA) on HallusionBench
Visual Question Answering (VQA) • Zero-Shot Video Question Answer
1 code implementation • 16 Apr 2023 • Junfeng Tian, Hehong Chen, Guohai Xu, Ming Yan, Xing Gao, Jianhai Zhang, Chenliang Li, Jiayi Liu, Wenshen Xu, Haiyang Xu, Qi Qian, Wei Wang, Qinghao Ye, Jiejing Zhang, Ji Zhang, Fei Huang, Jingren Zhou
In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that is instruction-finetuned on a wide range of dialogue tasks in a unified internet-augmented format.
1 code implementation • ICCV 2023 • Junyang Wang, Yuanhong Xu, Juhua Hu, Ming Yan, Jitao Sang, Qi Qian
Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with limited training examples.
4 code implementations • 1 Feb 2023 • Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.
Ranked #1 on Video Captioning on MSR-VTT
no code implementations • ICCV 2023 • Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang
We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label) with 8.6% and 11.1% improvement, respectively.
Ranked #1 on Visual Question Answering (VQA) on TGIF-QA
no code implementations • 2 Aug 2022 • Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, Hao Li
Further, we provide an in-depth analysis of the mechanism and rationale behind our approach, which gives us a better understanding of why leveraging logits in lieu of features can help domain generalization.
no code implementations • 25 May 2022 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan
With our empirical result obtained from 1,330 models, we provide the following main observations: 1) ERM combined with data augmentation can achieve state-of-the-art performance if we choose a proper pre-trained model respecting the data property; 2) specialized algorithms further improve the robustness on top of ERM when handling a specific type of distribution shift, e.g., GroupDRO for spurious correlation and CORAL for large-scale out-of-distribution data; 3) Comparing different pre-training modes, architectures and data sizes, we provide novel observations about pre-training on distribution shift, which sheds light on designing or selecting pre-training strategy for different kinds of distribution shifts.
1 code implementation • CVPR 2022 • Haiyang Wang, Shaoshuai Shi, Ze Yang, Rongyao Fang, Qi Qian, Hongsheng Li, Bernt Schiele, LiWei Wang
In order to learn better representations of object shape to enhance cluster features for predicting 3D boxes, we propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays uniformly emitted from cluster centers.
Ranked #13 on 3D Object Detection on ScanNetV2
no code implementations • 23 Feb 2022 • Ruichen Li, Binghui Li, Qi Qian, LiWei Wang
Pruning well-trained neural networks is effective for achieving a promising accuracy-efficiency trade-off in computer vision regimes.
no code implementations • 24 Nov 2021 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, Rong Jin
The generalization result of using pre-training data shows that the excess risk bound on a target task can be improved when the appropriate pre-training data is included in fine-tuning.
no code implementations • 1 Sep 2021 • Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yu-Feng Li, Baigui Sun, Hao Li, Rong Jin
In this work, we develop a simple yet powerful framework whose key idea is to select a subset of training examples from the unlabeled data when performing existing SSL methods, so that only the unlabeled examples with pseudo labels related to the labeled data will be used to train models.
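A minimal sketch of that selection step: score the unlabeled pool with the current model and keep only examples whose pseudo labels fall into the labeled classes with sufficient confidence. The confidence threshold and class-membership rule are illustrative assumptions rather than the exact criterion proposed in the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_unlabeled(model, unlabeled_loader, labeled_classes, threshold=0.9):
    """Keep unlabeled examples whose pseudo labels are related to the labeled data.

    labeled_classes: set of class indices present in the labeled set.
    Returns indices into the unlabeled pool that a standard SSL method
    (e.g., pseudo-labeling) would then be allowed to use.
    """
    keep, offset = [], 0
    for images in unlabeled_loader:
        probs = F.softmax(model(images), dim=1)
        conf, pseudo = probs.max(dim=1)
        for i in range(images.size(0)):
            if conf[i] >= threshold and int(pseudo[i]) in labeled_classes:
                keep.append(offset + i)
        offset += images.size(0)
    return keep
```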
1 code implementation • CVPR 2022 • Qi Qian, Yuanhong Xu, Juhua Hu, Hao Li, Rong Jin
Clustering assigns each instance a pseudo label that is then used to learn representations via discrimination (see the sketch after this entry).
Ranked #5 on Unsupervised Image Classification on CIFAR-10
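One way to instantiate the "assign a pseudo label, then discriminate" loop is sketched below: the pseudo label is the index of the nearest cluster center, and the representation is trained with cross-entropy against it. The temperature, the nearest-center assignment, and the omission of any balanced-assignment constraint are simplifications, not the exact procedure of the paper.

```python
import torch
import torch.nn.functional as F

def cluster_discrimination_step(feats, centers, temperature=0.05):
    """One simplified step of 'cluster, then discriminate'.

    feats:   (batch, d) embeddings from the encoder.
    centers: (k, d) cluster centers (e.g., maintained by online k-means).
    The pseudo label is the nearest center; the encoder is then trained with
    cross-entropy against that pseudo label.  In practice a balanced-assignment
    constraint is needed so that all clusters stay in use.
    """
    feats = F.normalize(feats, dim=1)
    centers = F.normalize(centers, dim=1)
    logits = feats @ centers.t() / temperature
    pseudo = logits.argmax(dim=1).detach()      # pseudo label = nearest center
    return F.cross_entropy(logits, pseudo)
```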
no code implementations • 13 May 2021 • Yi Xu, Qi Qian, Hao Li, Rong Jin
Stochastic gradient descent (SGD) has become the most attractive optimization method in training large-scale deep neural networks due to its simplicity, low computational cost in each updating step, and good performance.
no code implementations • 8 Apr 2021 • Yi Xu, Qi Qian, Hao Li, Rong Jin
Noisy labels are very common in deep supervised learning.
1 code implementation • CVPR 2021 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, Hao Li
To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching*.
Ranked #12 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
2 code implementations • 1 Feb 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin
Comparing with previous NAS methods, the proposed Zen-NAS is orders of magnitude faster on multiple server-side and mobile-side GPU platforms, with state-of-the-art accuracy on ImageNet.
Ranked #2 on Neural Architecture Search on ImageNet
2 code implementations • ICCV 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin
To address this issue, instead of using an accuracy predictor, we propose a novel zero-shot index dubbed Zen-Score to rank the architectures (a sketch of this style of proxy follows this entry).
Neural Architecture Search • Vocal Bursts Intensity Prediction
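Zero-shot proxies of this kind can be computed without any training by probing the network with random inputs. The sketch below scores the expected change of the final feature map under a small Gaussian input perturbation; it is only in the spirit of Zen-Score (the actual score additionally incorporates batch-normalization statistics) and assumes a hypothetical backbone that returns the pre-pooling feature map.

```python
import torch

@torch.no_grad()
def zero_shot_score(backbone, input_shape=(3, 224, 224), repeats=8, eps=1e-2):
    """Score an architecture without training (Zen-Score-style expressivity proxy).

    backbone(x) is assumed to return the final feature map before pooling.
    A larger expected change of the feature map under a small input perturbation
    indicates a more expressive network.
    """
    scores = []
    for _ in range(repeats):
        x = torch.randn(1, *input_shape)
        delta = torch.randn_like(x)
        diff = backbone(x + eps * delta) - backbone(x)
        scores.append(torch.log(diff.norm() / eps + 1e-12))
    return torch.stack(scores).mean()
```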
no code implementations • 3 Oct 2020 • Yi Xu, Asaf Noy, Ming Lin, Qi Qian, Hao Li, Rong Jin
To this end, we develop two novel algorithms, termed "AugDrop" and "MixLoss", to correct the data bias in the data augmentation.
1 code implementation • 30 Sep 2020 • Qi Qian, Hao Li, Juhua Hu
Recently, a number of works propose to transfer the pairwise similarity between examples to distill relative information.
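One common form of transferring pairwise similarity is to match the in-batch similarity matrices of teacher and student embeddings. The sketch below uses a mean-squared error between cosine-similarity matrices; this is one illustrative instantiation of "relative information" distillation, not necessarily the objective used in this paper.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_distillation(student_emb, teacher_emb):
    """Distill relative information via in-batch pairwise similarities.

    student_emb: (batch, d_s), teacher_emb: (batch, d_t); the embedding
    dimensions may differ, since only the (batch, batch) similarity
    structure is matched.
    """
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    return F.mse_loss(s @ s.t(), (t @ t.t()).detach())
```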
no code implementations • 10 Sep 2020 • Lei Chen, Qi Qian, Hao Li
The anchor-free strategy benefits the classification task but can lead to sub-optimal results for the regression task due to the lack of prior bounding boxes.
2 code implementations • 24 Jun 2020 • Ming Lin, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin
To address this issue, we propose a general principle for designing GPU-efficient networks based on extensive empirical studies.
no code implementations • 20 Jun 2020 • Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin
Label smoothing regularization (LSR) has a great success in training deep neural networks by stochastic algorithms such as stochastic gradient descent and its variants.
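For reference, label smoothing replaces the one-hot target with a mixture of the one-hot vector and the uniform distribution over classes; a minimal sketch of the resulting loss (the smoothing factor epsilon = 0.1 is a common default, not a value taken from this paper):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, epsilon=0.1):
    """Cross-entropy with smoothed targets: (1 - eps) * one_hot + eps / K."""
    log_probs = F.log_softmax(logits, dim=1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # standard CE term
    uniform = -log_probs.mean(dim=1)                              # KL-to-uniform term
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()
```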
1 code implementation • ICCV 2021 • Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Juhua Hu
To mitigate this challenge, we propose an algorithm to learn the fine-grained patterns for the target task, when only its coarse-class labels are available.
no code implementations • CVPR 2020 • Qi Qian, Juhua Hu, Hao Li
Experiments on benchmark data sets demonstrate the effectiveness of the robust deep representations.
5 code implementations • ICCV 2019 • Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, Rong Jin
The set of triplet constraints has to be sampled within the mini-batch (illustrated by the sketch after this entry).
Ranked #21 on Metric Learning on CUB-200-2011 (using extra training data)
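The in-batch triplet sampling referred to above can be sketched as a "batch-all" triplet loss that enumerates every valid (anchor, positive, negative) combination inside the mini-batch; this illustrates the sampling burden being discussed, not the method proposed in the corresponding paper.

```python
import torch
import torch.nn.functional as F

def batch_all_triplet_loss(embeddings, labels, margin=0.2):
    """Standard in-batch triplet loss over every valid (anchor, positive, negative).

    This illustrates the per-mini-batch triplet sampling mentioned above;
    it is not the loss proposed in the corresponding paper.
    """
    emb = F.normalize(embeddings, dim=1)
    dist = torch.cdist(emb, emb)                       # (n, n) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (n, n) label-equality mask
    pos = dist.unsqueeze(2)                            # d(anchor, positive) -> (n, n, 1)
    neg = dist.unsqueeze(1)                            # d(anchor, negative) -> (n, 1, n)
    triplet = (pos - neg + margin).clamp_min(0)        # (n, n, n) hinge losses
    valid = same.unsqueeze(2) & (~same).unsqueeze(1)   # positive shares label, negative does not
    valid &= ~torch.eye(len(labels), dtype=torch.bool, device=labels.device).unsqueeze(2)
    return triplet[valid].mean() if valid.any() else triplet.sum() * 0.0
```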
1 code implementation • CVPR 2020 • Qi Qian, Lei Chen, Hao Li, Rong Jin
This architecture is efficient but can suffer from the imbalance issue with respect to two aspects: the inter-class imbalance between the number of candidates from foreground and background classes, and the intra-class imbalance in the hardness of background candidates, where only a few candidates are hard to identify.
no code implementations • 3 Jun 2019 • Ming Lin, Xiaomin Song, Qi Qian, Hao Li, Liang Sun, Shenghuo Zhu, Rong Jin
We validate the superiority of the proposed method in our real-time high precision positioning system against several popular state-of-the-art robust regression methods.
no code implementations • 30 Jan 2019 • Ming Lin, Shuang Qiu, Jieping Ye, Xiaomin Song, Qi Qian, Liang Sun, Shenghuo Zhu, Rong Jin
This bound is sub-optimal compared to the information-theoretic lower bound $\mathcal{O}(kd)$.
no code implementations • CVPR 2018 • Qi Qian, Jiasheng Tang, Hao Li, Shenghuo Zhu, Rong Jin
Furthermore, we can show that the metric is learned from latent examples only, but it can preserve the large margin property even for the original data.
no code implementations • 19 May 2018 • Qi Qian, Shenghuo Zhu, Jiasheng Tang, Rong Jin, Baigui Sun, Hao Li
Hence, we propose to learn the model and the adversarial distribution simultaneously with the stochastic algorithm for efficiency.
no code implementations • 6 Dec 2015 • Qi Qian, Inci M. Baytas, Rong Jin, Anil Jain, Shenghuo Zhu
The similarity between pairs of images can be measured by the distances between their high dimensional representations, and the problem of learning the appropriate similarity is often addressed by distance metric learning.
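Concretely, the learned similarity is usually a squared Mahalanobis distance parameterized by a positive semidefinite matrix M over the high-dimensional representations; a minimal sketch of the distance itself (the learning procedure is omitted):

```python
import torch

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) with a PSD matrix M.

    x, y: (d,) high-dimensional image representations; M: (d, d) learned metric.
    M = I recovers the squared Euclidean distance.
    """
    diff = x - y
    return diff @ M @ diff
```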
no code implementations • 15 Sep 2015 • Qi Qian, Rong Jin, Lijun Zhang, Shenghuo Zhu
In this work, we present a dual random projection framework for DML with high-dimensional data that explicitly addresses the limitation of dimensionality reduction for DML.
no code implementations • CVPR 2015 • Qi Qian, Rong Jin, Shenghuo Zhu, Yuanqing Lin
To this end, we propose a multi-stage metric learning framework that divides the large-scale high-dimensional learning problem into a series of simpler subproblems, achieving $\mathcal{O}(d)$ computational complexity.
no code implementations • 3 Apr 2013 • Qi Qian, Rong Jin, Jin-Feng Yi, Lijun Zhang, Shenghuo Zhu
Although stochastic gradient descent (SGD) has been successfully applied to improve the efficiency of DML, it can still be computationally expensive because in order to ensure that the solution is a PSD matrix, it has to, at every iteration, project the updated distance metric onto the PSD cone, an expensive operation.
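The projection onto the PSD cone that the sentence refers to requires a full eigendecomposition of the d x d metric, which is what makes each iteration expensive in high dimensions; a minimal sketch of that projection step (roughly O(d^3) per call), independent of whatever cheaper scheme the paper proposes:

```python
import torch

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues.

    Requires a full eigendecomposition, which is the O(d^3) per-iteration cost
    that makes naive SGD for distance metric learning expensive in high dimensions.
    """
    M = 0.5 * (M + M.t())                       # symmetrize numerical noise
    eigvals, eigvecs = torch.linalg.eigh(M)     # full spectrum of the d x d metric
    eigvals = eigvals.clamp_min(0.0)            # drop the negative directions
    return eigvecs @ torch.diag(eigvals) @ eigvecs.t()
```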