2 code implementations • 6 Feb 2024 • Quan Sun, Jinsheng Wang, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang
Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models.
Ranked #1 on Zero-Shot Transfer Image Classification on SUN
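The contrastive objective behind CLIP-style pretraining can be pictured as a symmetric InfoNCE loss over a batch of paired image and text embeddings. A minimal NumPy sketch, where the function name and the temperature value are illustrative rather than taken from the paper:

```python
import numpy as np

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each matrix is a matched image-text pair.
    Embeddings are assumed L2-normalized, shape (batch, dim)."""
    logits = image_embs @ text_embs.T / temperature   # (batch, batch) similarities
    n = logits.shape[0]

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()  # diagonal = positives

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Scaling this objective is mainly a matter of batch size and encoder capacity; the loss itself stays the same.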
1 code implementation • 20 Dec 2023 • Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang
The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is what current multimodal systems have largely struggled to imitate.
Ranked #21 on Visual Question Answering on MM-Vet
1 code implementation • 31 Oct 2023 • Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu
To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions.
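The consolidation step can be pictured as prompting a large language model with both caption sources at once. The template below is a hypothetical illustration, not the prompt actually used by CapsFusion:

```python
def build_fusion_prompt(web_caption: str, synthetic_caption: str) -> str:
    """Build an instruction asking an LLM to merge a noisy web caption (rich in
    real-world knowledge) with a clean but generic synthetic caption.
    The wording is a made-up placeholder, not the paper's actual template."""
    return (
        "Merge the two descriptions of the same image into one accurate, "
        "informative caption, keeping real-world details and dropping noise.\n"
        f"Web caption: {web_caption}\n"
        f"Synthetic caption: {synthetic_caption}\n"
        "Refined caption:"
    )
```

The refined captions produced this way are then used as the pretraining targets in place of either raw source.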
2 code implementations • 11 Jul 2023 • Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang
We present Emu, a Transformer-based multimodal foundation model that can seamlessly generate images and texts in a multimodal context.
Ranked #1 on Visual Question Answering on VQA v2
1 code implementation • 6 Apr 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.
Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)
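Casting segmentation as in-context image generation means a label map must be rendered as an ordinary color image. A minimal sketch of that transformation (the palette choice here is arbitrary, not the paper's):

```python
import numpy as np

def mask_to_image(mask, palette=None):
    """Render a segmentation mask as an RGB image so mask prediction becomes
    an image-generation problem. mask: (H, W) integer class map."""
    if palette is None:
        rng = np.random.default_rng(0)
        # one arbitrary color per class id (illustrative placeholder palette)
        palette = rng.integers(0, 256, size=(int(mask.max()) + 1, 3), dtype=np.uint8)
    return palette[mask]  # (H, W, 3) color-coded image
```

Once every segmentation dataset is expressed as pairs of images in this format, a single generalist model can be trained on all of them.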
no code implementations • ICCV 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
1 code implementation • 30 May 2022 • Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian
A key idea of efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), even though hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties for formulating vision inputs.
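The token-discarding step is only straightforward for a plain ViT, whose self-attention accepts an arbitrary subset of patch tokens; a hierarchical model such as Swin needs the full 2-D grid for its window attention. A minimal sketch of the discarding step (MAE-style random masking; names are illustrative):

```python
import numpy as np

def discard_masked_patches(tokens, mask_ratio=0.75, rng=None):
    """Keep only the visible patch tokens, so the encoder never sees masked ones.
    tokens: (num_patches, dim). Returns the kept tokens plus their original
    indices, which a decoder later uses to scatter tokens back into place."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    kept_idx = np.sort(rng.permutation(n)[:n_keep])  # random visible positions
    return tokens[kept_idx], kept_idx
```

With a 75% mask ratio, the encoder processes only a quarter of the sequence, which is where the efficiency gain comes from.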
3 code implementations • ICCV 2023 • Feng Liu, Xiaosong Zhang, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye
However, components other than the backbone networks, such as the detector head and the feature pyramid network (FPN), are still trained from scratch, which hinders fully tapping the potential of the representation models.
Ranked #3 on Few-Shot Object Detection on MS-COCO (30-shot)
1 code implementation • 6 Oct 2021 • Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye
We propose to jointly optimize empirical risks of the unbalanced and balanced domains and approximate their domain divergence by intra-class and inter-class distances, with the aim to adapt models trained on the long-tailed distribution to general distributions in an interpretable way.
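The divergence approximation can be sketched as the ratio of mean intra-class spread to mean inter-class center distance; this is an illustrative simplification, and the paper's exact formulation differs:

```python
import numpy as np

def domain_divergence(features, labels):
    """Approximate domain divergence via intra-class vs. inter-class distances.
    features: (N, D), labels: (N,). Tighter classes that are further apart
    give a smaller divergence estimate (illustrative sketch only)."""
    classes = np.unique(labels)
    centers = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # mean distance of samples to their own class center
    intra = np.mean([
        np.linalg.norm(features[labels == c] - centers[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # mean pairwise distance between class centers
    diffs = centers[:, None, :] - centers[None, :, :]
    pair = np.linalg.norm(diffs, axis=-1)
    inter = pair[np.triu_indices(len(classes), k=1)].mean()
    return intra / (inter + 1e-8)
```

Minimizing such a quantity alongside the two empirical risks pushes the long-tail-trained model toward a representation that also serves the balanced domain.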
2 code implementations • CVPR 2021 • Zonghao Guo, Chang Liu, Xiaosong Zhang, Jianbin Jiao, Xiangyang Ji, Qixiang Ye
Detecting oriented and densely packed objects remains challenging due to spatial feature aliasing caused by the intersection of receptive fields between objects.
Ranked #34 on Object Detection In Aerial Images on DOTA (using extra training data)
4 code implementations • NeurIPS 2019 • Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye
In this study, we propose a learning-to-match approach to break the IoU restriction, allowing objects to match anchors in a flexible manner.
Ranked #136 on Object Detection on COCO test-dev
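The flexible matching replaces a hard IoU threshold with a per-object bag of candidate anchors, from which training learns which anchor represents the object best. A minimal sketch of the bag construction (the bag size is illustrative):

```python
import numpy as np

def build_anchor_bags(iou_matrix, bag_size=5):
    """For each ground-truth object, collect its top-IoU candidate anchors into
    a 'bag'; a learnable matching then decides which anchor in the bag
    represents the object, instead of a fixed IoU cutoff.
    iou_matrix: (num_objects, num_anchors) IoU between GT boxes and anchors."""
    order = np.argsort(-iou_matrix, axis=1)  # anchors by descending IoU
    return order[:, :bag_size]               # (num_objects, bag_size) indices
```

During training, a likelihood over each bag then selects the anchor to optimize, rather than committing to the highest-IoU one up front.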
no code implementations • 12 Feb 2019 • Xiaolei Liu, Xiaojiang Du, Xiaosong Zhang, Qingxin Zhu, Mohsen Guizani
An automated testing framework is needed to help these learning-based malware detection systems for IoT devices perform security analysis.
no code implementations • 26 Jan 2019 • Xiaolei Liu, Xiaosong Zhang, Kun Wan, Qingxin Zhu, Yufei Ding
In this paper, we propose weighted-sampling audio adversarial examples, focusing on the numbers and the weights of distortion to reinforce the attack.
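One way to read "numbers and weights of distortion" is a sparse perturbation whose position count and per-sample magnitude are both controlled. A hedged sketch; the names and the precomputed gradient-sign input are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def weighted_sparse_perturbation(audio, grad_sign, n_points=100, weight=0.002, rng=None):
    """Perturb only a sampled subset of audio samples, scaling each by a weight,
    rather than distorting the whole waveform.
    audio: (num_samples,) float waveform in [-1, 1].
    grad_sign: (num_samples,) sign of the attack gradient (assumed precomputed)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    idx = rng.choice(len(audio), size=n_points, replace=False)  # sampled positions
    adv = audio.copy()
    adv[idx] += weight * grad_sign[idx]  # weighted, sparse distortion
    return np.clip(adv, -1.0, 1.0)
```

Keeping the distortion sparse and small is what makes such examples hard to perceive while still flipping the recognizer's output.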
no code implementations • 26 Jan 2019 • Xiaolei Liu, Yuheng Luo, Xiaosong Zhang, Qingxin Zhu
Our experimental results show that both MNIST images and CIFAR-10 images can be perturbed to successfully generate a black-box attack with 100% probability on average.