TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Fine-Grained Image Classification	Herbarium 2021 Half–Earth	Conviformer-B	Test F1 score	.719	# 1
Fine-Grained Image Classification	Herbarium 2022	Conviformer-B	Test F1 score (private)	.868	# 1
Image Classification	iNaturalist 2019	Conviformer-B	Top-1 Accuracy	82.85	# 5

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conviformers-convolutionally-guided-vision/fine-grained-image-classification-on-4)](https://paperswithcode.com/sota/fine-grained-image-classification-on-4?p=conviformers-convolutionally-guided-vision)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conviformers-convolutionally-guided-vision/fine-grained-image-classification-on-5)](https://paperswithcode.com/sota/fine-grained-image-classification-on-5?p=conviformers-convolutionally-guided-vision)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conviformers-convolutionally-guided-vision/image-classification-on-inaturalist-2019)](https://paperswithcode.com/sota/image-classification-on-inaturalist-2019?p=conviformers-convolutionally-guided-vision)`

Conviformers: Convolutionally guided Vision Transformer

17 Aug 2022 · Mohit Vaishnav, Thomas Fel, Ivań Felipe Rodríguez, Thomas Serre

Vision transformers are now the de facto choice for image classification tasks. There are two broad categories of classification tasks: fine-grained and coarse-grained. In fine-grained classification, the goal is to discover subtle differences between sub-classes that are highly similar to one another. Such distinctions are often lost when the image is downscaled to save the memory and computational cost associated with vision transformers (ViT). In this work, we present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets. Our extensive experimental analysis indicated the need for a better augmentation technique and the ability of modern-day neural networks to handle higher-dimensional images. We also introduce a convolutional transformer architecture called Conviformer which, unlike the popular Vision Transformer (ConViT), can handle higher-resolution images without an explosion in memory and computational cost. We also introduce a novel, improved pre-processing technique called PreSizer, which resizes images while preserving their original aspect ratios; this proved essential for classifying natural plants. With our simple yet effective approach, we achieved state-of-the-art (SoTA) results on the Herbarium 202x and iNaturalist 2019 datasets.
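
The aspect-ratio-preserving resize mentioned in the abstract can be illustrated with a small sketch. This is not the authors' PreSizer implementation, whose details are not given on this page. It assumes a generic scheme: scale the longer side to the target size, then pad the shorter side evenly. The names `ResizePlan` and `plan_aspect_preserving_resize`, and the 448-pixel target in the usage line, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class ResizePlan(NamedTuple):
    """Geometry for fitting an image into a square without distortion."""
    new_width: int
    new_height: int
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def plan_aspect_preserving_resize(width: int, height: int, target: int) -> ResizePlan:
    """Scale (width, height) to fit inside a target x target square, then
    split the leftover space into padding on each side.

    Assumption: generic "fit and pad" scheme, not the paper's PreSizer.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")

    # One scale factor for both axes keeps the original aspect ratio.
    scale = target / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))

    # Leftover pixels on the shorter axis become padding, split evenly.
    pad_w = target - new_width
    pad_h = target - new_height
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return ResizePlan(
        new_width, new_height,
        pad_left, pad_top,
        pad_w - pad_left, pad_h - pad_top,
    )


# Usage: a portrait scan of 1000 x 1500 pixels fitted into a 448 x 448 input.
plan = plan_aspect_preserving_resize(1000, 1500, 448)
print(plan)
```

The plan only computes sizes and offsets; the actual pixel resampling and the choice of padding content (constant color, reflection, and so on) would be done by whatever image library is in use.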

Code

vaishnavmohit/Conviformer official

Tasks

Fine-Grained Image Classification

Image Classification

Datasets

iNaturalist

Herbarium 2021 Half–Earth

Herbarium 2022

Results from the Paper

Ranked #1 on Fine-Grained Image Classification on Herbarium 2022

Methods

Absolute Position Encodings • BPE • ConViT • Dense Connections • Dropout • GPSA • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Vision Transformer
