TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Document Image Classification	RVL-CDIP	Multimodal (ResNet50)	Accuracy	92.7%	# 20
Document Image Classification	RVL-CDIP	Multimodal (ResNet50)	Parameters	57M	# 14
Document Image Classification	RVL-CDIP	Multimodal (MobileNetV2)	Accuracy	92.2%	# 24
Document Image Classification	RVL-CDIP	Multimodal (MobileNetV2)	Parameters	12M	# 12
Document Image Classification	Tobacco-3482	Multimodal Side-Tuning (MobileNetV2)	Accuracy	90.50%	# 4
Document Image Classification	Tobacco-3482	Multimodal Side-Tuning (ResNet50)	Accuracy	90.30%	# 5

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multimodal-side-tuning-for-document/document-image-classification-on-tobacco-3482)](https://paperswithcode.com/sota/document-image-classification-on-tobacco-3482?p=multimodal-side-tuning-for-document)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multimodal-side-tuning-for-document/document-image-classification-on-rvl-cdip)](https://paperswithcode.com/sota/document-image-classification-on-rvl-cdip?p=multimodal-side-tuning-for-document)`

Multimodal Side-Tuning for Document Classification

16 Jan 2023 · Stefano Pio Zingaro, Giuseppe Lisanti, Maurizio Gabbrielli

In this paper, we propose exploiting the side-tuning framework for multimodal document classification. Side-tuning is a recently introduced methodology for network adaptation that addresses some of the problems of previous approaches. This technique makes it possible to overcome the model rigidity and catastrophic forgetting that affect transfer learning by fine-tuning. The proposed solution uses off-the-shelf deep learning architectures and leverages the side-tuning framework to combine a base model with a tandem of two side networks. We show that side-tuning can also be employed successfully when different data sources are considered, e.g. text and images in document classification. The experimental results show that this approach further improves document classification accuracy with respect to the state of the art.
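
The combination of a base model with two side networks can be pictured with a small sketch. The abstract does not state the merge rule, so the convex blend below is an assumption borrowed from the general side-tuning idea of mixing base and side outputs with weights that sum to 1. The names `blend`, `side_tuned_features`, the toy encoders, and the choice of which network sees which modality are all hypothetical and are not taken from the authors' code.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def blend(features: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return the convex combination of equally sized feature vectors."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature vector is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("feature vectors must share one dimension")
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(size)]


def side_tuned_features(
    image: object,
    text: str,
    base: Callable[[object], Vector],
    image_side: Callable[[object], Vector],
    text_side: Callable[[str], Vector],
    weights: Sequence[float],
) -> Vector:
    """Blend a base encoder with an image side network and a text side network.

    Assumption: the base and the image side network read the document image,
    while the text side network reads the document text.
    """
    return blend([base(image), image_side(image), text_side(text)], weights)


# Toy stand-ins for real networks (hypothetical, fixed outputs).
def toy_base(image: object) -> Vector:
    return [1.0, 0.0]


def toy_image_side(image: object) -> Vector:
    return [0.0, 1.0]


def toy_text_side(text: str) -> Vector:
    return [1.0, 1.0]


example = side_tuned_features(
    None, "invoice", toy_base, toy_image_side, toy_text_side, (0.5, 0.25, 0.25)
)
print(example)  # [0.75, 0.5]
```

In a real setup the three callables would be neural networks, with the base typically kept frozen and only the side networks trained, and the blended features would feed a classification layer.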

Code

thezingaro/multimodal-side-tuning official

Tasks

Classification

Document Classification

Document Image Classification

Transfer Learning

Datasets

RVL-CDIP Tobacco-3482

Results from the Paper

Ranked #4 on Document Image Classification on Tobacco-3482

Methods

1x1 Convolution • Average Pooling • Batch Normalization • Bottleneck Residual Block • Convolution • Depthwise Convolution • Depthwise Separable Convolution • fastText • Global Average Pooling • Inverted Residual Block • Kaiming Initialization • Max Pooling • MobileNetV2 • Pointwise Convolution • ReLU • Residual Block • Residual Connection • ResNet
