TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Molecule Captioning	ChEBI-20	BioT5+	BLEU-2	66.6	# 1
Molecule Captioning	ChEBI-20	BioT5+	BLEU-4	59.1	# 1
Molecule Captioning	ChEBI-20	BioT5+	ROUGE-1	71.0	# 1
Molecule Captioning	ChEBI-20	BioT5+	ROUGE-2	58.4	# 1
Molecule Captioning	ChEBI-20	BioT5+	ROUGE-L	65.0	# 1
Molecule Captioning	ChEBI-20	BioT5+	METEOR	68.1	# 1
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	Text2Mol	57.9	# 6
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	BLEU	87.2	# 1
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	Exact Match	52.2	# 1
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	Levenshtein	12.776	# 16
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	MACCS FTS	90.7	# 1
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	RDK FTS	83.5	# 1
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	Morgan FTS	77.9	# 1
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	Frechet ChemNet Distance (FCD)	0.353	# 5
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	Validity	100	# 1
Text-based de novo Molecule Generation	ChEBI-20	BioT5+	Parameter Count	252000000	# 13
Retrosynthesis	Mol-Instruction	BioT5+	Exact	0.642	# 2
Retrosynthesis	Mol-Instruction	BioT5+	Validity	1	# 1
Retrosynthesis	Mol-Instruction	BioT5+	Morgan FTS	0.866	# 2
Reagent Prediction	Mol-Instruction	BioT5+	Exact	0.257	# 2
Reagent Prediction	Mol-Instruction	BioT5+	Validity	1	# 1
Reagent Prediction	Mol-Instruction	BioT5+	Morgan FTS	0.512	# 2
Forward reaction prediction	Mol-Instruction	BioT5+	Exact	0.864	# 2
Forward reaction prediction	Mol-Instruction	BioT5+	Validity	1	# 1
Forward reaction prediction	Mol-Instruction	BioT5+	Morgan FTS	0.935	# 1
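
Two of the string-level generation metrics in the table, Exact Match and Levenshtein, can be illustrated with a short sketch. It assumes predictions and references are plain molecule strings compared as-is; published evaluation scripts may canonicalize molecules before comparing them, so this sketch is not expected to reproduce the numbers above. The function names `levenshtein` and `score` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, using a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def score(predictions: list[str], references: list[str]) -> dict[str, float]:
    """Average exact-match rate and edit distance over paired strings.

    Assumption: raw string comparison, with no molecule canonicalization.
    """
    n = len(references)
    exact = sum(p == r for p, r in zip(predictions, references)) / n
    lev = sum(levenshtein(p, r) for p, r in zip(predictions, references)) / n
    return {"exact_match": exact, "levenshtein": lev}
```

Exact Match is higher-is-better, while Levenshtein is a distance, so lower values indicate closer predictions.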


BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning

27 Feb 2024 · Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan

Recent research trends in computational biology have increasingly focused on integrating text and bio-entity modeling, especially in the context of molecules and proteins. However, previous efforts such as BioT5 struggled to generalize across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC names). This paper introduces BioT5+, an extension of the BioT5 framework tailored to biological research and drug discovery. BioT5+ incorporates several novel features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources such as bioRxiv and PubChem, multi-task instruction tuning for generality across tasks, and a novel numerical tokenization technique for improved processing of numerical data. These enhancements allow BioT5+ to bridge the gap between molecular representations and their textual descriptions, providing a more holistic understanding of biological entities and substantially improving grounded reasoning over bio-text and bio-sequences. The model is pre-trained and fine-tuned across a large number of experiments, covering 3 types of problems (classification, regression, generation), 15 kinds of tasks, and 21 benchmark datasets in total, and achieves state-of-the-art results in most cases. BioT5+ stands out for its ability to capture intricate relationships in biological data, thereby contributing significantly to bioinformatics and computational biology. Our code is available at https://github.com/QizhiPei/BioT5.
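
The abstract mentions a numerical tokenization technique without describing it here. As a rough illustration only, the sketch below assumes one common approach: splitting each number into single-character tokens so that digits and the decimal point are modeled individually. The pattern `NUMBER` and the function `tokenize_numbers` are hypothetical names, and the whitespace splitting of the surrounding text stands in for a real subword tokenizer.

```python
import re

# Assumption: a "number" is a run of digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def tokenize_numbers(text: str) -> list[str]:
    """Split text on whitespace, but emit numbers one character at a time."""
    tokens: list[str] = []
    pos = 0
    for match in NUMBER.finditer(text):
        # Non-numeric text before the number: plain whitespace split.
        tokens.extend(text[pos:match.start()].split())
        # The number itself: one token per digit or decimal point.
        tokens.extend(match.group())
        pos = match.end()
    tokens.extend(text[pos:].split())
    return tokens


print(tokenize_numbers("logP is 2.45"))
# ['logP', 'is', '2', '.', '4', '5']
```

Character-level splitting gives every number a predictable token sequence, which is one reason it is often used for regression-style targets; whether BioT5+ does exactly this should be checked against the paper and the official repository.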

Code

QizhiPei/BioT5 official

Tasks

Drug Discovery

Forward reaction prediction

Molecule Captioning

Reagent Prediction

Retrosynthesis

Text-based de novo Molecule Generation

Datasets

MoleculeNet

QM9

ChEBI-20

Results from the Paper

Ranked #1 on Molecule Captioning on ChEBI-20

