TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text based Person Retrieval	CUHK-PEDES	VGSG (ViT-Base)	R@1	71.38	# 6
Text based Person Retrieval	CUHK-PEDES	VGSG (ViT-Base)	R@10	91.86	# 6
Text based Person Retrieval	CUHK-PEDES	VGSG (ViT-Base)	R@5	86.75	# 6
Text based Person Retrieval	CUHK-PEDES	VGSG (ViT-Base)	mAP	67.91	# 3
Text based Person Retrieval	ICFG-PEDES	VGSG (ViT-Base)	R@1	63.05	# 7
Text based Person Retrieval	ICFG-PEDES	VGSG (ViT-Base)	R@5	78.43	# 6
Text based Person Retrieval	ICFG-PEDES	VGSG (ViT-Base)	R@10	84.36	# 6

VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

13 Nov 2023 · Shuting He, Hao Luo, Wei Jiang, Xudong Jiang, Henghui Ding

Text-based Person Search (TBPS) aims to retrieve images of a target pedestrian indicated by a textual description. It is essential for TBPS to extract fine-grained local features and align them across modalities. Existing methods rely on external tools or heavy cross-modal interaction to explicitly align cross-modal fine-grained features, which is inefficient and time-consuming. In this work, we propose a Vision-Guided Semantic-Group Network (VGSG) for text-based person search that extracts well-aligned fine-grained visual and textual features. In the proposed VGSG, we develop a Semantic-Group Textual Learning (SGTL) module and a Vision-guided Knowledge Transfer (VGKT) module to extract textual local features under the guidance of visual local clues. In SGTL, to obtain the local textual representation, we group textual features along the channel dimension based on the semantic cues of the language expression, which encourages similar semantic patterns to be grouped implicitly without external tools. In VGKT, vision-guided attention is employed to extract visual-related textual features, which are inherently aligned with visual cues and are termed vision-guided textual features. Furthermore, we design a relational knowledge transfer, consisting of a vision-language similarity transfer and a class probability transfer, to adaptively propagate information from the vision-guided textual features to the semantic-group textual features. With the help of relational knowledge transfer, VGKT can align semantic-group textual features with the corresponding visual features without external tools or complex pairwise interaction. Experimental results on two challenging benchmarks demonstrate the superiority of VGSG over state-of-the-art methods.
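The channel-wise grouping described for SGTL can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_channels`, the equal-width contiguous split, and the max-pooling over tokens are all assumptions made for illustration, and the abstract does not specify how the groups are formed or pooled.

```python
from typing import List


def group_channels(token_feats: List[List[float]], num_groups: int) -> List[List[float]]:
    """Split the channel dimension into contiguous groups and pool each over tokens.

    token_feats: one feature vector per text token (all of equal length).
    Returns one local textual feature per group.
    Hypothetical helper: the equal split and max-pooling are assumptions.
    """
    if not token_feats:
        raise ValueError("token_feats must be non-empty")
    dim = len(token_feats[0])
    if num_groups <= 0 or dim % num_groups != 0:
        raise ValueError("channel dimension must be divisible by num_groups")
    width = dim // num_groups

    local_feats = []
    for g in range(num_groups):
        start, stop = g * width, (g + 1) * width
        # Max over tokens for every channel that belongs to this group.
        pooled = [max(tok[c] for tok in token_feats) for c in range(start, stop)]
        local_feats.append(pooled)
    return local_feats


# Two tokens with four channels each, split into two groups of two channels.
tokens = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
local = group_channels(tokens, num_groups=2)
```

Each returned vector plays the role of one local textual feature. In the paper these features are further aligned with visual features through VGKT, which this sketch does not cover.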

Tasks

Person Search

Text based Person Retrieval

Text based Person Search

Transfer Learning

Datasets

CUHK-PEDES

ICFG-PEDES

Results from the Paper

Ranked #6 on Text based Person Retrieval on CUHK-PEDES (using extra training data)
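For readers unfamiliar with the R@K metric reported in the results table, the sketch below shows one common way to compute it for text-to-image retrieval: a text query counts as a hit if any of its K highest-scoring gallery images shares the query's person identity. The function name `recall_at_k`, its arguments, and the toy similarity matrix are hypothetical, and the exact evaluation protocol used for the leaderboard is not stated on this page.

```python
from typing import Sequence


def recall_at_k(
    sim: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    k: int,
) -> float:
    """Percentage of queries with at least one correct identity in the top-k.

    sim[q][j] is the similarity between text query q and gallery image j.
    Hypothetical helper for illustration only.
    """
    hits = 0
    for q, row in enumerate(sim):
        # Gallery indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if any(gallery_ids[j] == query_ids[q] for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sim)


# Toy example: two text queries against three gallery images.
sim = [[0.9, 0.1, 0.3],
       [0.2, 0.8, 0.7]]
query_ids = [0, 2]
gallery_ids = [0, 1, 2]
r1 = recall_at_k(sim, query_ids, gallery_ids, k=1)
r2 = recall_at_k(sim, query_ids, gallery_ids, k=2)
```

In this toy case the second query only finds its identity at rank 2, so R@1 is lower than R@2.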

Methods

ALIGN
