1 code implementation • 22 Oct 2021 • Zhongwei Xie, Ling Liu, Yanzhao Wu, Luo Zhong, Lin Li
This paper introduces a two-phase deep feature engineering framework for efficient learning of a semantics-enhanced joint embedding, which cleanly separates deep feature engineering in data preprocessing from the training of the text-image joint embedding model.
no code implementations • 16 Oct 2021 • Zhixin Sun, Xian Zhong, Shuqin Chen, Lin Li, Luo Zhong
Video captioning is a challenging task: it must capture distinct visual parts and describe them in sentences, which requires both visual and linguistic coherence.
no code implementations • 9 Aug 2021 • Zhongwei Xie, Ling Liu, Lin Li, Luo Zhong
This paper presents a three-tier modality alignment approach to learning a text-image joint embedding, coined JEMA, for cross-modal retrieval of cooking recipes and food images.
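The abstract does not spell out JEMA's alignment objective; a common building block for this kind of text-image alignment is a bidirectional triplet (hinge) loss that pulls matched recipe/image embeddings together and pushes mismatched ones apart. The sketch below is illustrative only, not the paper's method; the `margin` value and function name are assumptions.

```python
import numpy as np

def triplet_alignment_loss(text_emb, img_emb, margin=0.3):
    """Bidirectional triplet hinge loss over a batch of matched pairs.

    text_emb, img_emb: (n, d) L2-normalized embeddings; row i of each
    modality is a matching recipe/image pair. `margin` is illustrative.
    """
    sims = text_emb @ img_emb.T            # (n, n) cosine similarities
    pos = np.diag(sims)                    # similarity of each matched pair
    n = sims.shape[0]
    mask = 1.0 - np.eye(n)                 # exclude the positives on the diagonal
    # text -> image direction: every non-matching image is a negative ...
    t2i = np.maximum(0.0, margin + sims - pos[:, None]) * mask
    # ... and image -> text symmetrically.
    i2t = np.maximum(0.0, margin + sims - pos[None, :]) * mask
    return (t2i.sum() + i2t.sum()) / n
```

With perfectly aligned one-hot embeddings (`np.eye(3)` for both modalities), every negative similarity is below the positive by more than the margin, so the loss is exactly 0.0.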
1 code implementation • 2 Aug 2021 • Zhongwei Xie, Ling Liu, Yanzhao Wu, Lin Li, Luo Zhong
We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) for learning a common feature space between the two modalities (text and image), with the ultimate goal of providing high-performance cross-modal retrieval services.
no code implementations • 2 Aug 2021 • Zhongwei Xie, Ling Liu, Lin Li, Luo Zhong
This paper introduces a two-phase deep feature calibration framework for efficient learning of a semantics-enhanced text-image cross-modal joint embedding, which cleanly separates deep feature calibration in data preprocessing from the training of the joint embedding model.
no code implementations • 4 Oct 2018 • Zhongwei Xie, Lin Li, Xian Zhong, Luo Zhong
In this paper, we propose an end-to-end neural network framework for image-to-video person re-identification that leverages cross-modal embeddings learned from extra information. Concretely, cross-modal embeddings from image captioning and video captioning models are reused to project the learned features into a coordinated space, where similarity can be computed directly.
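Once both modalities live in a coordinated space, retrieval reduces to ranking gallery items by similarity to the query embedding. A minimal sketch of that final step, assuming L2-normalized embeddings and cosine similarity (the function names and toy vectors are illustrative, not from the paper):

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each vector to unit length so cosine similarity is a dot product.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def rank_gallery(query_emb, gallery_embs):
    """Rank gallery items by cosine similarity to the query.

    query_emb: (d,) image-side embedding in the coordinated space.
    gallery_embs: (n, d) video-side embeddings in the same space.
    Returns gallery indices sorted from most to least similar.
    """
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_embs)
    sims = g @ q                     # (n,) cosine similarities
    return np.argsort(-sims)         # indices in descending similarity

# Toy example in a 4-dim coordinated space.
query = np.array([1.0, 0.0, 0.0, 0.0])
gallery = np.array([
    [0.9, 0.1, 0.0, 0.0],   # nearly parallel to the query
    [0.0, 1.0, 0.0, 0.0],   # orthogonal
    [0.5, 0.5, 0.0, 0.0],   # in between
])
print(rank_gallery(query, gallery))  # → [0 2 1]
```

The same ranking routine serves both directions (image query against video gallery, or vice versa), since both sides are embedded into one space.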