Search Results for author: Xuri Ge

Found 11 papers, 2 papers with code

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

1 code implementation • 26 Apr 2024 • Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval.

Ranked #1 on Cross-Modal Retrieval on MSCOCO

Cross-Modal Retrieval Retrieval +1

IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT

2 code implementations • 2 Apr 2024 • Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M. Jose

This is also a notable improvement over Adapter and LoRA, which require 37-39 GB of GPU memory and 350-380 seconds per epoch for training.

Representation Learning Sequential Recommendation

CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora

no code implementations • 23 Feb 2024 • Zijun Long, Xuri Ge, Richard McCreadie, Joemon Jose

Text-to-image retrieval aims to find the relevant images based on a text query, which is important in various use-cases, such as digital libraries, e-commerce, and multimedia databases.

Computational Efficiency Image Retrieval +2

The Relationship Between Speech Features Changes When You Get Depressed: Feature Correlations for Improving Speed and Performance of Depression Detection

no code implementations • 6 Jul 2023 • Fuxiang Tao, Wei Ma, Xuri Ge, Anna Esposito, Alessandro Vinciarelli

The results show that the models used in the experiments improve in terms of training speed and performance when fed with feature correlation matrices rather than with feature vectors.

Depression Detection Feature Correlation

Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval

no code implementations • 17 Oct 2022 • Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Joemon M. Jose

To correlate the context of objects with the textual context, we further refine the visual semantic representation via the cross-level object-sentence and word-image based interactive attention.

Object Retrieval +1

MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection

no code implementations • 4 Apr 2022 • Xuri Ge, Joemon M. Jose, Songpei Xu, Xiao Liu, Hu Han

While region-level feature learning from local face patch features via a graph neural network can encode the correlation across different AUs, pixel-wise and channel-wise feature learning via a graph attention network can enhance the discriminative ability of AU features from global face features.

Graph Attention Relational Reasoning

Differentiated Relevances Embedding for Group-based Referring Expression Comprehension

no code implementations • 12 Mar 2022 • Fuhai Chen, Xuri Ge, Xiaoshuai Sun, Yue Gao, Jianzhuang Liu, Fufeng Chen, Wenjie Li

The key to referring expression comprehension lies in capturing the cross-modal visual-linguistic relevance.

Attribute Object +2

Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation

no code implementations • 12 Mar 2022 • Fuhai Chen, Rongrong Ji, Chengpeng Dai, Xuri Ge, Shengchuang Zhang, Xiaojing Ma, Yue Gao

Echocardiography is widely used in clinical practice for diagnosis and treatment, e.g., for common congenital heart defects.

Decision Making Medical Report Generation

Automatic Facial Paralysis Estimation with Facial Action Units

no code implementations • 3 Mar 2022 • Xuri Ge, Joemon M. Jose, Pengcheng Wang, Arunachalam Iyer, Xiao Liu, Hu Han

In this paper, we propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU detection and use it to classify facial paralysis severity.

Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval

no code implementations • 5 Aug 2021 • Xuri Ge, Fuhai Chen, Joemon M. Jose, Zhilong Ji, Zhongqin Wu, Xiao Liu

In this work, we propose to address the above issue from two aspects: (i) constructing intrinsic structure (along with relations) among the fragments of respective modalities, e.g., "dog $\to$ play $\to$ ball" in semantic structure for an image, and (ii) seeking explicit inter-modal structural and semantic correspondence between the visual and textual modalities.

Retrieval Semantic correspondence +1

Variational Structured Semantic Inference for Diverse Image Captioning

no code implementations • NeurIPS 2019 • Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang

To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.

Decoder Image Captioning
