Search Results for author: Yidi Jiang

Found 8 papers, 3 papers with code

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

no code implementations • 29 Apr 2024 • Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li

To this end, we propose a novel reverse selective auditory attention mechanism, which can suppress interference speakers and non-speech signals to avoid incorrect speaker extraction.

Target Speaker Extraction

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

no code implementations • 1 Apr 2024 • Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li

Our experimental results on three created datasets demonstrated that VCA-NN effectively mitigates these dataset problems, which provides a new direction for handling the speaker recognition problems from the data aspect.

Speaker Recognition, Voice Conversion

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

no code implementations • 26 Dec 2023 • Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.

Automatic Speech Recognition, Data Augmentation, +2

Prompt-driven Target Speech Diarization

no code implementations • 23 Oct 2023 • Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li

We introduce a novel task named "target speech diarization", which seeks to determine "when target event occurred" within an audio signal.

Action Detection, Activity Detection

EEG-Derived Voice Signature for Attended Speaker Detection

no code implementations • 28 Aug 2023 • Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li

Conclusion: We conclude that it is possible to derive the attended speaker's voice signature from the EEG signals so as to detect the attended speaker in a listening brain.

EEG

Target Active Speaker Detection with Audio-visual Cues

1 code implementation • 22 May 2023 • Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li

To benefit from both facial cue and reference speech, we propose the Target Speaker TalkNet (TS-TalkNet), which leverages a pre-enrolled speaker embedding to complement the audio-visual synchronization cue in detecting whether the target speaker is speaking.

Audio-Visual Synchronization

Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation

3 code implementations • CVPR 2023 • Jiawei Du, Yidi Jiang, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li

To mitigate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.

Neural Architecture Search

Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification

1 code implementation • 5 Aug 2021 • Yidi Jiang, Bidisha Sharma, Maulik Madhavi, Haizhou Li

In this regard, we leverage the reliable and widely used bidirectional encoder representations from transformers (BERT) model as a language model and transfer the knowledge to build an acoustic model for intent classification using the speech.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +7
