no code implementations • 29 Apr 2024 • Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li
To this end, we propose a novel reverse selective auditory attention mechanism, which can suppress interference speakers and non-speech signals to avoid incorrect speaker extraction.
no code implementations • 1 Apr 2024 • Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li
Our experimental results on three created datasets demonstrated that VCA-NN effectively mitigates these dataset problems, which provides a new direction for handling the speaker recognition problems from the data aspect.
no code implementations • 26 Dec 2023 • Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng
This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.
no code implementations • 23 Oct 2023 • Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li
We introduce a novel task named 'target speech diarization', which seeks to determine 'when target event occurred' within an audio signal.
no code implementations • 28 Aug 2023 • Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li
Conclusion: We conclude that it is possible to derive the attended speaker's voice signature from the EEG signals so as to detect the attended speaker in a listening brain.
1 code implementation • 22 May 2023 • Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li
To benefit from both facial cue and reference speech, we propose the Target Speaker TalkNet (TS-TalkNet), which leverages a pre-enrolled speaker embedding to complement the audio-visual synchronization cue in detecting whether the target speaker is speaking.
3 code implementations • CVPR 2023 • Jiawei Du, Yidi Jiang, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li
To mitigate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
1 code implementation • 5 Aug 2021 • Yidi Jiang, Bidisha Sharma, Maulik Madhavi, Haizhou Li
In this regard, we leverage the reliable and widely used bidirectional encoder representations from transformers (BERT) model as a language model and transfer the knowledge to build an acoustic model for intent classification using the speech.
Automatic Speech Recognition (ASR)