no code implementations • 1 Mar 2024 • Mufan Sang, John H. L. Hansen
With their excellent generalization ability, self-supervised speech models have shown impressive performance on a variety of downstream speech tasks under the pre-training and fine-tuning paradigm.
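As a rough illustration of that paradigm, the sketch below attaches a task head to a pre-trained speech encoder for downstream speaker classification; the encoder, feature dimensions, and speaker count are placeholders for illustration only, not the paper's actual setup.

```python
import torch
import torch.nn as nn

class DownstreamModel(nn.Module):
    """Pre-train/fine-tune sketch: a self-supervised speech encoder followed by a
    task head. In practice only the head and/or the top encoder layers might be
    updated during fine-tuning; all sizes here are illustrative."""
    def __init__(self, pretrained_encoder: nn.Module, hidden_dim=768, num_speakers=1251):
        super().__init__()
        self.encoder = pretrained_encoder      # e.g. a wav2vec 2.0 / WavLM-style model
        self.head = nn.Linear(hidden_dim, num_speakers)

    def forward(self, feats):
        h = self.encoder(feats)                # (batch, time, hidden_dim)
        return self.head(h.mean(dim=1))        # pool over time, then classify speakers

# Placeholder encoder so the sketch runs; in practice load pre-trained weights instead.
encoder = nn.Sequential(nn.Linear(80, 768), nn.ReLU(), nn.Linear(768, 768))
model = DownstreamModel(encoder)
logits = model(torch.randn(4, 200, 80))        # 4 utterances, 200 frames, 80-dim features
```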
no code implementations • 17 Feb 2023 • Mufan Sang, Yong Zhao, Gang Liu, John H. L. Hansen, Jian Wu
The proposed models achieve 0.75% EER on the VoxCeleb1 test set, outperforming previously proposed Transformer-based models and CNN-based models such as ResNet34 and ECAPA-TDNN.
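For reference, EER is the operating point where the false-accept and false-reject rates coincide. A minimal way to compute it from verification trial scores is sketched below; the scores and labels are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(scores, labels):
    """EER: the point on the ROC curve where false-accept rate equals false-reject rate."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Illustrative trial scores (higher = more likely same speaker) and labels (1 = same speaker).
scores = np.array([0.9, 0.8, 0.3, 0.6, 0.1, 0.4])
labels = np.array([1,   1,   0,   1,   0,   0  ])
print(f"EER = {equal_error_rate(scores, labels):.2%}")
```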
no code implementations • 10 Jul 2022 • Mufan Sang, John H. L. Hansen
In this study, we show mathematically that GAP is a special case of a discrete cosine transform (DCT) on the time-frequency domain that uses only the lowest-frequency component of the frequency decomposition.
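A small numerical check of this relationship, assuming an orthonormal 2-D DCT-II over a T×F time-frequency feature map (the dimensions below are arbitrary): the (0, 0) DCT coefficient equals the global average up to a constant scale.

```python
import numpy as np
from scipy.fft import dctn

# Hypothetical time-frequency feature map: T frames x F frequency bins.
T, F = 200, 80
x = np.random.randn(T, F)

gap = x.mean()                                  # global average pooling
c00 = dctn(x, type=2, norm='ortho')[0, 0]       # lowest-frequency 2-D DCT coefficient

# For the orthonormal DCT-II, the (0, 0) coefficient is sqrt(T*F) times the mean,
# so GAP is exactly the lowest-frequency DCT component up to a constant scale.
assert np.isclose(gap, c00 / np.sqrt(T * F))
```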
no code implementations • 8 Dec 2021 • Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan
With our strong online data augmentation strategy, the proposed SSReg demonstrates the potential of self-supervised learning without negative pairs, and it significantly improves self-supervised speaker representation learning with a simple Siamese network architecture.
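Below is a minimal sketch of a negative-free Siamese objective in this spirit, using a SimSiam-style stop-gradient; the encoder, predictor sizes, and inputs are placeholders, not the paper's SSReg formulation or augmentation pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseSSL(nn.Module):
    """Negative-free Siamese objective sketch: maximize agreement between two
    augmented views of the same utterance, with no negative pairs."""
    def __init__(self, feat_dim=80, emb_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU(),
                                     nn.Linear(emb_dim, emb_dim))
        self.predictor = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU(),
                                       nn.Linear(emb_dim, emb_dim))

    def forward(self, view1, view2):
        z1, z2 = self.encoder(view1), self.encoder(view2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Stop-gradient on the target branch helps prevent representation collapse.
        loss = -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                       + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())
        return loss

model = SiameseSSL()
v1, v2 = torch.randn(8, 80), torch.randn(8, 80)  # two augmented views of 8 utterances
print(model(v1, v2))
```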
no code implementations • 12 Dec 2020 • Mufan Sang, Wei Xia, John H. L. Hansen
Although speaker verification has achieved significant performance improvements with the development of deep neural networks, domain mismatch remains a challenging problem in this field.
no code implementations • 21 Sep 2020 • Mufan Sang, Wei Xia, John H. L. Hansen
In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available.