Search Results for author: Jesus Villalba

Found 9 papers, 3 papers with code

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model

1 code implementation • 18 Jun 2023 • Helin Wang, Thomas Thebaud, Jesus Villalba, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velazquez

We present a novel typical-to-atypical voice conversion approach (DuTa-VC), which (i) can be trained with nonparallel data (ii) first introduces diffusion probabilistic model (iii) preserves the target speaker identity (iv) is aware of the phoneme duration of the target speaker.

Data Augmentation Decoder +3

Paper
Code

Stabilized training of joint energy-based models and their practical applications

no code implementations • 7 Mar 2023 • Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak

The JEM training relies on "positive examples" (i. e. examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).

Paper
Add Code

Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

We propose three defenses--denoiser pre-processor, adversarially fine-tuning ASR model, and adversarially fine-tuning joint model of ASR and denoiser.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation.

Representation Learning Speaker Identification

Paper
Add Code

The JHU submission to VoxSRC-21: Track 3

no code implementations • 28 Sep 2021 • Jejin Cho, Jesus Villalba, Najim Dehak

This technical report describes Johns Hopkins University speaker recognition system submitted to Voxceleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed).

Clustering Contrastive Learning +2

Paper
Add Code

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations • 31 Mar 2021 • Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition +3

Paper
Add Code

Learning Speaker Embedding from Text-to-Speech

1 code implementation • 21 Oct 2020 • Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe, Najim Dehak

TTS with speaker classification loss improved EER by 0. 28\% and 0. 73\% absolutely from a model using only speaker classification loss in LibriTTS and Voxceleb1 respectively.

Classification Decoder +3

Paper
Code

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

no code implementations • 12 Feb 2020 • Raghavendra Pappagari, Tianzi Wang, Jesus Villalba, Nanxin Chen, Najim Dehak

Then, we show the effect of emotion on speaker recognition.

Emotion Classification Emotion Recognition +3

Paper
Add Code

Speaker detection in the wild: Lessons learned from JSALT 2019

1 code implementation • 2 Dec 2019 • Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios.

Audio and Speech Processing Sound

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.