no code implementations • 18 May 2023 • Won Jang, Dan Lim, Heayoung Park
This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation rates without sacrificing sample quality.
2 code implementations • 31 Mar 2022 • Dan Lim, Sunghee Jung, Eesung Kim
In neural text-to-speech (TTS), two-stage systems, i.e., cascades of separately learned models, have shown synthesis quality close to that of human speech.
6 code implementations • 15 Jun 2021 • Won Jang, Dan Lim, Jaesam Yoon, BongWan Kim, Juntae Kim
Using full-band mel-spectrograms as input, we aim to generate high-resolution signals by adding a discriminator that takes spectrograms of multiple resolutions as input.
2 code implementations • 19 Nov 2020 • Won Jang, Dan Lim, Jaesam Yoon
To preserve sound quality when the MelGAN-based structure is trained with a dataset of hundreds of speakers, we added multi-resolution spectrogram discriminators to sharpen the spectral resolution of the generated waveforms.
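A minimal sketch of the multi-resolution spectrogram idea: the same waveform is analyzed with several STFT configurations, yielding spectrograms at different time-frequency trade-offs that a discriminator can consume. The FFT sizes and hop lengths below are illustrative placeholders, not the paper's actual hyperparameters, and `multi_res_spectrograms` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import stft

def multi_res_spectrograms(wave, sr=22050,
                           configs=((512, 128), (1024, 256), (2048, 512))):
    """Log-magnitude spectrograms of one waveform at several STFT resolutions.

    configs: (fft_size, hop_length) pairs; values here are illustrative only.
    """
    specs = []
    for n_fft, hop in configs:
        # scipy expects overlap rather than hop, so convert accordingly.
        _, _, Z = stft(wave, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
        specs.append(np.log(np.abs(Z) + 1e-7))  # small epsilon avoids log(0)
    return specs

# One second of noise as a stand-in for generated audio.
wave = np.random.randn(22050).astype(np.float32)
specs = multi_res_spectrograms(wave)
for s in specs:
    print(s.shape)  # frequency bins differ per resolution
```

Each array would then be fed to a separate sub-discriminator; smaller FFT sizes give finer time resolution, larger ones finer frequency resolution.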
no code implementations • 15 May 2020 • Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bong-Wan Kim, Jaesam Yoon
We propose the Jointly trained Duration Informed Transformer (JDI-T), a feed-forward Transformer with a duration predictor, jointly trained without explicit alignments, to generate an acoustic feature sequence from input text.
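In duration-informed feed-forward TTS models of this kind, predicted per-phoneme durations are typically used to expand encoder outputs to frame rate before decoding. The generic "length regulation" step can be sketched as below; this is a common pattern in FastSpeech-style models, not JDI-T's exact implementation, and the names are hypothetical.

```python
import numpy as np

def length_regulate(encodings, durations):
    """Expand each phoneme encoding by its predicted duration (in frames)."""
    # np.repeat with a per-row count replicates row i durations[i] times.
    return np.repeat(encodings, durations, axis=0)

enc = np.arange(8, dtype=np.float32).reshape(4, 2)  # 4 phonemes, hidden dim 2
dur = np.array([1, 3, 2, 4])                        # predicted frame counts
frames = length_regulate(enc, dur)
print(frames.shape)  # (10, 2): total frames = sum of durations
```

The decoder then maps this frame-rate sequence to acoustic features in a single non-autoregressive pass.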
no code implementations • 12 Oct 2017 • Dan Lim
This thesis introduces a sequence-to-sequence model with Luong's attention mechanism for end-to-end automatic speech recognition (ASR).
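The simplest Luong scoring variant is the dot product between the decoder state and each encoder state, followed by a softmax to weight the encoder states into a context vector. A minimal numpy sketch of that step (illustrative shapes and names, not the thesis code):

```python
import numpy as np

def luong_dot_attention(decoder_state, encoder_states):
    """Luong dot-product attention: weights over time and a context vector."""
    scores = encoder_states @ decoder_state          # (T,) alignment scores
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    context = weights @ encoder_states               # (d,) weighted average
    return weights, context

T, d = 5, 4                                # 5 encoder steps, hidden dim 4
rng = np.random.default_rng(0)
h_enc = rng.standard_normal((T, d))        # encoder hidden states
h_dec = rng.standard_normal(d)             # current decoder hidden state
w, c = luong_dot_attention(h_dec, h_enc)
print(w.shape, c.shape)                    # (5,) (4,)
```

Luong's general and concat variants replace the raw dot product with a learned bilinear or MLP score, but the softmax-and-weighted-sum structure is the same.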