no code implementations • 18 May 2023 • Won Jang, Dan Lim, Heayoung Park
This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation rates without sacrificing sample quality.
2 code implementations • 31 Mar 2022 • Dan Lim, Sunghee Jung, Eesung Kim
In neural text-to-speech (TTS), two-stage systems, i.e., cascades of separately learned models, have shown synthesis quality close to that of human speech.
6 code implementations • 15 Jun 2021 • Won Jang, Dan Lim, Jaesam Yoon, BongWan Kim, Juntae Kim
Using full-band mel-spectrograms as input, we aim to generate high-resolution signals by adding a discriminator that takes spectrograms of multiple resolutions as input.
2 code implementations • 19 Nov 2020 • Won Jang, Dan Lim, Jaesam Yoon
To preserve sound quality when the MelGAN-based structure is trained with a dataset of hundreds of speakers, we added multi-resolution spectrogram discriminators to sharpen the spectral resolution of the generated waveforms.
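A minimal sketch of the multi-resolution spectrogram idea: the same waveform is analyzed with several STFT configurations, yielding spectrograms at different time-frequency trade-offs that a discriminator can consume. The FFT sizes and hop lengths below are illustrative placeholders, not the paper's actual hyperparameters, and `multi_res_spectrograms` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import stft

def multi_res_spectrograms(wave, sr=22050,
                           configs=((512, 128), (1024, 256), (2048, 512))):
    """Log-magnitude spectrograms of one waveform at several STFT resolutions.

    configs: (fft_size, hop_length) pairs; values here are illustrative only.
    """
    specs = []
    for n_fft, hop in configs:
        # scipy expects overlap rather than hop, so convert accordingly.
        _, _, Z = stft(wave, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
        specs.append(np.log(np.abs(Z) + 1e-7))  # small epsilon avoids log(0)
    return specs

# One second of noise as a stand-in for generated audio.
wave = np.random.randn(22050).astype(np.float32)
specs = multi_res_spectrograms(wave)
for s in specs:
    print(s.shape)  # frequency bins differ per resolution
```

Each array would then be fed to a separate sub-discriminator; smaller FFT sizes give finer time resolution, larger ones finer frequency resolution.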
no code implementations • 15 May 2020 • Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bong-Wan Kim, Jaesam Yoon
We propose the Jointly trained Duration Informed Transformer (JDI-T), a feed-forward Transformer with a duration predictor, jointly trained without explicit alignments, to generate an acoustic feature sequence from input text.
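In duration-informed feed-forward TTS models of this kind, predicted per-phoneme durations are typically used to expand encoder outputs to frame rate before decoding. The generic "length regulation" step can be sketched as below; this is a common pattern in FastSpeech-style models, not JDI-T's exact implementation, and the names are hypothetical.

```python
import numpy as np

def length_regulate(encodings, durations):
    """Expand each phoneme encoding by its predicted duration (in frames)."""
    # np.repeat with a per-row count replicates row i durations[i] times.
    return np.repeat(encodings, durations, axis=0)

enc = np.arange(8, dtype=np.float32).reshape(4, 2)  # 4 phonemes, hidden dim 2
dur = np.array([1, 3, 2, 4])                        # predicted frame counts
frames = length_regulate(enc, dur)
print(frames.shape)  # (10, 2): total frames = sum of durations
```

The decoder then maps this frame-rate sequence to acoustic features in a single non-autoregressive pass.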
no code implementations • 12 Oct 2017 • Dan Lim
This thesis introduces a sequence-to-sequence model with Luong's attention mechanism for end-to-end automatic speech recognition (ASR).
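The simplest Luong scoring variant is the dot product between the decoder state and each encoder state, followed by a softmax to weight the encoder states into a context vector. A minimal numpy sketch of that step (illustrative shapes and names, not the thesis code):

```python
import numpy as np

def luong_dot_attention(decoder_state, encoder_states):
    """Luong dot-product attention: weights over time and a context vector."""
    scores = encoder_states @ decoder_state          # (T,) alignment scores
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    context = weights @ encoder_states               # (d,) weighted average
    return weights, context

T, d = 5, 4                                # 5 encoder steps, hidden dim 4
rng = np.random.default_rng(0)
h_enc = rng.standard_normal((T, d))        # encoder hidden states
h_dec = rng.standard_normal(d)             # current decoder hidden state
w, c = luong_dot_attention(h_dec, h_enc)
print(w.shape, c.shape)                    # (5,) (4,)
```

Luong's general and concat variants replace the raw dot product with a learned bilinear or MLP score, but the softmax-and-weighted-sum structure is the same.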