no code implementations • 18 May 2023 • Won Jang, Dan Lim, Heayoung Park
This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation rates without sacrificing sample quality.
no code implementations • 15 May 2020 • Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bong-Wan Kim, Jaesam Yoon
We propose Jointly trained Duration Informed Transformer (JDI-T), a feed-forward Transformer with a duration predictor jointly trained without explicit alignments in order to generate an acoustic feature sequence from an input text.