no code implementations • 5 Mar 2024 • Chan-Jan Hsu, Chang-Le Liu, Feng-Ting Liao, Po-chun Hsu, Yi-Chang Chen, Da-Shan Shiu
Breeze-7B is an open-source language model based on Mistral-7B, designed to address the need for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese.
no code implementations • 29 Sep 2023 • Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-Yi Lee, Abdelrahman Mohamed
Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks.
1 code implementation • 15 Sep 2023 • Chan-Jan Hsu, Chang-Le Liu, Feng-Ting Liao, Po-chun Hsu, Yi-Chang Chen, Da-Shan Shiu
In an effort to advance the evaluation of language models in Traditional Chinese and stimulate further research in this field, we have open-sourced our benchmark and opened the model for trial.
no code implementations • 25 Apr 2023 • Po-chun Hsu, Li-Hsiang Shen, Chun-Hung Liu, Kai-Ten Feng
Terahertz (THz) communication, with its ultra-wide available spectrum, is a promising technique for meeting the stringent high-data-rate requirements of next-generation wireless networks, yet its severe propagation attenuation significantly hinders practical implementation.
1 code implementation • 8 Mar 2023 • Philipp Ennen, Po-chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Da-Shan Shiu, Wei-Yun Ma
In this paper we present the multilingual language model BLOOM-zh that features enhanced support for Traditional Chinese.
no code implementations • 29 Jul 2022 • Da-Rong Liu, Po-chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-Yi Lee
GAN training is adopted in the first stage to learn the mapping between unpaired speech and phone sequences.
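The adversarial objective behind such a stage can be sketched with the standard non-saturating GAN losses. The toy `discriminator` below and its fixed weights are hypothetical stand-ins for illustration only, not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stand-in: a fixed linear discriminator that scores a
# phone-posterior sequence (T x n_phones) as real (-> 1) or generated (-> 0).
def discriminator(phone_seq, w):
    return sigmoid(np.sum(phone_seq * w))

rng = np.random.default_rng(0)
T, n_phones = 5, 4
w = rng.normal(size=(T, n_phones))

real = np.eye(n_phones)[rng.integers(0, n_phones, T)]   # one-hot phones from text
fake = rng.dirichlet(np.ones(n_phones), size=T)         # generator's soft posteriors

d_real, d_fake = discriminator(real, w), discriminator(fake, w)

# Discriminator wants real -> 1, fake -> 0; the generator is trained
# to push its outputs toward the real phone-sequence distribution.
loss_d = -(np.log(d_real) + np.log(1.0 - d_fake))
loss_g = -np.log(d_fake)                                # non-saturating generator loss
```

Minimizing `loss_g` pulls the generator's phone posteriors toward sequences the discriminator judges real, which is how a mapping can be found without paired speech/transcript data.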
1 code implementation • 29 Jun 2022 • Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed
Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.
Automatic Speech Recognition (ASR) +4
no code implementations • 8 May 2022 • Chi-Luen Feng, Po-chun Hsu, Hung-Yi Lee
We found that HuBERT stores speaker information in representations whose positions correspond to silences in a waveform.
1 code implementation • 1 Apr 2022 • Fan-Lin Wang, Po-chun Hsu, Da-Rong Liu, Hung-Yi Lee
Most recent speech synthesis systems are composed of a synthesizer and a vocoder.
1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee
We also show that the neural vocoder adopted in the detection framework is dataset-independent.
1 code implementation • 6 Mar 2021 • Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, Po-chun Hsu, Hung-Yi Lee
The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances whose voice and speaking style are similar to those of a reference speaker, given only a few reference samples.
1 code implementation • 15 May 2020 • Po-chun Hsu, Hung-Yi Lee
Because the flow-based model we design is heavily compressed, it requires far fewer computational resources than other waveform-generation models during both training and inference; despite the compression, the post-filter maintains the quality of the generated waveform.
Speech Synthesis • Text-To-Speech Synthesis • Audio and Speech Processing • Sound
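Flow-based waveform models are built from stacks of invertible layers. The minimal affine-coupling sketch below illustrates the exact invertibility such models rely on; the tiny `scale`/`shift` conditioners are hypothetical toys (a real vocoder would use neural networks), not the paper's compressed architecture:

```python
import numpy as np

def coupling_forward(x, scale, shift):
    """One affine coupling layer: transform the second half of x,
    conditioned on the (untouched) first half."""
    xa, xb = np.split(x, 2)
    yb = xb * np.exp(scale(xa)) + shift(xa)
    return np.concatenate([xa, yb])

def coupling_inverse(y, scale, shift):
    """Exact inverse: recover xb from yb using the same conditioners."""
    ya, yb = np.split(y, 2)
    xb = (yb - shift(ya)) * np.exp(-scale(ya))
    return np.concatenate([ya, xb])

# Hypothetical tiny conditioners standing in for neural networks.
scale = lambda h: 0.5 * np.tanh(h)
shift = lambda h: 0.1 * h

rng = np.random.default_rng(0)
x = rng.normal(size=8)                     # a toy "waveform" segment
y = coupling_forward(x, scale, shift)
x_rec = coupling_inverse(y, scale, shift)
print(np.allclose(x, x_rec))               # → True: the layer inverts exactly
```

Because every layer inverts in closed form, the model can be trained by maximum likelihood in one direction and generate waveforms in the other; compressing such layers is what reduces the computational cost mentioned above.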
no code implementations • 5 Dec 2019 • Po-chun Hsu, Chun-hsuan Wang, Andy T. Liu, Hung-Yi Lee
We found that speaker variety is much more important than language for achieving a universal vocoder.
7 code implementations • 25 Oct 2019 • Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-Yi Lee
We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech.
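The pre-training masks spans of input frames and asks the encoder to reconstruct them from bidirectional context. The sketch below shows a BERT-style span-masking policy on a toy spectrogram; the 15% ratio and span length here are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def mask_frames(spec, mask_ratio=0.15, span=3, rng=None):
    """Zero out random consecutive-frame spans of a (T x n_mels)
    spectrogram; return the masked copy and the boolean mask marking
    which frames the encoder must reconstruct."""
    rng = rng or np.random.default_rng()
    T = spec.shape[0]
    n_starts = max(1, int(T * mask_ratio / span))
    mask = np.zeros(T, dtype=bool)
    for start in rng.choice(T - span + 1, size=n_starts, replace=False):
        mask[start:start + span] = True
    masked = spec.copy()
    masked[mask] = 0.0                     # masked frames are zeroed out
    return masked, mask

rng = np.random.default_rng(0)
spec = rng.normal(size=(100, 80))          # toy log-mel spectrogram
masked, mask = mask_frames(spec, rng=rng)
# A reconstruction loss (e.g. L1) is then computed between the encoder's
# predictions and the original frames, only at the masked positions.
```

Reconstructing frames the model cannot see forces the representations to capture context from both directions, which is what makes the pre-trained encoder useful for downstream speech tasks.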
1 code implementation • 28 May 2019 • Andy T. Liu, Po-chun Hsu, Hung-Yi Lee
We found that the proposed encoding method automatically separates speech content from speaker style, and is sufficient to cover the full linguistic content of a given language.
1 code implementation • 9 Aug 2018 • Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-Yi Lee, Lin-shan Lee
In this way, the length constraint mentioned above is removed to offer rhythm-flexible voice conversion without requiring parallel data.
Sound • Audio and Speech Processing