no code implementations • 6 Feb 2024 • Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-Yi Lee, Lin-shan Lee, Shao-Hua Sun
A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 19 Sep 2023 • Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee
Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information.
1 code implementation • 15 Mar 2023 • Yuan Tseng, Cheng-I Lai, Hung-Yi Lee
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a constituent.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
1 code implementation • 13 Oct 2022 • Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-Yi Lee, Nigel G. Ward
We find that 13 of the 15 SSL models outperformed the baseline on all the prosody-related tasks.