no code implementations • ICCV 2023 • Huaiwen Zhang, Zihang Guo, Yang Yang, Xin Liu, De Hu
In this paper, we propose a Cross-modal Contextualized Sequence Transduction (C2ST) for CSLR, which effectively incorporates the knowledge of gloss sequence into the process of video representation learning and sequence transduction.
no code implementations • 27 Oct 2022 • Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li
Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1).