no code implementations • 27 Nov 2023 • Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, WeiQiang Zhang
We further present the evaluation datasets used by existing methods, and conclude this survey with the ongoing challenges and future directions.
no code implementations • 3 Oct 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Xiaolin Jiao
The training of LST consists of two stages: (1) modality adjustment, where the adapter is tuned to align speech representations with the text embedding space, and (2) downstream task fine-tuning, where both the adapter and the LLM are trained to optimize performance on the E2E ST task.
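The two-stage schedule above amounts to switching which parameter groups are trainable. A minimal sketch, assuming a simple list-of-parameters representation (the names `adapter` and `llm` are illustrative, not from the paper):

```python
def trainable_params(stage, adapter, llm):
    """Select which parameter groups are updated in each LST training stage.

    stage 1 (modality adjustment): only the adapter is tuned, so that
        speech representations are aligned with the text embedding space
        while the LLM stays frozen.
    stage 2 (downstream fine-tuning): both the adapter and the LLM are
        trained on the end-to-end speech translation task.
    """
    if stage == 1:
        return list(adapter)
    elif stage == 2:
        return list(adapter) + list(llm)
    raise ValueError(f"unknown stage: {stage}")
```

In a real framework this would correspond to freezing the LLM's parameters (e.g. disabling their gradients) during stage 1 and unfreezing them for stage 2.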
no code implementations • 20 Apr 2023 • Hao Zhang, Dan Qu, Keji Shao, Xukui Yang
In contrast to the general dropout method, which randomly drops neurons, DropDim drops part of the embedding dimensions.
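The contrast with standard dropout can be sketched as structured dropout over the last axis: entire embedding dimensions are zeroed across all positions, rather than individual neurons. This is a hedged sketch of the idea (the rescaling by `1 / (1 - drop_rate)` mirrors ordinary dropout and is an assumption, not taken from the paper):

```python
import numpy as np

def dropdim(x, drop_rate=0.1, rng=None):
    """DropDim-style structured dropout (sketch).

    x: array of shape (..., d_model). A single keep/drop mask is sampled
    over the embedding axis, so each dropped dimension is zeroed for
    every batch element and sequence position, unlike standard dropout,
    which masks neurons independently.
    """
    rng = rng or np.random.default_rng()
    d_model = x.shape[-1]
    keep = (rng.random(d_model) >= drop_rate).astype(x.dtype)
    # Rescale the kept dimensions so the expected activation is unchanged.
    return x * keep / (1.0 - drop_rate)
```

Because the mask is shared along all leading axes, each embedding dimension is either entirely zero or uniformly rescaled across the batch.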
no code implementations • 20 Apr 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Wei-Qiang Zhang
However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of this method is also limited.
no code implementations • 20 Apr 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li
Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model through elaborate mechanisms, which typically require transcriptions as extra input during training.