1 code implementation • 20 Sep 2022 • Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt
The powerful modeling capabilities of all-attention transformer architectures often cause overfitting and, for natural language processing tasks, lead to an implicitly learned internal language model in the autoregressive transformer decoder, which complicates the integration of external language models.
Ranked #4 on Lipreading on LRS3-TED (using extra training data)
1 code implementation • 2 Jul 2021 • Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt
Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks.
Ranked #7 on Speech Recognition on WSJ eval92
Automatic Speech Recognition (ASR) +2
no code implementations • 31 Mar 2021 • Timo Lohrenz, Zhengyang Li, Tim Fingscheidt
Stream fusion, also known as system combination, is a common technique in automatic speech recognition for traditional hybrid hidden Markov model approaches, yet it remains mostly unexplored for modern deep neural network end-to-end model architectures.
Automatic Speech Recognition (ASR) +3
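The stream-fusion idea above can be illustrated with a minimal late-fusion sketch: combine the frame-wise output posteriors of two models by a weighted log-linear combination and renormalize. This is a generic illustration, not the paper's specific method; the function name, the weight `lam`, and the toy posteriors are all assumptions.

```python
import numpy as np

def fuse_streams(post_a, post_b, lam=0.5):
    """Log-linear fusion of two posterior streams (frames x vocab).

    Hypothetical sketch: weight `lam` balances the two streams;
    the result is renormalized to a distribution per frame.
    """
    log_fused = lam * np.log(post_a) + (1.0 - lam) * np.log(post_b)
    fused = np.exp(log_fused)
    return fused / fused.sum(axis=1, keepdims=True)

# Two toy posterior streams: 2 frames, 3-token vocabulary (illustrative values)
a = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
b = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]])
fused = fuse_streams(a, b)
print(fused.argmax(axis=1))  # most likely token per frame
```

In practice, fusion can also happen earlier (at encoder features) or later (at hypothesis level); the right weight is typically tuned on a development set.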