no code implementations • ACL (IWSLT) 2021 • Dirk Padfield, Colin Cherry
Traditional translation systems trained on written documents perform well for text-based translation but not as well for speech-based applications.
no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.
1 code implementation • 19 May 2023 • Hua Shen, Vicky Zayats, Johann C. Rocholl, Daniel D. Walker, Dirk Padfield
Current disfluency detection models focus on individual utterances each from a single speaker.
no code implementations • 5 Aug 2022 • Dirk Padfield, Daniel J. Liebling
Diarization partitions an audio stream into segments based on the voices of the speakers.
no code implementations • NAACL 2022 • Angelica Chen, Vicky Zayats, Daniel D. Walker, Dirk Padfield
In modern interactive speech-based systems, speech is consumed and transcribed incrementally prior to having disfluencies removed.
no code implementations • EMNLP 2021 • Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy
We demonstrate this on two speech adaptation tasks (atypical and accented speech) and for two state-of-the-art ASR architectures.
no code implementations • 21 Oct 2020 • Daniel Li, Te I, Naveen Arivazhagan, Colin Cherry, Dirk Padfield
Specifically, in the context of long-form speech translation systems, where the input transcripts come from Automatic Speech Recognition (ASR), the NMT models have to handle errors including phoneme substitutions, grammatical structure, and sentence boundaries, all of which pose challenges to NMT robustness.