no code implementations • 9 May 2024 • Adian Liusie, Vatsal Raina, Yassir Fathullah, Mark Gales
When Gaussian experts are used one can derive simple closed-form solutions for the optimal candidate ranking, as well as expressions for selecting which comparisons should be made to maximize the probability of this ranking.
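As a rough illustration of the kind of closed-form ranking a Gaussian comparison model admits, here is a minimal sketch: if each observed comparison score behaves like s_ij ~ N(mu_i - mu_j, sigma^2), the least-squares estimate of the latent scores solves a graph-Laplacian linear system. The variable names and formulation are illustrative, not the paper's exact derivation.

```python
# Sketch: closed-form candidate scores from pairwise comparisons under a
# Gaussian model s_ij ~ N(mu_i - mu_j, sigma^2). Illustrative only.
import numpy as np

def rank_from_comparisons(n, comparisons):
    """comparisons: list of (i, j, s_ij) observed score differences."""
    # Normal equations of min_mu sum (s_ij - (mu_i - mu_j))^2,
    # a graph-Laplacian system.
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, j, s in comparisons:
        A[i, i] += 1; A[j, j] += 1
        A[i, j] -= 1; A[j, i] -= 1
        b[i] += s; b[j] -= s
    # The Laplacian is singular (scores are shift-invariant); pin mu_0 = 0.
    A[0, :] = 0; A[0, 0] = 1; b[0] = 0
    mu = np.linalg.solve(A, b)
    return np.argsort(-mu)  # best candidate first

print(rank_from_comparisons(3, [(0, 1, 0.8), (1, 2, 0.5), (0, 2, 1.2)]))
```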
no code implementations • 1 May 2024 • Yassir Fathullah, Mark J. F. Gales
Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks.
no code implementations • 20 Mar 2024 • Adian Liusie, Yassir Fathullah, Mark J. F. Gales
Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities and versatility on NLP tasks; however, they sometimes fail to maintain crucial invariances for specific tasks.
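One such invariance is insensitivity to the order in which two candidates are presented for comparison. A minimal sketch of how an ordering invariance can be enforced by construction, assuming a hypothetical scoring function `prob_a_better` (the prompt format and function name are assumptions):

```python
# Sketch: enforcing pairwise-ordering invariance by averaging an LLM
# judgement over both presentation orders.
def symmetrised_preference(prob_a_better, cand_a, cand_b):
    p_fwd = prob_a_better(cand_a, cand_b)        # P(A better | A shown first)
    p_bwd = 1.0 - prob_a_better(cand_b, cand_a)  # P(A better | B shown first)
    return 0.5 * (p_fwd + p_bwd)                 # order-invariant by construction
```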
no code implementations • 12 Nov 2023 • Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data.
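A minimal sketch of the general coupling pattern such work builds on: project audio-encoder frames into the LLM's token-embedding space and prepend them to the text embeddings. Module names, the projection, and the HF-style `inputs_embeds` call are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch (PyTorch): coupling an audio encoder to a decoder-only LLM by
# projecting audio frames into the token-embedding space.
import torch
import torch.nn as nn

class SpeechPrefixLM(nn.Module):
    def __init__(self, audio_encoder, llm, audio_dim, llm_dim):
        super().__init__()
        self.audio_encoder = audio_encoder          # e.g. a Conformer
        self.proj = nn.Linear(audio_dim, llm_dim)   # map to LLM embed space
        self.llm = llm                              # frozen or fine-tuned LLM

    def forward(self, audio, text_embeds):
        audio_embeds = self.proj(self.audio_encoder(audio))  # (B, T_a, llm_dim)
        inputs = torch.cat([audio_embeds, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)       # HF-style call, assumed
```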
no code implementations • 19 Sep 2023 • Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen
Overall, we demonstrate that by adding only a small number of trainable parameters via adapters, we can unlock contextualized speech recognition capability for the pretrained LLM while keeping the same text-only input functionality.
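For reference, a minimal sketch of a bottleneck adapter of the kind that adds only a handful of trainable parameters to a frozen transformer layer; the placement and bottleneck size are assumptions, not the paper's configuration.

```python
# Sketch (PyTorch): a residual bottleneck adapter.
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # few trainable params
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual connection
```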
no code implementations • 5 Sep 2023 • Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra
Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models, with up to a 3% relative improvement in word error rate (WER), while efficiently keeping the cost of training many models at a small constant (a generic weight-sharing sketch follows below).
Automatic Speech Recognition (ASR) +2
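A minimal sketch of the supernet/weight-sharing idea this builds on: each training step samples a sub-model width and uses only that slice of the shared weights. This is a generic illustration of the technique, not the TODM recipe itself.

```python
# Sketch: slimmable weight sharing for train-once-deploy-many.
import random
import torch
import torch.nn as nn

class SlimmableLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)

    def forward(self, x, width_mult=1.0):
        d_out = int(self.weight.shape[0] * width_mult)
        return x @ self.weight[:d_out].T  # shared weights, sliced per sub-model

layer = SlimmableLinear(256, 256)
x = torch.randn(8, 256)
for step in range(3):
    width = random.choice([0.25, 0.5, 1.0])  # sample a sub-model each step
    y = layer(x, width_mult=width)           # train only the sampled slice
```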
no code implementations • 21 Jul 2023 • Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, the effect of scaling up the audio encoder, and the effect of increasing the audio encoder striding to generate fewer embeddings (a frame-stacking sketch follows below).
Abstractive Text Summarization • Automatic Speech Recognition +3
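A minimal sketch of one common way to increase effective striding: stack every k consecutive encoder frames into a single vector before feeding the LLM, so the audio sequence shrinks by a factor of k. The factor k=4 is an arbitrary choice here, not the paper's setting.

```python
# Sketch: reducing the number of audio embeddings by frame stacking.
import torch

def stack_frames(frames, k=4):
    """frames: (B, T, D) -> (B, T // k, D * k)."""
    B, T, D = frames.shape
    T = (T // k) * k                      # drop any trailing remainder
    return frames[:, :T].reshape(B, T // k, D * k)

out = stack_frames(torch.randn(2, 103, 80), k=4)  # -> (2, 25, 320)
```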
1 code implementation • 8 Jun 2023 • Potsawee Manakul, Yassir Fathullah, Adian Liusie, Vyas Raina, Vatsal Raina, Mark Gales
In this paper, we consider the challenge of summarizing patients' medical progress notes in a limited data setting.
no code implementations • 21 May 2023 • Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales
State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches.
Ranked #8 on Speech Recognition on LibriSpeech test-clean
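For context on the SSM entry above, a minimal sketch of the linear state-space recurrence these models build on, x_t = A x_{t-1} + B u_t, y_t = C x_t, in its simplest discrete form. Practical SSM layers (e.g. S4-style) use structured A matrices and fast parallel scans rather than this naive loop.

```python
# Sketch: a naive linear state-space model scan.
import numpy as np

def ssm_scan(A, B, C, u):
    """A: (N, N), B: (N, 1), C: (1, N), u: (T,) -> y: (T,)."""
    x = np.zeros((A.shape[0], 1))
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        x = A @ x + B * u_t   # state update
        y[t] = float(C @ x)   # readout
    return y
```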
no code implementations • 17 May 2023 • Yassir Fathullah, Guoxuan Xia, Mark Gales
Efficiently and reliably estimating uncertainty is an important objective in deep learning.
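A minimal sketch of the standard information-theoretic decomposition used in ensemble-based uncertainty estimation: total uncertainty (entropy of the mean prediction) splits into expected entropy (data uncertainty) plus mutual information (knowledge uncertainty). This is a textbook formulation, not necessarily the paper's estimator.

```python
# Sketch: uncertainty decomposition for an ensemble of M members.
import numpy as np

def uncertainties(probs, eps=1e-12):
    """probs: (M, K) member predictive distributions over K classes."""
    mean = probs.mean(axis=0)
    total = -(mean * np.log(mean + eps)).sum()                    # entropy of mean
    expected = -(probs * np.log(probs + eps)).sum(axis=1).mean()  # mean entropy
    return total, expected, total - expected                      # MI = knowledge
```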
no code implementations • 9 May 2023 • Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales
In these scenarios, where, for example, knowing the quality of a system's output in order to flag poor performance matters more than knowing the output itself, is it possible to bypass autoregressive decoding altogether?
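A minimal sketch of what "bypassing decoding" could look like: regress a quality score (e.g. an error-rate proxy) directly from the encoder output, with no decoder pass. The pooling and head are assumptions for illustration, not the paper's exact model.

```python
# Sketch (PyTorch): non-autoregressive quality prediction from the encoder.
import torch
import torch.nn as nn

class QualityHead(nn.Module):
    def __init__(self, encoder, d_model):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(d_model, 1)

    def forward(self, inputs):
        enc = self.encoder(inputs)            # (B, T, D), no decoding
        pooled = enc.mean(dim=1)              # mean-pool over time
        return self.head(pooled).squeeze(-1)  # predicted quality score
```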
no code implementations • 15 Mar 2022 • Yassir Fathullah, Mark J. F. Gales
Furthermore, it is possible to build ensembles of these models and apply hierarchical ensemble distillation approaches.
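A minimal sketch of plain ensemble distillation, the baseline these hierarchical approaches extend: train a single student to match the averaged predictive distribution of M teachers via KL divergence.

```python
# Sketch (PyTorch): distilling an ensemble mean into one student.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list):
    teacher_probs = torch.stack(
        [F.softmax(t, dim=-1) for t in teacher_logits_list]).mean(dim=0)
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    teacher_probs, reduction="batchmean")
```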
no code implementations • ACL 2021 • Puria Radmard, Yassir Fathullah, Aldo Lipani
Active Learning (AL) has been successfully applied to Deep Learning in order to drastically reduce the amount of data required to achieve high performance.
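A minimal sketch of the simplest acquisition step in such an AL loop, generic uncertainty sampling: label the pool examples the model is least confident about. Function names are illustrative, and this is not the paper's (sequence-level) selection strategy.

```python
# Sketch: entropy-based acquisition for active learning.
import numpy as np

def select_for_labelling(pool_probs, budget):
    """pool_probs: (N, K) model probabilities on the unlabelled pool."""
    entropy = -(pool_probs * np.log(pool_probs + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:budget]  # indices of most uncertain items
```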
no code implementations • 24 Nov 2020 • Yassir Fathullah, Mark Gales, Andrey Malinin
It is, however, more challenging than the standard tasks investigated for distillation, as the prediction of any grammatical correction to a word is highly dependent on both the input sequence and the generated output history for that word.
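A minimal sketch of teacher-forced token-level distillation for sequence models, which makes the history dependence explicit: every per-token target distribution is conditioned on the source and the output prefix. The model signatures are assumptions, not the paper's setup.

```python
# Sketch (PyTorch): history-conditioned sequence distillation.
import torch
import torch.nn.functional as F

def seq_distillation_loss(student, teacher, src, tgt_in):
    with torch.no_grad():
        t_logits = teacher(src, tgt_in)       # (B, T, V), teacher-forced:
    s_logits = student(src, tgt_in)           # targets depend on src + prefix
    return F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1),
                    reduction="batchmean")    # reduction choice is arbitrary
```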
no code implementations • 10 Nov 2019 • Yassir Fathullah, Chao Zhang, Philip C. Woodland
Speaker diarisation systems nowadays use embeddings generated from speech segments at a bottleneck layer, which need to be discriminative for unseen speakers.
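A minimal sketch of how such segment embeddings are typically turned into a diarisation output: cluster them with a cosine affinity. The clustering choice (agglomerative, via scikit-learn) is an assumption for illustration, not the paper's system.

```python
# Sketch: diarisation by clustering segment embeddings.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def diarise(seg_embeds, n_speakers):
    """seg_embeds: (N, D), one embedding per speech segment."""
    normed = seg_embeds / np.linalg.norm(seg_embeds, axis=1, keepdims=True)
    return AgglomerativeClustering(       # `metric` is `affinity` in old sklearn
        n_clusters=n_speakers, metric="cosine", linkage="average"
    ).fit_predict(normed)                 # speaker label per segment
```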