Search Results for author: Yui Sudo

Found 10 papers, 2 papers with code

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

no code implementations • 5 Jun 2024 • Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe

Experimental results demonstrate that the jointly trained 4D model outperforms E2E-ASR models trained with only a single decoder.
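A minimal sketch of the core idea behind a joint beam search over several decoders: each candidate token is scored by a weighted sum of per-decoder log-probabilities before pruning. The scorer interface, weights, and names below are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch: combining per-decoder scores in a joint beam search (illustrative only).
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    tokens: List[int]   # partial token sequence
    score: float        # accumulated joint log-probability


def joint_step(
    hyps: List[Hypothesis],
    scorers: Dict[str, Callable[[List[int], int], float]],  # name -> log p(token | prefix)
    weights: Dict[str, float],  # e.g. {"ctc": 0.3, "attn": 0.4, "rnnt": 0.2, "maskp": 0.1}
    vocab_size: int,
    beam_size: int,
) -> List[Hypothesis]:
    """Expand each hypothesis by one token, score it with a weighted sum of
    log-probabilities from all decoders, then keep the best `beam_size` beams."""
    candidates = []
    for hyp in hyps:
        for token in range(vocab_size):
            joint = sum(
                weights[name] * scorer(hyp.tokens, token)
                for name, scorer in scorers.items()
            )
            candidates.append(Hypothesis(hyp.tokens + [token], hyp.score + joint))
    return sorted(candidates, key=lambda h: h.score, reverse=True)[:beam_size]
```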

Contextualized Automatic Speech Recognition with Dynamic Vocabulary

no code implementations • 22 May 2024 • Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe

Deep biasing (DB) improves the performance of end-to-end automatic speech recognition (E2E-ASR) for rare words or contextual phrases using a bias list.

Automatic Speech Recognition • speech-recognition • +1
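A minimal sketch of the "dynamic vocabulary" idea described above: each phrase in the bias list is mapped to a temporary phrase-level token appended after the static vocabulary, and such tokens are expanded back to text after decoding. The function names, data structures, and the sentencepiece-style "▁" word marker are assumptions for illustration, not the paper's code.

```python
# Sketch: phrase-level bias tokens appended to the static vocabulary (illustrative only).
from typing import Dict, List


def build_dynamic_vocab(bias_list: List[str], base_vocab_size: int) -> Dict[int, str]:
    """Assign one new token id per bias phrase, placed after the static vocabulary."""
    return {base_vocab_size + i: phrase for i, phrase in enumerate(bias_list)}


def expand_dynamic_tokens(token_ids: List[int],
                          dynamic_vocab: Dict[int, str],
                          id_to_subword: Dict[int, str]) -> str:
    """Replace dynamic tokens with their bias phrases; decode the rest as subwords."""
    pieces = []
    for tid in token_ids:
        if tid in dynamic_vocab:
            pieces.append(dynamic_vocab[tid])   # whole contextual phrase
        else:
            pieces.append(id_to_subword[tid])   # ordinary subword
    # Assumes sentencepiece-style subwords where "▁" marks a word boundary.
    return "".join(pieces).replace("▁", " ").strip()
```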

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

no code implementations • 20 Feb 2024 • Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe

Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC).

Automatic Speech Recognition (ASR) • +5
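For reference, a minimal sketch of the CTC decoding rule that an encoder-only CTC model relies on: take the argmax label per frame, collapse repeated labels, then drop blanks. This illustrates CTC greedy decoding in general, not OWSM-CTC's actual inference code.

```python
# Sketch: CTC greedy decoding (illustrative of CTC itself, not OWSM-CTC internals).
import numpy as np


def ctc_greedy_decode(log_probs: np.ndarray, blank_id: int = 0) -> list:
    """log_probs: (time, vocab) array of per-frame log-probabilities."""
    frame_ids = log_probs.argmax(axis=-1)           # best label per frame
    collapsed = []
    prev = None
    for label in frame_ids:
        if label != prev:                           # collapse consecutive repeats
            collapsed.append(int(label))
        prev = label
    return [l for l in collapsed if l != blank_id]  # remove blank symbols
```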

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search

no code implementations • 19 Jan 2024 • Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe

The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data.

Automatic Speech Recognition (ASR) • +1
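A minimal sketch of bias-phrase boosting in beam search: a hypothesis whose suffix matches a prefix of a phrase in the bias list receives a score bonus, so contextual phrases survive pruning. The prefix-set representation and bonus value are illustrative assumptions, not the attention-based method proposed in the paper.

```python
# Sketch: score boosting for hypotheses that are extending a bias phrase (illustrative only).
from typing import List, Sequence, Set, Tuple


def build_prefix_set(bias_phrases: List[Sequence[int]]) -> Set[Tuple[int, ...]]:
    """Collect every non-empty prefix of every bias phrase (as token-id tuples)."""
    prefixes = set()
    for phrase in bias_phrases:
        for i in range(1, len(phrase) + 1):
            prefixes.add(tuple(phrase[:i]))
    return prefixes


def biased_score(base_log_prob: float,
                 hyp_tokens: List[int],
                 prefix_set: Set[Tuple[int, ...]],
                 max_phrase_len: int = 8,
                 bonus: float = 2.0) -> float:
    """Add a bonus if some suffix of the hypothesis is a prefix of a bias phrase."""
    for n in range(1, min(max_phrase_len, len(hyp_tokens)) + 1):
        if tuple(hyp_tokens[-n:]) in prefix_set:
            return base_log_prob + bonus
    return base_log_prob
```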

Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

no code implementations • 29 May 2023 • Yui Sudo, Kazuya Hata, Kazuhiro Nakadai

End-to-end automatic speech recognition (E2E-ASR) has the potential to improve recognition performance, but it struggles with enharmonic words: named entities (NEs) that share the same pronunciation and part of speech yet are spelled differently.

Automatic Speech Recognition • speech-recognition • +1
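A minimal sketch of retraining-free customization by phoneme similarity: a recognized named entity is replaced with the registered entry whose phoneme sequence is closest under edit distance. The lexicon format, threshold, and function names are illustrative assumptions, not the paper's actual estimation method.

```python
# Sketch: phoneme-similarity-based NE substitution without retraining (illustrative only).
from typing import Dict, List, Optional


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (pa != pb))
    return dp[-1]


def substitute_entity(hypothesis_phonemes: List[str],
                      custom_lexicon: Dict[str, List[str]],
                      max_distance: int = 1) -> Optional[str]:
    """Return the registered spelling whose phonemes best match the hypothesis,
    or None if no entry is within the allowed distance."""
    best_word, best_dist = None, max_distance + 1
    for word, phonemes in custom_lexicon.items():
        d = edit_distance(hypothesis_phonemes, phonemes)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word
```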

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

no code implementations • 21 Dec 2022 • Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe

The network architecture of end-to-end (E2E) automatic speech recognition (ASR) can be classified into several models, including connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention mechanism, and non-autoregressive mask-predict models.

Automatic Speech Recognition (ASR) • +2
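A minimal sketch of what joint modeling of the four decoder types over a shared encoder could look like at training time: one encoder pass, four decoder losses combined as a weighted sum. Module interfaces, weights, and names are illustrative assumptions, not the 4D ASR training code.

```python
# Sketch: one shared encoder, four decoder losses combined by weighted sum (illustrative only).
import torch.nn as nn


class FourDecoderASR(nn.Module):
    def __init__(self, encoder, ctc_head, attn_decoder, transducer, mask_decoder,
                 weights=(0.3, 0.3, 0.2, 0.2)):
        super().__init__()
        self.encoder = encoder
        self.ctc_head = ctc_head
        self.attn_decoder = attn_decoder
        self.transducer = transducer
        self.mask_decoder = mask_decoder
        self.weights = weights

    def forward(self, speech, speech_lens, tokens, token_lens):
        enc, enc_lens = self.encoder(speech, speech_lens)  # shared encoder pass
        # Each decoder module is assumed to return a scalar training loss.
        losses = (
            self.ctc_head(enc, enc_lens, tokens, token_lens),      # CTC loss
            self.attn_decoder(enc, enc_lens, tokens, token_lens),  # attention CE loss
            self.transducer(enc, enc_lens, tokens, token_lens),    # RNN-T loss
            self.mask_decoder(enc, enc_lens, tokens, token_lens),  # mask-predict loss
        )
        return sum(w * l for w, l in zip(self.weights, losses))
```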
