Search Results for author: Yui Sudo

Found 10 papers, 2 papers with code

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

no code implementations • 5 Jun 2024 • Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe

Experimental results demonstrate that the jointly trained 4D model outperforms E2E-ASR models trained with only a single decoder.
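A minimal sketch of the core idea behind a joint beam search over several decoders: each candidate token is scored by a weighted sum of per-decoder log-probabilities before pruning. The scorer interface, weights, and names below are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch: combining per-decoder scores in a joint beam search (illustrative only).
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    tokens: List[int]   # partial token sequence
    score: float        # accumulated joint log-probability


def joint_step(
    hyps: List[Hypothesis],
    scorers: Dict[str, Callable[[List[int], int], float]],  # name -> log p(token | prefix)
    weights: Dict[str, float],  # e.g. {"ctc": 0.3, "attn": 0.4, "rnnt": 0.2, "maskp": 0.1}
    vocab_size: int,
    beam_size: int,
) -> List[Hypothesis]:
    """Expand each hypothesis by one token, score it with a weighted sum of
    log-probabilities from all decoders, then keep the best `beam_size` beams."""
    candidates = []
    for hyp in hyps:
        for token in range(vocab_size):
            joint = sum(
                weights[name] * scorer(hyp.tokens, token)
                for name, scorer in scorers.items()
            )
            candidates.append(Hypothesis(hyp.tokens + [token], hyp.score + joint))
    return sorted(candidates, key=lambda h: h.score, reverse=True)[:beam_size]
```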

Contextualized Automatic Speech Recognition with Dynamic Vocabulary

no code implementations • 22 May 2024 • Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe

Deep biasing (DB) improves the performance of end-to-end automatic speech recognition (E2E-ASR) for rare words or contextual phrases using a bias list.

Automatic Speech Recognition • speech-recognition • +1
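A minimal sketch of the "dynamic vocabulary" idea described above: each phrase in the bias list is mapped to a temporary phrase-level token appended after the static vocabulary, and such tokens are expanded back to text after decoding. The function names, data structures, and the sentencepiece-style "▁" word marker are assumptions for illustration, not the paper's code.

```python
# Sketch: phrase-level bias tokens appended to the static vocabulary (illustrative only).
from typing import Dict, List


def build_dynamic_vocab(bias_list: List[str], base_vocab_size: int) -> Dict[int, str]:
    """Assign one new token id per bias phrase, placed after the static vocabulary."""
    return {base_vocab_size + i: phrase for i, phrase in enumerate(bias_list)}


def expand_dynamic_tokens(token_ids: List[int],
                          dynamic_vocab: Dict[int, str],
                          id_to_subword: Dict[int, str]) -> str:
    """Replace dynamic tokens with their bias phrases; decode the rest as subwords."""
    pieces = []
    for tid in token_ids:
        if tid in dynamic_vocab:
            pieces.append(dynamic_vocab[tid])   # whole contextual phrase
        else:
            pieces.append(id_to_subword[tid])   # ordinary subword
    # Assumes sentencepiece-style subwords where "▁" marks a word boundary.
    return "".join(pieces).replace("▁", " ").strip()
```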

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

no code implementations • 20 Feb 2024 • Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe

Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC).

Automatic Speech Recognition (ASR) • +5
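For reference, a minimal sketch of the CTC decoding rule that an encoder-only CTC model relies on: take the argmax label per frame, collapse repeated labels, then drop blanks. This illustrates CTC greedy decoding in general, not OWSM-CTC's actual inference code.

```python
# Sketch: CTC greedy decoding (illustrative of CTC itself, not OWSM-CTC internals).
import numpy as np


def ctc_greedy_decode(log_probs: np.ndarray, blank_id: int = 0) -> list:
    """log_probs: (time, vocab) array of per-frame log-probabilities."""
    frame_ids = log_probs.argmax(axis=-1)           # best label per frame
    collapsed = []
    prev = None
    for label in frame_ids:
        if label != prev:                           # collapse consecutive repeats
            collapsed.append(int(label))
        prev = label
    return [l for l in collapsed if l != blank_id]  # remove blank symbols
```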

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search

no code implementations • 19 Jan 2024 • Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe

The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data.

Automatic Speech Recognition (ASR) • +1
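A minimal sketch of bias-phrase boosting in beam search: a hypothesis whose suffix matches a prefix of a phrase in the bias list receives a score bonus, so contextual phrases survive pruning. The prefix-set representation and bonus value are illustrative assumptions, not the attention-based method proposed in the paper.

```python
# Sketch: score boosting for hypotheses that are extending a bias phrase (illustrative only).
from typing import List, Sequence, Set, Tuple


def build_prefix_set(bias_phrases: List[Sequence[int]]) -> Set[Tuple[int, ...]]:
    """Collect every non-empty prefix of every bias phrase (as token-id tuples)."""
    prefixes = set()
    for phrase in bias_phrases:
        for i in range(1, len(phrase) + 1):
            prefixes.add(tuple(phrase[:i]))
    return prefixes


def biased_score(base_log_prob: float,
                 hyp_tokens: List[int],
                 prefix_set: Set[Tuple[int, ...]],
                 max_phrase_len: int = 8,
                 bonus: float = 2.0) -> float:
    """Add a bonus if some suffix of the hypothesis is a prefix of a bias phrase."""
    for n in range(1, min(max_phrase_len, len(hyp_tokens)) + 1):
        if tuple(hyp_tokens[-n:]) in prefix_set:
            return base_log_prob + bonus
    return base_log_prob
```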

Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

no code implementations • 29 May 2023 • Yui Sudo, Kazuya Hata, Kazuhiro Nakadai

End-to-end automatic speech recognition (E2E-ASR) has the potential to improve recognition performance, but it struggles with enharmonic words: named entities (NEs) that share the same pronunciation and part of speech yet are spelled differently.

Automatic Speech Recognition • speech-recognition • +1
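A minimal sketch of retraining-free customization by phoneme similarity: a recognized named entity is replaced with the registered entry whose phoneme sequence is closest under edit distance. The lexicon format, threshold, and function names are illustrative assumptions, not the paper's actual estimation method.

```python
# Sketch: phoneme-similarity-based NE substitution without retraining (illustrative only).
from typing import Dict, List, Optional


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (pa != pb))
    return dp[-1]


def substitute_entity(hypothesis_phonemes: List[str],
                      custom_lexicon: Dict[str, List[str]],
                      max_distance: int = 1) -> Optional[str]:
    """Return the registered spelling whose phonemes best match the hypothesis,
    or None if no entry is within the allowed distance."""
    best_word, best_dist = None, max_distance + 1
    for word, phonemes in custom_lexicon.items():
        d = edit_distance(hypothesis_phonemes, phonemes)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word
```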

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

no code implementations • 21 Dec 2022 • Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe

The network architecture of end-to-end (E2E) automatic speech recognition (ASR) can be classified into several models, including connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention mechanism, and non-autoregressive mask-predict models.

Automatic Speech Recognition (ASR) • +2
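A minimal sketch of what joint modeling of the four decoder types over a shared encoder could look like at training time: one encoder pass, four decoder losses combined as a weighted sum. Module interfaces, weights, and names are illustrative assumptions, not the 4D ASR training code.

```python
# Sketch: one shared encoder, four decoder losses combined by weighted sum (illustrative only).
import torch.nn as nn


class FourDecoderASR(nn.Module):
    def __init__(self, encoder, ctc_head, attn_decoder, transducer, mask_decoder,
                 weights=(0.3, 0.3, 0.2, 0.2)):
        super().__init__()
        self.encoder = encoder
        self.ctc_head = ctc_head
        self.attn_decoder = attn_decoder
        self.transducer = transducer
        self.mask_decoder = mask_decoder
        self.weights = weights

    def forward(self, speech, speech_lens, tokens, token_lens):
        enc, enc_lens = self.encoder(speech, speech_lens)  # shared encoder pass
        # Each decoder module is assumed to return a scalar training loss.
        losses = (
            self.ctc_head(enc, enc_lens, tokens, token_lens),      # CTC loss
            self.attn_decoder(enc, enc_lens, tokens, token_lens),  # attention CE loss
            self.transducer(enc, enc_lens, tokens, token_lens),    # RNN-T loss
            self.mask_decoder(enc, enc_lens, tokens, token_lens),  # mask-predict loss
        )
        return sum(w * l for w, l in zip(self.weights, losses))
```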
