1 code implementation • 8 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature.
Automatic Speech Recognition (ASR)
no code implementations • 5 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR.
Automatic Speech Recognition (ASR)
no code implementations • 5 Oct 2021 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We propose an ASR rescoring method for directly detecting errors with ELECTRA, which was originally proposed as a pre-training method for NLP tasks.
Automatic Speech Recognition (ASR)
no code implementations • 24 Feb 2021 • Morimichi Kawasaki, Mitsuaki Kimura, Takahiro Matsushita, Masato Mimura
Let $(S,\omega)$ be a closed connected oriented surface whose genus $l$ is at least two equipped with a symplectic form.
Symplectic Geometry • Group Theory • Geometric Topology • Primary 20F12, 20J05, 37E35, 53D35, 70H15; Secondary 20F36, 37A15, 37J05, 37J10, 57R17, 53D22
no code implementations • 31 Dec 2020 • Wataru Kai, Masato Mimura, Akihiro Munemasa, Shin-ichiro Seki, Kiyoto Yoshino
Given any number field, we prove that there exist arbitrarily shaped constellations consisting of pairwise non-associate prime elements of the ring of integers.
Number Theory • Combinatorics • 11B30 (Primary); 11B25, 11H55, 11N05, 11R04, 05C55 (Secondary)
1 code implementation • 27 Aug 2020 • Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara
The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, both in simple cascading and joint training settings.
Audio and Speech Processing
1 code implementation • 9 Aug 2020 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ).
Automatic Speech Recognition (ASR)
no code implementations • 5 Jul 2020 • Morimichi Kawasaki, Mitsuaki Kimura, Takahiro Matsushita, Masato Mimura
The goal of this paper is to establish Bavard's duality theorem for $G$-invariant quasimorphisms, which was previously proved by Kawasaki and Kimura in the case $N = [G, N]$.
Group Theory • Algebraic Topology • Geometric Topology
1 code implementation • 19 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
For streaming inference, all monotonic attention (MA) heads should learn proper alignments because the next token is not generated until all heads detect the corresponding token boundaries.
Automatic Speech Recognition (ASR)
1 code implementation • 19 May 2020 • Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi.
Automatic Speech Recognition (ASR)
1 code implementation • 10 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
Monotonic chunkwise attention (MoChA) has been studied for online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework.
Automatic Speech Recognition (ASR)
no code implementations • LREC 2020 • Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan.
Automatic Speech Recognition (ASR)
no code implementations • 22 Sep 2019 • Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Moreover, the A2C model can be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words.
Automatic Speech Recognition (ASR)
no code implementations • 22 Mar 2019 • Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF).
Automatic Speech Recognition (ASR)
no code implementations • 31 Oct 2017 • Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech.