1 code implementation • 13 Oct 2023 • Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities.
Automatic Speech Recognition (ASR) +3
no code implementations • 16 Aug 2022 • Andrei Andrusenko, Rauf Nasretdinov, Aleksei Romanenko
Optimization of modern ASR architectures is among the highest-priority tasks, since it saves substantial computational resources in model training and inference.
1 code implementation • 6 Apr 2021 • Anton Mitrofanov, Mariya Korenevskaya, Ivan Podluzhny, Yuri Khokhlov, Aleksandr Laptev, Andrei Andrusenko, Aleksei Ilin, Maxim Korenevsky, Ivan Medennikov, Aleksei Romanenko
We propose a novel rescoring approach, which processes the entire lattice in a single call to the model.
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Mar 2021 • Aleksandr Laptev, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, Yuri Matveev
Researchers and industry prefer to use end-to-end ASR systems for on-device speech recognition tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 15 Jun 2020 • Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov
This paper presents an exploration of end-to-end automatic speech recognition (ASR) systems for the largest open-source Russian-language dataset, OpenSTT.
Automatic Speech Recognition (ASR) +1
no code implementations • 14 May 2020 • Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin
Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks.
Automatic Speech Recognition (ASR) +4
no code implementations • 14 May 2020 • Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko
We propose a novel Target-Speaker Voice Activity Detection (TS-VAD) approach, which directly predicts the activity of each speaker in each time frame.
1 code implementation • 22 Apr 2020 • Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov
To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech.
Ranked #4 on Speech Recognition on CHiME-6 dev_gss12