Search Results for author: Aleksandr Laptev

Found 12 papers, 2 papers with code

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays.

Automatic Speech Recognition, speaker-diarization, +3

Confidence-based Ensembles of End-to-End Speech Recognition Models

no code implementations • 27 Jun 2023 • Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg

Second, we demonstrate that it is possible to combine base and adapted models to achieve strong results on both original and target data.

Language Identification, Model Selection, +2

Powerful and Extensible WFST Framework for RNN-Transducer Losses

no code implementations • 18 Mar 2023 • Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg

This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss.

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition

no code implementations • 16 Dec 2022 • Aleksandr Laptev, Boris Ginsburg

This paper presents a class of new fast non-trainable entropy-based confidence estimation methods for automatic speech recognition.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

CTC Variations Through New WFST Topologies

no code implementations • 6 Oct 2021 • Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg

This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring

1 code implementation • 6 Apr 2021 • Anton Mitrofanov, Mariya Korenevskaya, Ivan Podluzhny, Yuri Khokhlov, Aleksandr Laptev, Andrei Andrusenko, Aleksei Ilin, Maxim Korenevsky, Ivan Medennikov, Aleksei Romanenko

We propose a novel rescoring approach, which processes the entire lattice in a single call to the model.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

no code implementations • 12 Mar 2021 • Aleksandr Laptev, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, Yuri Matveev

Researchers and industry prefer to use end-to-end ASR systems for on-device speech recognition tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset

no code implementations • 15 Jun 2020 • Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov

This paper presents an exploration of end-to-end automatic speech recognition (ASR) systems for the largest open-source Russian language data set -- OpenSTT.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

no code implementations • 14 May 2020 • Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin

Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4

Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

no code implementations • 14 May 2020 • Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko

We propose a novel Target-Speaker Voice Activity Detection (TS-VAD) approach, which directly predicts the activity of each speaker in each time frame.

Action Detection, Activity Detection, +4

Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

1 code implementation • 22 Apr 2020 • Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov

To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech.

Ranked #4 on Speech Recognition on CHiME-6 dev_gss12

Data Augmentation, Speech Enhancement, +2

Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

no code implementations • 19 Mar 2020 • Nikolay Malkovsky, Vladimir Bataev, Dmitrii Sviridkin, Natalia Kizhaeva, Aleksandr Laptev, Ildar Valiev, Oleg Petrov

The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system: hybrid systems are usually constructed to recognize a fixed set of words and can rarely include all the words that will be encountered while the system is in operation.

graph construction, speech-recognition, +1
