no code implementations • 5 Jun 2023 • Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang
For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model.
no code implementations • 17 Sep 2022 • Ye Bai, Jie Li, Wenjing Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang
Experimental results show that the proposed model achieves competitive performance with 1/3 of the parameters of the encoder, compared with the full-parameter model.
no code implementations • 17 Feb 2022 • Jiangyan Yi, Ruibo Fu, JianHua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu
Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021 challenge.
no code implementations • 15 Apr 2021 • Haoxin Ma, Jiangyan Yi, JianHua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang
However, fine-tuning leads to performance degradation on previous data.
1 code implementation • 8 Apr 2021 • Jiangyan Yi, Ye Bai, JianHua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu
Therefore, this paper develops such a dataset for half-truth audio detection (HAD).
no code implementations • 7 Apr 2021 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen
It takes a lot of computation and time to predict the blank tokens, yet only the non-blank tokens appear in the final output sequence.
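The motivation above can be illustrated with a minimal frame-synchronous decoding sketch: frames whose best label is the blank token contribute nothing to the output, so a decoder that detects them cheaply can skip the expensive per-token computation for those frames. This is a generic illustration of blank-collapsing, not the paper's exact algorithm; the toy vocabulary and logits are invented.

```python
BLANK = 0  # index of the blank token in the toy vocabulary

def greedy_ctc_decode(frame_logits):
    """Collapse repeated labels and drop blanks from per-frame argmaxes."""
    out, prev = [], None
    for logits in frame_logits:
        tok = max(range(len(logits)), key=logits.__getitem__)
        if tok != BLANK and tok != prev:
            out.append(tok)
        prev = tok
    return out

# toy example: 6 frames over vocab {0: blank, 1: 'a', 2: 'b'}
logits = [
    [0.9, 0.05, 0.05],  # blank -> skipped
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.8, 0.1],    # 'a' (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank -> skipped
    [0.1, 0.1, 0.8],    # 'b'
    [0.9, 0.05, 0.05],  # blank -> skipped
]
print(greedy_ctc_decode(logits))  # [1, 2]
```

Half the frames here are blanks, which is exactly the wasted work the paper targets.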
1 code implementation • 4 Apr 2021 • Zhengkun Tian, Jiangyan Yi, JianHua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu
To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT), which improves the performance and accelerates the convergence of the NAR model by learning prior knowledge from a parameter-sharing AR model.
no code implementations • 15 Feb 2021 • Ye Bai, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
Based on this idea, we propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
no code implementations • 28 Oct 2020 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, JianHua Tao, Zhengqi Wen
In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage.
Automatic Speech Recognition (ASR) +1
no code implementations • 28 Oct 2020 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen
Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder and the two-stage inference method into the streaming CTC model.
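The two-stage idea can be sketched as follows: a streaming first pass (here CTC) proposes n-best hypotheses with low latency, and a full-context second pass re-scores them and picks the best. This is a generic two-pass rescoring sketch under assumed toy scores, not the paper's exact architecture.

```python
def two_stage_decode(nbest, rescore):
    """Second pass: re-score first-pass n-best hypotheses with a
    full-context model and return the highest-scoring one."""
    return max(nbest, key=rescore)

# hypothetical first-pass n-best list with second-pass log-probabilities
second_pass_scores = {"a cat": -1.2, "a cap": -2.5, "o cat": -3.0}
best = two_stage_decode(list(second_pass_scores), second_pass_scores.get)
print(best)  # a cat
```

The latency benefit comes from the first pass streaming results immediately, while the second pass only runs over a short candidate list.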
no code implementations • Pattern Recognition 2020 • Bocheng Zhao, JianHua Tao, Minghao Yang, Zhengkun Tian, Cunhang Fan, Ye Bai
Calligraphy imitation (CI) from a handful of target handwriting samples is so challenging that most existing writing-style analysis and handwriting generation methods do not perform satisfactorily on it.
no code implementations • 16 May 2020 • Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen
To address this problem and improve the inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate the convergence.
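The length-prediction role of the CTC module can be sketched simply: each first frame of a non-blank run is a CTC "spike", and the number of spikes gives the predicted target length that the non-autoregressive decoder then fills in parallel. This is an illustrative sketch of the spike-counting idea only; the frame labels below are invented.

```python
def predict_length(frame_argmax, blank=0):
    """Count CTC spikes (first frames of non-blank runs); the spike
    count serves as the predicted target-sequence length."""
    n, prev = 0, blank
    for tok in frame_argmax:
        if tok != blank and tok != prev:
            n += 1
        prev = tok
    return n

# toy per-frame argmax labels: three spikes -> predicted length 3
print(predict_length([0, 3, 3, 0, 5, 0, 7]))  # 3
```

With the length fixed up front, all output positions can be decoded in one parallel step instead of token by token.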
no code implementations • 11 May 2020 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
Without beam search, the one-pass propagation greatly reduces the inference time cost of LASO.
no code implementations • 1 Apr 2020 • Jiangyan Yi, Jian-Hua Tao, Ye Bai, Zhengkun Tian, Cunhang Fan
The other is that POS tags are provided by an external POS tagger.
no code implementations • 19 Feb 2020 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jian-Hua Tao, Ye Bai
Recently, language identity information has been utilized to improve the performance of end-to-end code-switching (CS) speech recognition.
no code implementations • 6 Dec 2019 • Zhengkun Tian, Jiangyan Yi, Ye Bai, Jian-Hua Tao, Shuai Zhang, Zhengqi Wen
Once a fixed-length chunk of the input sequence is processed by the encoder, the decoder begins to predict symbols immediately.
no code implementations • 4 Dec 2019 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang
To alleviate the above two issues, we propose a unified method called LST (Learn Spelling from Teachers) to integrate knowledge into an AED model from the external text-only data and leverage the whole context in a sentence.
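A minimal sketch of the distillation idea in LST: instead of training the AED model against one-hot targets alone, the loss uses soft labels that mix the ground truth with a teacher language model's distribution, so text-only knowledge flows into the ASR model at training time. The mixing weight `lam` and the toy distributions are illustrative assumptions, not values from the paper.

```python
import math

def lst_loss(student_probs, teacher_probs, target_idx, lam=0.5):
    """Cross-entropy against soft labels: a (1-lam)/lam mixture of the
    one-hot ground truth and the teacher LM's distribution."""
    loss = 0.0
    for i, p in enumerate(student_probs):
        soft = (1.0 - lam) * (1.0 if i == target_idx else 0.0) \
               + lam * teacher_probs[i]
        loss -= soft * math.log(p)
    return loss

# toy 3-token vocabulary; ground truth is token 0
loss = lst_loss([0.7, 0.2, 0.1], [0.5, 0.4, 0.1], target_idx=0)
```

With `lam = 0`, this reduces to ordinary cross-entropy; larger `lam` pushes the student toward the teacher LM's whole-sentence preferences.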
no code implementations • 28 Sep 2019 • Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Zhengqi Wen
Furthermore, a path-aware regularization is proposed to assist SA-T to learn alignments and improve the performance.
no code implementations • 13 Jul 2019 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengkun Tian, Zhengqi Wen
Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial.
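For context, the most common baseline for such integration is shallow fusion, which interpolates acoustic and language-model scores at decode time; its limitations are part of what makes tighter integration non-trivial. This sketch shows shallow fusion only, not the method this paper proposes; the toy probabilities are invented.

```python
import math

def shallow_fusion(am_logprobs, lm_logprobs, lm_weight=0.3):
    """Per-token score: log p_am(token) + lm_weight * log p_lm(token)."""
    return [a + lm_weight * l for a, l in zip(am_logprobs, lm_logprobs)]

# toy 2-token decision: the AM prefers token 0, the LM prefers token 1
am = [math.log(0.6), math.log(0.4)]
lm = [math.log(0.2), math.log(0.8)]
fused = shallow_fusion(am, lm)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # 1 -- the LM flips the decision
```

The interpolation weight must be tuned per task, and the fused score is a heuristic rather than a proper probability, which motivates the learned integration explored here.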