no code implementations • 21 May 2023 • ZiYi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities.
no code implementations • 3 May 2022 • ZiYi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview.
no code implementations • 10 Dec 2021 • Kenichi Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, Robert Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng
For untranscribed speech data, the hypothesis from an ASR system must be used as a label.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 10 Dec 2021 • Kenichi Kumatani, Robert Gmyr, Felipe Cruz Salinas, Linquan Liu, Wei Zuo, Devang Patel, Eric Sun, Yu Shi
The sparsely-gated Mixture of Experts (MoE) can magnify a network capacity with a little computational complexity.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 14 Jun 2021 • Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez
The proposed scheme is based on a weighted gradient aggregation using two-step optimization to offer a flexible training pipeline.
no code implementations • 6 Aug 2020 • Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez
The target scenario is Acoustic Model training based on this platform.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, Eric Darve
Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
no code implementations • 25 Dec 2019 • Chenguang Zhu, Ziyi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang
A typical journalistic convention in news articles is to deliver the most salient information in the beginning, also known as the lead bias.
no code implementations • 25 Sep 2019 • Chenguang Zhu, ZiYi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang
For example, the pretrained model without finetuning outperforms pointer-generator network on CNN/DailyMail dataset.