no code implementations • 27 May 2020 • Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chan
One NAT model, mask-predict, has been applied to ASR but the model needs some heuristics or additional component to estimate the length of the output token sequence.
Audio and Speech Processing Sound