no code implementations • 21 Mar 2024 • Changtong Zan, Liang Ding, Li Shen, Yibing Zhan, Weifeng Liu, DaCheng Tao
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability of LLMs, especially their adherence to the intended translation direction.
1 code implementation • 28 Sep 2023 • Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, DaCheng Tao
Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs unseen in the training data.
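The ZST setting above can be made concrete with a toy sketch (the languages and the English-centric training setup below are illustrative assumptions, not details from the paper): a multilingual NMT model is trained on some ordered direction pairs, and zero-shot directions are exactly those never seen in training.

```python
# Toy illustration of zero-shot translation (ZST) directions.
# Assumption: an English-centric training setup, where all supervised
# directions involve English -- a common multilingual NMT configuration.
from itertools import permutations

languages = ["en", "de", "fr", "ro"]

# Supervised (seen) directions: every ordered pair that includes English.
trained_pairs = {(src, tgt) for src, tgt in permutations(languages, 2)
                 if "en" in (src, tgt)}

# Zero-shot directions: every ordered pair never seen during training.
zero_shot_pairs = set(permutations(languages, 2)) - trained_pairs

print(sorted(zero_shot_pairs))
# e.g. ("de", "fr") is zero-shot: no direct de->fr parallel data was used.
```

With four languages and English-centric data, 6 of the 12 ordered directions are zero-shot, which is why ZST quality matters as the language set grows.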
1 code implementation • 5 Jun 2023 • Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, DaCheng Tao
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios.
1 code implementation • 20 Apr 2023 • Chiaming Hsu, Changtong Zan, Liang Ding, Longyue Wang, Xiaoting Wang, Weifeng Liu, Fu Lin, Wenbin Hu
Experiments on WMT17-EnZh XRE also show the effectiveness of our Prompt-XRE against other competitive baselines.
1 code implementation • 20 Sep 2022 • Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao
As for model sizes, we scale Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the model capacity for our Vega-MT.
Ranked #1 on Machine Translation on WMT 2022 English-Russian
1 code implementation • COLING 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
1 code implementation • 16 Apr 2022 • Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, DaCheng Tao
For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g. mBART, the self-supervised pretraining task is trained on a wide range of monolingual languages, e.g. 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally operate on a bilingual subset, e.g. English-German. This creates both a data discrepancy (domain discrepancy) and a cross-lingual learning-objective discrepancy (task discrepancy) between the pretraining and fine-tuning stages.
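The domain-discrepancy idea can be sketched numerically (the language codes and counts below are stand-ins for illustration, not mBART's actual 25-language list): pretraining covers many languages, while a bilingual downstream task touches only two of them, leaving most of the pretrained language coverage unused.

```python
# Toy sketch of the pretrain/fine-tune domain discrepancy.
# Assumption: an 8-language stand-in for a multilingual pretraining set.
pretrain_langs = {"en", "de", "fr", "ro", "ru", "zh", "ja", "hi"}
finetune_langs = {"en", "de"}  # e.g. an English-German translation task

unused = pretrain_langs - finetune_langs          # languages never fine-tuned on
coverage = len(finetune_langs) / len(pretrain_langs)

print(f"fine-tuning touches {coverage:.0%} of the pretrained languages; "
      f"{len(unused)} languages go unused")
```

The gap between what was pretrained and what fine-tuning actually exercises is the mismatch the paper's two discrepancies (domain and task) describe.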