2 code implementations • ACL 2022 • Liang Chen, Runxin Xu, Baobao Chang
Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models.
no code implementations • ACL 2022 • Runxin Xu, Fuli Luo, Baobao Chang, Songfang Huang, Fei Huang
The emergence of multilingual pre-trained language models makes it possible to adapt to target languages with only few labeled examples. However, vanilla fine-tuning tends to achieve degenerated and unstable results, owing to the Language Interference among different languages, and Parameter Overload under the few-sample transfer learning scenarios. To address two problems elegantly, we propose S^4-Tuning, a Simple Cross-lingual Sub-network Tuning method.
1 code implementation • 7 May 2024 • DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Liu, Xin Xie, Xingkai Yu, Xinnan Song, Xinyi Zhou, Xinyu Yang, Xuan Lu, Xuecheng Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yiliang Xiong, Yilong Zhao, Ying He, Ying Tang, Yishi Piao, Yixin Dong, Yixuan Tan, Yiyuan Liu, Yongji Wang, Yongqiang Guo, Yuchen Zhu, Yuduan Wang, Yuheng Zou, Yukun Zha, Yunxian Ma, Yuting Yan, Yuxiang You, Yuxuan Liu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhewen Hao, Zhihong Shao, Zhiniu Wen, Zhipeng Xu, Zhongyu Zhang, Zhuoshu Li, Zihan Wang, Zihui Gu, Zilin Li, Ziwei Xie
MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.
no code implementations • 1 Mar 2024 • Lei LI, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu
To fill this gap, we introduce Multimodal ArXiv, consisting of ArXivCap and ArXivQA, for enhancing LVLMs scientific comprehension.
1 code implementation • 5 Feb 2024 • Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature.
Ranked #12 on Math Word Problem Solving on MATH (using extra training data)
no code implementations • 24 May 2023 • Shoujie Tong, Heming Xia, Damai Dai, Runxin Xu, Tianyu Liu, Binghuai Lin, Yunbo Cao, Zhifang Sui
Also, Bi-Drop needs only one mini-batch to estimate the sub-net so it achieves higher utility of training data.
1 code implementation • NAACL 2022 • Ce Zheng, Xudong Chen, Runxin Xu, Baobao Chang
In this paper, we propose a Knowledge-guided Incremental semantic parser with Double-graph (KID).
1 code implementation • NAACL 2022 • Runxin Xu, Peiyi Wang, Tianyu Liu, Shuang Zeng, Baobao Chang, Zhifang Sui
In this paper, we focus on extracting event arguments from an entire document, which mainly faces two critical problems: a) the long-distance dependency between trigger and arguments over sentences; b) the distracting context towards an event in the document.
Document-level Event Extraction Event Argument Extraction +2
2 code implementations • Findings (NAACL) 2022 • Liang Chen, Peiyi Wang, Runxin Xu, Tianyu Liu, Zhifang Sui, Baobao Chang
As Abstract Meaning Representation (AMR) implicitly involves compound semantic annotations, we hypothesize auxiliary tasks which are semantically or formally related can better enhance AMR parsing.
Ranked #7 on AMR Parsing on LDC2020T02 (using extra training data)
no code implementations • 17 Apr 2022 • Cunxiang Wang, Fuli Luo, Yanyang Li, Runxin Xu, Fei Huang, Yue Zhang
Pre-trained language models (PLMs) like BERT have made significant progress in various downstream NLP tasks.
no code implementations • ACL 2022 • Yanyang Li, Fuli Luo, Runxin Xu, Songfang Huang, Fei Huang, LiWei Wang
Structured pruning has been extensively studied on monolingual pre-trained language models and is yet to be fully evaluated on their multilingual counterparts.
1 code implementation • 1 Apr 2022 • Ziyun Xu, Chengyu Wang, Minghui Qiu, Fuli Luo, Runxin Xu, Songfang Huang, Jun Huang
Pre-trained Language Models (PLMs) have achieved remarkable performance for various language understanding tasks in IR systems, which require the fine-tuning process based on labeled training data.
2 code implementations • 6 Mar 2022 • Liang Chen, Runxin Xu, Baobao Chang
Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models.
2 code implementations • 14 Dec 2021 • Runxin Xu, Fuli Luo, Chengyu Wang, Baobao Chang, Jun Huang, Songfang Huang, Fei Huang
Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and fine-tuned model for task-specific knowledge.
1 code implementation • NAACL 2022 • Peiyi Wang, Runxin Xu, Tianyu Liu, Qingyu Zhou, Yunbo Cao, Baobao Chang, Zhifang Sui
Few-Shot Sequence Labeling (FSSL) is a canonical paradigm for the tagging models, e. g., named entity recognition and slot filling, to generalize on an emerging, resource-scarce domain.
Ranked #6 on Few-shot NER on Few-NERD (INTER)
3 code implementations • EMNLP 2021 • Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, Fei Huang
Recent pretrained language models extend from millions to billions of parameters.
1 code implementation • 29 Aug 2021 • Peiyi Wang, Runxin Xu, Tianyu Liu, Damai Dai, Baobao Chang, Zhifang Sui
However, we find they suffer from trigger biases that signify the statistical homogeneity between some trigger words and target event types, which we summarize as trigger overlapping and trigger separability.
no code implementations • 21 Jun 2021 • Peiyi Wang, Tianyu Liu, Damai Dai, Runxin Xu, Baobao Chang, Zhifang Sui
Table encoder extracts sentiment at token-pair level, so that the compositional feature between targets and opinions can be easily captured.
2 code implementations • ACL 2021 • Runxin Xu, Tianyu Liu, Lei LI, Baobao Chang
Existing methods are not effective due to two challenges of this task: a) the target event arguments are scattered across sentences; b) the correlation among events in a document is non-trivial to model.
Ranked #2 on Document-level Event Extraction on ChFinAnn
no code implementations • WMT (EMNLP) 2020 • Runxin Xu, Zhuo Zhi, Jun Cao, Mingxuan Wang, Lei LI
In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions.
2 code implementations • EMNLP 2020 • Shuang Zeng, Runxin Xu, Baobao Chang, Lei LI
Document-level relation extraction aims to extract relations among entities within a document.
Ranked #12 on Relation Extraction on DocRED
no code implementations • ACL 2020 • Runxin Xu, Jun Cao, Mingxuan Wang, Jiaze Chen, Hao Zhou, Ying Zeng, Yu-Ping Wang, Li Chen, Xiang Yin, Xijin Zhang, Songcheng Jiang, Yuxuan Wang, Lei LI
This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading and avatar animation.
1 code implementation • 12 Jun 2020 • Xunpeng Huang, Runxin Xu, Hao Zhou, Zhe Wang, Zhengyang Liu, Lei LI
Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence.
no code implementations • 12 Jun 2020 • Xunpeng Huang, Hao Zhou, Runxin Xu, Zhe Wang, Lei LI
Adaptive gradient methods have attracted much attention of machine learning communities due to the high efficiency.