Search Results for author: Runxin Xu

Found 24 papers, 15 papers with code

Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation

2 code implementations • ACL 2022 • Liang Chen, Runxin Xu, Baobao Chang

Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models.

S^4-Tuning: A Simple Cross-lingual Sub-network Tuning Method

no code implementations • ACL 2022 • Runxin Xu, Fuli Luo, Baobao Chang, Songfang Huang, Fei Huang

The emergence of multilingual pre-trained language models makes it possible to adapt to target languages with only a few labeled examples. However, vanilla fine-tuning tends to achieve degenerated and unstable results, owing to the Language Interference among different languages, and Parameter Overload under the few-sample transfer learning scenarios. To address the two problems elegantly, we propose S^4-Tuning, a Simple Cross-lingual Sub-network Tuning method.

Transfer Learning

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

1 code implementation • 7 May 2024 • DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Liu, Xin Xie, Xingkai Yu, Xinnan Song, Xinyi Zhou, Xinyu Yang, Xuan Lu, Xuecheng Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yiliang Xiong, Yilong Zhao, Ying He, Ying Tang, Yishi Piao, Yixin Dong, Yixuan Tan, Yiyuan Liu, Yongji Wang, Yongqiang Guo, Yuchen Zhu, Yuduan Wang, Yuheng Zou, Yukun Zha, Yunxian Ma, Yuting Yan, Yuxiang You, Yuxuan Liu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhewen Hao, Zhihong Shao, Zhiniu Wen, Zhipeng Xu, Zhongyu Zhang, Zhuoshu Li, Zihan Wang, Zihui Gu, Zilin Li, Ziwei Xie

MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

Language Modelling Reinforcement Learning (RL)

2,370

Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

no code implementations • 1 Mar 2024 • Lei LI, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu

To fill this gap, we introduce Multimodal ArXiv, consisting of ArXivCap and ArXivQA, for enhancing LVLMs' scientific comprehension.

Benchmarking Mathematical Reasoning +1

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

1 code implementation • 5 Feb 2024 • Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature.

Ranked #12 on Math Word Problem Solving on MATH (using extra training data)

Arithmetic Reasoning Math +1

639

Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization

no code implementations • 24 May 2023 • Shoujie Tong, Heming Xia, Damai Dai, Runxin Xu, Tianyu Liu, Binghuai Lin, Yunbo Cao, Zhifang Sui

Also, Bi-Drop needs only one mini-batch to estimate the sub-net so it achieves higher utility of training data.

Natural Language Understanding

A Double-Graph Based Framework for Frame Semantic Parsing

1 code implementation • NAACL 2022 • Ce Zheng, Xudong Chen, Runxin Xu, Baobao Chang

In this paper, we propose a Knowledge-guided Incremental semantic parser with Double-graph (KID).

graph construction Semantic Parsing

A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction

1 code implementation • NAACL 2022 • Runxin Xu, Peiyi Wang, Tianyu Liu, Shuang Zeng, Baobao Chang, Zhifang Sui

In this paper, we focus on extracting event arguments from an entire document, which mainly faces two critical problems: a) the long-distance dependency between trigger and arguments over sentences; b) the distracting context towards an event in the document.

Document-level Event Extraction Event Argument Extraction +2

ATP: AMRize Then Parse! Enhancing AMR Parsing with PseudoAMRs

2 code implementations • Findings (NAACL) 2022 • Liang Chen, Peiyi Wang, Runxin Xu, Tianyu Liu, Zhifang Sui, Baobao Chang

As Abstract Meaning Representation (AMR) implicitly involves compound semantic annotations, we hypothesize auxiliary tasks which are semantically or formally related can better enhance AMR parsing.

Ranked #7 on AMR Parsing on LDC2020T02 (using extra training data)

AMR Parsing Dependency Parsing +1

Knowledgeable Salient Span Mask for Enhancing Language Models as Knowledge Base

no code implementations • 17 Apr 2022 • Cunxiang Wang, Fuli Luo, Yanyang Li, Runxin Xu, Fei Huang, Yue Zhang

Pre-trained language models (PLMs) like BERT have made significant progress in various downstream NLP tasks.

Self-Supervised Learning

Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency

no code implementations • ACL 2022 • Yanyang Li, Fuli Luo, Runxin Xu, Songfang Huang, Fei Huang, LiWei Wang

Structured pruning has been extensively studied on monolingual pre-trained language models and is yet to be fully evaluated on their multilingual counterparts.

Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning

1 code implementation • 1 Apr 2022 • Ziyun Xu, Chengyu Wang, Minghui Qiu, Fuli Luo, Runxin Xu, Songfang Huang, Jun Huang

Pre-trained Language Models (PLMs) have achieved remarkable performance for various language understanding tasks in IR systems, which require the fine-tuning process based on labeled training data.

Contrastive Learning

1,966

Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation

2 code implementations • 6 Mar 2022 • Liang Chen, Runxin Xu, Baobao Chang

Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models.

Machine Translation Translation

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

2 code implementations • 14 Dec 2021 • Runxin Xu, Fuli Luo, Chengyu Wang, Baobao Chang, Jun Huang, Songfang Huang, Fei Huang

Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge, and fine-tuned model for task-specific knowledge.

Contrastive Learning Language Modelling +2

1,955

An Enhanced Span-based Decomposition Method for Few-Shot Sequence Labeling

1 code implementation • NAACL 2022 • Peiyi Wang, Runxin Xu, Tianyu Liu, Qingyu Zhou, Yunbo Cao, Baobao Chang, Zhifang Sui

Few-Shot Sequence Labeling (FSSL) is a canonical paradigm for the tagging models, e.g., named entity recognition and slot filling, to generalize on an emerging, resource-scarce domain.

Ranked #6 on Few-shot NER on Few-NERD (INTER)

Few-shot NER Meta-Learning +4

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning

3 code implementations • EMNLP 2021 • Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, Fei Huang

Recent pretrained language models extend from millions to billions of parameters.

Language Modelling Large Language Model

1,955

Behind the Scenes: An Exploration of Trigger Biases Problem in Few-Shot Event Classification

1 code implementation • 29 Aug 2021 • Peiyi Wang, Runxin Xu, Tianyu Liu, Damai Dai, Baobao Chang, Zhifang Sui

However, we find they suffer from trigger biases that signify the statistical homogeneity between some trigger words and target event types, which we summarize as trigger overlapping and trigger separability.

Explicit Interaction Network for Aspect Sentiment Triplet Extraction

no code implementations • 21 Jun 2021 • Peiyi Wang, Tianyu Liu, Damai Dai, Runxin Xu, Baobao Chang, Zhifang Sui

Table encoder extracts sentiment at token-pair level, so that the compositional feature between targets and opinions can be easily captured.

Aspect Sentiment Triplet Extraction Sentence +1

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker

2 code implementations • ACL 2021 • Runxin Xu, Tianyu Liu, Lei LI, Baobao Chang

Existing methods are not effective due to two challenges of this task: a) the target event arguments are scattered across sentences; b) the correlation among events in a document is non-trivial to model.

Ranked #2 on Document-level Event Extraction on ChFinAnn

Document-level Event Extraction Event Extraction

225

Volctrans Parallel Corpus Filtering System for WMT 2020

no code implementations • WMT (EMNLP) 2020 • Runxin Xu, Zhuo Zhi, Jun Cao, Mingxuan Wang, Lei LI

In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions.

Sentence Word Alignment

Double Graph Based Reasoning for Document-level Relation Extraction

2 code implementations • EMNLP 2020 • Shuang Zeng, Runxin Xu, Baobao Chang, Lei LI

Document-level relation extraction aims to extract relations among entities within a document.

Ranked #12 on Relation Extraction on DocRED

Document-level Relation Extraction Relation +1

141

Xiaomingbot: A Multilingual Robot News Reporter

no code implementations • ACL 2020 • Runxin Xu, Jun Cao, Mingxuan Wang, Jiaze Chen, Hao Zhou, Ying Zeng, Yu-Ping Wang, Li Chen, Xiang Yin, Xijin Zhang, Songcheng Jiang, Yuxuan Wang, Lei LI

This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading and avatar animation.

News Generation Translation +1

ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

1 code implementation • 12 Jun 2020 • Xunpeng Huang, Runxin Xu, Hao Zhou, Zhe Wang, Zhengyang Liu, Lei LI

Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence.

BIG-bench Machine Learning Stochastic Optimization

Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs

no code implementations • 12 Jun 2020 • Xunpeng Huang, Hao Zhou, Runxin Xu, Zhe Wang, Lei LI

Adaptive gradient methods have attracted much attention from the machine learning community due to their high efficiency.
