Search Results for author: Jiangang Luo

Found 3 papers, 3 papers with code

Yuan 2.0-M32: Mixture of Experts with Attention Router

1 code implementation • 28 May 2024 • Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

Yuan 2. 0-M32, with a similar base architecture as Yuan-2. 0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active.

Math

147

Paper
Code

YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

2 code implementations • 27 Nov 2023 • Shaohua Wu, Xudong Zhao, Shenling Wang, Jiangang Luo, Lingjun Li, Xi Chen, Bing Zhao, Wei Wang, Tong Yu, Rongguo Zhang, Jiahua Zhang, Chao Wang

In this work, we develop and release Yuan 2. 0, a series of large language models with parameters ranging from 2. 1 billion to 102. 6 billion.

Code Generation Language Modelling +2

875

Paper
Code

Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning

1 code implementation • 10 Oct 2021 • Shaohua Wu, Xudong Zhao, Tong Yu, Rongguo Zhang, Chong Shen, Hongli Liu, Feng Li, Hong Zhu, Jiangang Luo, Liang Xu, Xuanwei Zhang

With this method, Yuan 1. 0, the current largest singleton language model with 245B parameters, achieves excellent performance on thousands GPUs during training, and the state-of-the-art results on NLP tasks.

Few-Shot Learning Language Modelling +1

592

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.