1 code implementation • 28 Feb 2024 • Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, Deyi Xiong
Drawing from our findings on multilingual value alignment, we prudently offer suggestions on the composition of multilingual pre-training data for LLMs: include a limited number of dominant languages to enable cross-lingual alignment transfer while avoiding their excessive prevalence, and maintain a balanced distribution across non-dominant languages.
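To make the suggested data composition concrete, here is a minimal illustrative sketch (not from the paper) of computing per-language sampling weights for such a mixture. The language lists, the `DOMINANT_CAP` value, and the `mixture_weights` helper are all hypothetical assumptions chosen for illustration.

```python
# Illustrative sketch (assumptions, not the paper's method): compose a
# multilingual pre-training mixture with a few dominant languages capped
# at a modest total share, and the remaining budget spread evenly over
# non-dominant languages to keep their distribution balanced.

DOMINANT = ["en", "zh"]                              # hypothetical dominant set
NON_DOMINANT = ["sw", "th", "ta", "tr", "uk", "vi"]  # hypothetical set
DOMINANT_CAP = 0.4  # assumed cap to avoid excessive prevalence of dominant languages


def mixture_weights(dominant, non_dominant, cap):
    """Return per-language sampling weights for the pre-training mix."""
    weights = {}
    # Dominant languages share the capped budget equally.
    for lang in dominant:
        weights[lang] = cap / len(dominant)
    # Non-dominant languages split the remaining budget uniformly,
    # i.e., a balanced distribution as suggested above.
    for lang in non_dominant:
        weights[lang] = (1.0 - cap) / len(non_dominant)
    return weights


if __name__ == "__main__":
    for lang, w in mixture_weights(DOMINANT, NON_DOMINANT, DOMINANT_CAP).items():
        print(f"{lang}: {w:.3f}")
```

Under these assumed numbers, each dominant language gets 0.200 of the sampling budget and each non-dominant language 0.100, so dominant languages are present but never dominate the mixture.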
no code implementations • 7 Nov 2023 • Shaoyang Xu, Junzhuo Li, Deyi Xiong
Multilingual pretrained language models serve as repositories of multilingual factual knowledge.