no code implementations • CCL 2021 • Wei Pan, Tianyuan Liu, Yuqing Sun, Bin Gong, Yongman Zhang, Ping Yang
"The continual emergence of new words is a natural law of language; in specialized domains, new concepts and entity names represent abstract generalizations of sets of shared features and often act as keywords playing specific roles in sentences. New word discovery directly affects Chinese word segmentation results and the performance of downstream semantic understanding tasks, making it an important problem in natural language processing. This paper proposes a Chinese new word discovery model that fuses an autoencoder with adversarial training. It is pre-trained with a character-level autoencoder in an unsupervised, self-learning manner, which effectively extracts semantic information, is unaffected by word segmentation results, and applies to texts from different domains. To incorporate general linguistic knowledge, prior syntactic parsing results are added, and a domain-shared encoder fuses semantic and syntactic information to improve the accuracy of splitting ambiguous words. An adversarial training mechanism extracts domain-independent features and reduces reliance on manually annotated corpora. Experiments on six datasets from different specialized domains evaluate the new word discovery task; the results show that the proposed model outperforms existing methods, and ablation experiments verify the effectiveness of each module in detail. Comparative experiments with different types of source-domain data and different amounts of target-domain data demonstrate the model's robustness. Finally, a visual comparison of the encodings produced by the autoencoder and the shared encoder for data from different domains shows that the adversarial training method effectively extracts the correlations and differences between them."
no code implementations • 10 Apr 2024 • Ligen Shi, Chang Liu, Ping Yang, Jun Qiu, Xing Zhao
In spectral CT reconstruction, the basis materials decomposition involves solving a large-scale nonlinear system of integral equations, which is highly ill-posed mathematically.
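The excerpt notes that the decomposition problem is highly ill-posed. As a generic, hedged illustration (not the paper's actual nonlinear decomposition algorithm), Tikhonov regularization is the standard way to stabilize the linear analogue of such a system:

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Tikhonov-regularized least squares for an ill-posed system
    A x ~ b: minimize ||A x - b||^2 + lam * ||x||^2.
    Closed form: x = (A^T A + lam I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
```

The penalty weight `lam` trades data fidelity against solution stability; the paper itself addresses a nonlinear integral-equation system, for which such a linear solve would only appear inside an iterative scheme.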
1 code implementation • 13 Mar 2024 • Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen
Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities.
no code implementations • 6 Nov 2023 • Ruyi Gan, Ziwei Wu, Renliang Sun, Junyu Lu, XiaoJun Wu, Dixiang Zhang, Kunhao Pan, Junqing He, Yuanhe Tian, Ping Yang, Qi Yang, Hao Wang, Jiaxing Zhang, Yan Song
Although many such issues are addressed along the line of research on LLMs, an important yet practical limitation is that many studies overly pursue enlarging model sizes without comprehensively analyzing and optimizing the use of pre-training data in their learning process, as well as appropriate organization and leveraging of such data in training LLMs under cost-effective settings.
no code implementations • 4 Sep 2023 • Chao Peng, Zhengwei Lv, Jiarong Fu, Jiayuan Liang, Zhao Zhang, Ajitha Rajan, Ping Yang
We find that Hawkeye is able to generate GUI event sequences targeting changed functions more reliably than FastBot2 and ARES for the open source Apps and the large commercial App.
1 code implementation • 30 May 2023 • Xiaogang Peng, Hao Wen, Yikai Luo, Xiao Zhou, Keyang Yu, Ping Yang, Zizhao Wu
To overcome this, we propose HyperVD, a novel framework that learns snippet embeddings in hyperbolic space to improve model discrimination.
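HyperVD learns snippet embeddings in hyperbolic space. As a minimal sketch of the underlying geometry (assuming the common Poincaré ball model; the paper's exact formulation may differ), the geodesic distance between two points in the unit ball is:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincaré ball model of hyperbolic space.
    Distances grow rapidly near the boundary, which helps separate
    hierarchically related embeddings."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))
```

Replacing Euclidean distance with this metric is what gives hyperbolic embeddings their extra discriminative room for tree-like structure.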
no code implementations • 17 May 2023 • Ping Yang, Junyu Lu, Ruyi Gan, Junjie Wang, Yuxiang Zhang, Jiaxing Zhang, Pingjian Zhang
We propose a new paradigm for universal information extraction (IE) that is compatible with any schema format and applicable to a list of IE tasks, such as named entity recognition, relation extraction, event extraction and sentiment analysis.
1 code implementation • 16 Oct 2022 • Ping Yang, Junjie Wang, Ruyi Gan, Xinyu Zhu, Lin Zhang, Ziwei Wu, Xinyu Gao, Jiaxing Zhang, Tetsuya Sakai
We propose a new paradigm for zero-shot learners that is format agnostic, i.e., it is compatible with any format and applicable to a list of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis.
1 code implementation • 7 Sep 2022 • Jiaxing Zhang, Ruyi Gan, Junjie Wang, Yuxiang Zhang, Lin Zhang, Ping Yang, Xinyu Gao, Ziwei Wu, Xiaoqun Dong, Junqing He, Jianheng Zhuo, Qi Yang, Yongfeng Huang, Xiayu Li, Yanghan Wu, Junyu Lu, Xinyu Zhu, Weifeng Chen, Ting Han, Kunhao Pan, Rui Wang, Hao Wang, XiaoJun Wu, Zhongshen Zeng, Chongpei Chen
We hope that this project will be the foundation of Chinese cognitive intelligence.
1 code implementation • 5 Aug 2022 • Junjie Wang, Yuxiang Zhang, Ping Yang, Ruyi Gan
This report describes Erlangshen, a pre-trained language model with a propensity-corrected loss, which ranked first in the CLUE Semantic Matching Challenge.
no code implementations • 24 Jun 2022 • Junyu Lu, Ping Yang, Ruyi Gan, Jing Yang, Jiaxing Zhang
Even though pre-trained language models share a semantic encoder, natural language understanding suffers from a diversity of output schemas.
1 code implementation • 7 Mar 2022 • Dingkun Long, Qiong Gao, Kuan Zou, Guangwei Xu, Pengjun Xie, Ruijie Guo, Jian Xu, Guanjun Jiang, Luxi Xing, Ping Yang
We find that the performance of retrieval models trained on datasets from the general domain inevitably decreases on specific domains.
no code implementations • 5 Mar 2022 • Yawei Li, Xin Wu, Ping Yang, Guoqian Jiang, Yuan Luo
The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer.
no code implementations • 2 Apr 2020 • Yanwei Zhao, Ping Yang, Qiu Guan, Jianwei Zheng, Wanliang Wang
By exploiting the advantages of both the image domain and the transform domain in a general framework, we propose a sparsity transform learning and weighted singular values minimization method (STLWSM) for IDN problems.
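The weighted singular value minimization component can be sketched as a weighted soft-thresholding of singular values; this is a generic illustration of the operator (weights and iteration scheme are assumptions, not the paper's exact method):

```python
import numpy as np

def weighted_svt(X, weights):
    """Weighted singular value thresholding: shrink each singular
    value by its own weight, so that larger (typically more
    informative) singular values can be penalized less."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)  # per-value soft threshold
    return U @ np.diag(s_shrunk) @ Vt
```

Assigning smaller weights to leading singular values preserves dominant image structure while suppressing noise-dominated components.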