Search Results for author: Taiqiang Wu

Found 11 papers, 6 papers with code

Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast

1 code implementation • 23 May 2024 • Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng

In MoE, each token in the input sequence activates a different subset of experts determined by a routing mechanism.
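As a rough illustration of the routing idea in the sentence above, the following sketch assumes softmax gating with top-k selection, a common scheme that is not necessarily the one used in this paper. The function name `route_top_k` and the logits are hypothetical, and the paper's self-contrast method is not shown.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: weights are the softmax probabilities renormalised over
    the selected experts so that they sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Hypothetical router logits for two tokens over four experts.
token_logits = {
    "token_a": [2.0, 0.5, -1.0, 0.1],
    "token_b": [-0.3, 0.2, 1.5, 1.4],
}
routes = {tok: route_top_k(logits, k=2) for tok, logits in token_logits.items()}

for tok, chosen in routes.items():
    print(tok, chosen)
```

With these made-up logits the two tokens select different expert pairs, and the experts outside each pair are the "unchosen" ones that the title refers to.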

Computational Efficiency GSM8K +1


Adapting LLaMA Decoder to Vision Transformer

1 code implementation • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong Liu, Taiqiang Wu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo

We first "LLaMAfy" a standard ViT step by step to align it with LLaMA's architecture, and find that directly applying a causal mask to the self-attention causes an attention collapse issue, which makes network training fail.

Computational Efficiency Decoder +4


Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

no code implementations • 3 Apr 2024 • Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong

Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs).
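For readers unfamiliar with the quantity, here is a minimal sketch of the KL divergence between a teacher and a student output distribution. The logits are made up for illustration and this is not the paper's training objective. Because KL divergence is asymmetric, the sketch computes it in both directions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature scaling is common in distillation; 1.0 leaves logits unchanged.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a three-word vocabulary.
teacher_logits = [3.0, 1.0, 0.2]
student_logits = [2.0, 1.5, 0.1]

p = softmax(teacher_logits)  # teacher distribution
q = softmax(student_logits)  # student distribution

forward_kl = kl_divergence(p, q)  # KL(teacher || student)
reverse_kl = kl_divergence(q, p)  # KL(student || teacher)

print(f"forward KL: {forward_kl:.4f}")
print(f"reverse KL: {reverse_kl:.4f}")
```

Both values are non-negative and reach zero only when the two distributions coincide.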

Knowledge Distillation


Recouple Event Field via Probabilistic Bias for Event Extraction

no code implementations • 19 May 2023 • Xingyu Bai, Taiqiang Wu, Han Guo, Zhe Zhao, Xuefeng Yang, Jiayi Li, Weijie Liu, Qi Ju, Weigang Guo, Yujiu Yang

Event Extraction (EE), aiming to identify and classify event triggers and arguments from event mentions, has benefited from pre-trained language models (PLMs).

Event Extraction


Weight-Inherited Distillation for Task-Agnostic BERT Compression

1 code implementation • 16 May 2023 • Taiqiang Wu, Cheng Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang

Knowledge Distillation (KD) is a predominant approach for BERT compression.

Knowledge Distillation


RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias


Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

no code implementations • 24 Mar 2023 • Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic.

Knowledge Distillation


SynGen: A Syntactic Plug-and-play Module for Generative Aspect-based Sentiment Analysis

no code implementations • 25 Feb 2023 • Chengze Yu, Taiqiang Wu, Jiayi Li, Xingyu Bai, Yujiu Yang

To the best of our knowledge, we are the first to introduce syntactic information into generative ABSA frameworks.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1


RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias


TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

3 code implementations • 13 Dec 2022 • Zhe Zhao, Yudong Li, Cheng Hou, Jing Zhao, Rong Tian, Weijie Liu, Yiren Chen, Ningyuan Sun, Haoyan Liu, Weiquan Mao, Han Guo, Weigang Guo, Taiqiang Wu, Tao Zhu, Wenhang Shi, Chen Chen, Shan Huang, Sihong Chen, Liqun Liu, Feifei Li, Xiaoshuai Chen, Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan

The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework.

Decoder


Modeling Fine-grained Information via Knowledge-aware Hierarchical Graph for Zero-shot Entity Retrieval

1 code implementation • 20 Nov 2022 • Taiqiang Wu, Xingyu Bai, Weigang Guo, Weijie Liu, Siheng Li, Yujiu Yang

We extract the knowledge units from the corresponding context and then construct a mention/entity centralized graph.

Entity Retrieval Graph Attention +4

