SALT: Sharing Attention between Linear layer and Transformer for tabular dataset

29 Sep 2021 · Juseong Kim, Jinsun Park, Giltae Song

Handling tabular data with deep learning models remains a challenging problem despite their remarkable success in vision and language applications. As a result, many practitioners still rely on classical models such as gradient boosting decision trees (GBDTs), which often outperform deep networks on tabular data. In this paper, we propose a novel hybrid deep network architecture for tabular data, dubbed SALT (Sharing Attention between Linear layer and Transformer). SALT consists of two blocks, a Transformer block and a linear-layer block, that take advantage of shared attention matrices. The shared attention matrices enable the Transformer and linear layers to cooperate closely, which leads to improved performance and robustness. Our algorithm outperforms tree-based ensemble models and previous deep learning methods on multiple benchmark datasets. We further demonstrate the robustness of SALT in semi-supervised learning and pre-training scenarios with small datasets.
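The abstract does not include code, so the sketch below is only an illustration of the core idea it describes: computing one attention matrix over feature tokens and reusing it in both a Transformer-style branch and a linear-layer branch. The class and layer names (`SharedAttentionSALTBlock`, `transformer_ff`, `linear_ff`) and the exact block layout are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedAttentionSALTBlock(nn.Module):
    """Hypothetical block sharing one attention matrix between a Transformer
    branch and a linear-layer branch (layout assumed, not from the paper)."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Transformer branch: feed-forward applied to the attention output.
        self.transformer_ff = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )
        # Linear branch: a plain linear layer that reuses the same attention.
        self.linear_ff = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features, dim); each tabular feature is one token.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5), dim=-1)

        # The single attention matrix is shared by both branches.
        transformer_out = self.transformer_ff(attn @ v)
        linear_out = self.linear_ff(attn @ x)
        return x + transformer_out + linear_out


# Usage: a batch of 32 rows with 10 feature tokens embedded into 16 dimensions.
block = SharedAttentionSALTBlock(dim=16)
out = block(torch.randn(32, 10, 16))
print(out.shape)  # torch.Size([32, 10, 16])
```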
