Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

31 Jul 2023 · Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi

On tabular data, a significant body of literature has shown that current deep learning (DL) models perform at best similarly to Gradient Boosted Decision Trees (GBDTs), while significantly underperforming them on outlier data. We identify a natural tabular data setting where DL models can outperform GBDTs: tabular Learning-to-Rank (LTR) under label scarcity. Tabular LTR applications, including search and recommendation, often have an abundance of unlabeled data and scarce labeled data. We show that DL rankers can utilize unsupervised pretraining to exploit this unlabeled data. In extensive experiments over both public and proprietary datasets, we show that pretrained DL rankers consistently outperform GBDT rankers on ranking metrics -- sometimes by as much as $38\%$ -- both overall and on outliers.

Tasks

Learning-To-Rank

Datasets

MSLR-WEB30K

Methods

1x1 Convolution • Average Pooling • BASE • Batch Normalization • Bottleneck Residual Block • ColorJitter • Convolution • Dense Connections • Feedforward Network • Global Average Pooling • Kaiming Initialization • Max Pooling • NT-Xent • Random Gaussian Blur • Random Resized Crop • ReLU • Residual Block • Residual Connection • ResNet • SimCLR
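
The method tags above include SimCLR and NT-Xent, the contrastive objective commonly used for unsupervised pretraining on unlabeled data. As a rough illustration only, the following is a minimal NumPy sketch of the NT-Xent loss over two augmented views of a batch. It is not the authors' implementation: the function name `nt_xent_loss`, the default temperature, and the idea of feeding it embeddings of perturbed tabular feature vectors are all assumptions made for this example.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross entropy) loss.

    z1, z2: arrays of shape (n, d). Row i of z1 and row i of z2 are
    embeddings of two augmented views of the same unlabeled item.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so dot = cosine
    sim = z @ z.T / temperature                        # (2n, 2n) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own negative

    # The positive for row i is the other view of the same item: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    log_prob = sim[np.arange(2 * n), pos] - log_denom
    return -log_prob.mean()
```

In a pretraining loop, `z1` and `z2` would come from passing two randomly perturbed copies of each unlabeled feature vector through the encoder; the pretrained encoder is then fine-tuned on the scarce labeled data with a ranking loss.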
