Using Pseudolabels for training Sentiment Classifiers makes the model generalize better across datasets

5 Oct 2021 · Natesh Reddy, Muktabh Mayank Srivastava

This work addresses the following problem: for a public sentiment classification API, how can we build a classifier that works well on different types of data when the ability to annotate data across domains is limited? We show that, given a large amount of unannotated data from different domains and pseudolabels for that data generated by a classifier trained on a small annotated dataset from a single domain, we can train a sentiment classifier that generalizes better across datasets.
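
The pseudolabeling recipe described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tiny Naive Bayes model, the helper names `train_nb` and `predict_nb`, and all example sentences are assumptions made up for the sketch. The abstract does not say which model family is used or whether the small annotated set is mixed back into the final training data; here the second classifier is trained on the pseudolabeled pool only.

```python
import math
from collections import Counter


def train_nb(texts, labels):
    """Fit a tiny multinomial Naive Bayes model on whitespace tokens (illustrative only)."""
    word_counts = {}
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "class_counts": Counter(labels), "vocab": vocab}


def predict_nb(model, text):
    """Return the most likely label, using add-one smoothing and ignoring unseen words."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in sorted(model["class_counts"]):
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(model["class_counts"][label] / total_docs)
        for token in text.lower().split():
            if token in model["vocab"]:
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Step 1: a small annotated dataset from ONE domain (hypothetical movie reviews).
seed_texts = [
    "great movie loved it",
    "terrible movie hated it",
    "loved the acting great film",
    "hated the plot terrible film",
]
seed_labels = ["pos", "neg", "pos", "neg"]
teacher = train_nb(seed_texts, seed_labels)

# Step 2: unannotated text from OTHER domains (hypothetical product and travel reviews).
unlabeled_texts = [
    "great phone loved the battery",
    "terrible hotel hated the room",
    "loved this laptop great screen",
    "hated the service terrible food",
]

# Step 3: the seed-trained classifier assigns pseudolabels to the unannotated pool.
pseudo_labels = [predict_nb(teacher, text) for text in unlabeled_texts]

# Step 4: train a second classifier on the pseudolabeled multi-domain data.
student = train_nb(unlabeled_texts, pseudo_labels)

print(pseudo_labels)
print(predict_nb(student, "the battery is great"))
```

In this toy setup the second classifier picks up domain-specific words such as "battery" and "room" that never appear in the annotated movie reviews, which is the intuition behind the cross-domain gain the abstract reports. With real data, the unannotated pool would be far larger and the pseudolabels noisier than in this example.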
