On the Discovery of Feature Importance Distribution: An Overlooked Area

1 Jan 2021  ·  Yuxiao Huang ·

Detecting a feature's predictive power is a key problem in machine learning. Previous methods focus on providing a single value, usually called feature importance, as a point estimate of that power. However, feature importance is difficult to interpret as a measure of predictive power. Moreover, in practice a feature's predictive power may vary dramatically across feature values, and a point estimate cannot capture such variation. To address these two problems, we first propose a new definition of feature importance that directly measures a feature's predictive power. We then propose a feature importance model that captures a high-resolution distribution of feature importance across feature values. Finally, we propose a binarized logistic regression model and a learning algorithm that trains the feature importance models jointly. We prove theoretically that our approach has the same time complexity as logistic regression. Empirical results on three real-world biomedical datasets show that our approach detects meaningful feature importance distributions, which could have profound sociological implications. Code, data, and full results are publicly available in the paper's GitHub repository, and all results can be reproduced with a single command.
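The abstract's central idea, that a feature's importance can be resolved per feature value rather than as one number, can be illustrated with a minimal sketch. This is not the paper's exact model: it simply one-hot encodes each feature into value bins (a crude "binarization") and fits a plain logistic regression, so that each bin's learned weight acts as an importance estimate for that range of feature values. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def binarize(X, n_bins=4):
    """One-hot encode each feature into equal-width value bins
    (an illustrative stand-in for the paper's binarization)."""
    cols = []
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
        cols.append(np.eye(n_bins)[idx])
    return np.hstack(cols)

def fit_logreg(Z, y, lr=0.1, n_iter=2000):
    """Plain gradient-descent logistic regression on the binarized design."""
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        w -= lr * Z.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# The label depends non-monotonically on feature 0, so its predictive
# power varies across its value range; feature 1 is pure noise.
y = (np.abs(X[:, 0]) > 1).astype(float)

Z = binarize(X, n_bins=4)
w = fit_logreg(Z, y)
# w[:4] is a per-bin "importance distribution" for feature 0 (large
# weights at the extreme bins, small in the middle); w[4:] covers
# feature 1, whose weights stay near zero.
```

A single feature-importance number would average away exactly the structure this toy example recovers: feature 0 is highly predictive at extreme values and uninformative near zero.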


