Minimal Algorithmic Information Loss Methods for Dimension Reduction, Feature Selection and Network Sparsification

16 Feb 2018  ·  Hector Zenil, Narsis A. Kiani, Alyssa Adams, Felipe S. Abrahão, Antonio Rueda-Toicen, Allan A. Zea, Jesper Tegnér ·

We introduce a family of unsupervised, domain-free, and (asymptotically) model-independent algorithms based on the principles of algorithmic probability and information theory, designed to minimize the loss of algorithmic information; the family includes a lossy compression algorithm built on lossless compression. The methods can select and coarse-grain data in an algorithmic-complexity fashion (without relying on popular compression algorithms) by collapsing regions that can be procedurally regenerated from a computable candidate model. We show that the methods preserve the salient properties of objects and can perform dimension reduction, denoising, feature selection, and network sparsification. As a validation case, we demonstrate the methods on image segmentation against popular approaches such as PCA and random selection. We also show that the methods preserve the graph-theoretic indices of a well-known set of synthetic and real-world networks of very different nature, ranging from degree distribution and clustering coefficient to edge betweenness and degree and eigenvector centralities, achieving results equal to or significantly better than other data-reduction methods and the leading network sparsification methods (Spectral, Transitive).
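The core idea of minimizing algorithmic information loss can be illustrated with a toy greedy sketch. This is not the paper's actual algorithm (which avoids popular compression algorithms); here, purely as a hypothetical stand-in, the length of a `zlib`-compressed encoding serves as a crude upper bound on algorithmic complexity, and elements are deleted one at a time so that each deletion perturbs the estimated information content of the whole as little as possible. The function name `mils_select` and all parameters are illustrative assumptions, not from the paper.

```python
import zlib


def complexity(data: bytes) -> int:
    # Crude upper bound on algorithmic complexity: the length of a
    # losslessly compressed encoding of the data (illustrative proxy only).
    return len(zlib.compress(data, 9))


def mils_select(items: list[str], k: int) -> list[str]:
    """Greedily shrink `items` to k elements, at each step deleting the
    element whose removal changes the estimated algorithmic information
    of the concatenated whole the least (a hypothetical sketch)."""
    items = list(items)
    while len(items) > k:
        base = complexity("".join(items).encode())
        # Information loss attributed to deleting each element in turn.
        losses = [
            abs(base - complexity("".join(items[:i] + items[i + 1:]).encode()))
            for i in range(len(items))
        ]
        # Drop the element with minimal estimated information loss,
        # i.e. the one most redundant given the rest of the data.
        items.pop(losses.index(min(losses)))
    return items
```

In this toy setting, highly repetitive elements tend to be dropped first, since the compressor can regenerate them from the surviving context, loosely mirroring the paper's notion of collapsing regions that a candidate model can procedurally reproduce.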


Categories


Data Structures and Algorithms · Information Theory · Physics and Society
