Exchangeable Variational Autoencoders with Applications to Genomic Data

AABI Symposium 2019 · Jeffrey Chan, Jeffrey Spence, and Yun Song

Exchangeable-structured datapoints are ubiquitous in statistical problems ranging from point clouds to graphs to sets. Particularly in biological settings, where multiple experiments derived from a noisy scientific process attempt to measure a latent variable of interest, experimental datapoints are often exchangeable, demanding the development of methods that can exploit this structure. Modern machine learning approaches to scalable Bayesian inference typically use autoencoding variational Bayes, marrying ideas from deep learning and probabilistic modeling to achieve practical inference for expressive models. Current approaches based on the variational autoencoder (VAE) do not naturally handle exchangeable (but non-iid) datapoints. Moreover, exchangeable-structured datapoints often vary in dimension, precluding a straightforward application of the vanilla VAE framework. In this work, we develop the Exchangeable Variational Autoencoder, which provides inferential and computational benefits while enabling data of varying set size to be handled robustly in the VAE framework. We then demonstrate its efficacy in two settings: (1) on the well-studied Latent Dirichlet Allocation model and (2) on the bootstrapped, isoform-level uncertainty estimates of single-cell RNA-seq data.
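
To make concrete what an encoder that respects exchangeability and tolerates varying set sizes might look like, the following is a minimal sketch. It assumes a mean-pooling construction over per-element features, which is one common way to obtain permutation invariance and is not necessarily the architecture used in this work. The names `make_encoder` and `encode_set`, the layer sizes, and the random initialization are all hypothetical and chosen only for illustration.

```python
import math
import random


def make_encoder(in_dim, hidden_dim, latent_dim, seed=0):
    """Build randomly initialized weights (hypothetical, untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "phi": mat(hidden_dim, in_dim),          # per-element feature map
        "mu": mat(latent_dim, hidden_dim),       # head for the posterior mean
        "log_var": mat(latent_dim, hidden_dim),  # head for the posterior log-variance
    }


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def encode_set(params, points):
    """Map a set of points to Gaussian posterior parameters.

    Assumption: permutation invariance is obtained by averaging
    per-element features, so the order of `points` does not matter
    and any non-empty set size is accepted.
    """
    feats = [[math.tanh(h) for h in matvec(params["phi"], x)] for x in points]
    n = len(feats)
    pooled = [sum(f[j] for f in feats) / n for j in range(len(feats[0]))]
    return matvec(params["mu"], pooled), matvec(params["log_var"], pooled)


params = make_encoder(in_dim=3, hidden_dim=8, latent_dim=2)
points = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5], [0.0, 0.4, -0.2], [2.0, 0.1, 0.1]]
mu, log_var = encode_set(params, points)
mu_reversed, _ = encode_set(params, list(reversed(points)))
mu_small, _ = encode_set(params, points[:2])  # a smaller set also works
```

Because the pooled representation is an average, reordering the inputs changes the result only by floating-point rounding, and sets of different sizes map to latent parameters of the same dimension.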
