Identifying main effects and interactions among exposures using Gaussian processes

5 Nov 2019  ·  Federico Ferrari, David B. Dunson

This article is motivated by the problem of studying the joint effect of different chemical exposures on human health outcomes. This is essentially a nonparametric regression problem, with interest being focused not on a black box for prediction but instead on selection of main effects and interactions. For interpretability, we decompose the expected health outcome into a linear main effect, pairwise interactions, and a non-linear deviation. Our interest is in model selection for these different components, accounting for uncertainty and addressing non-identifiability between the linear and nonparametric components of the semiparametric model. We propose a Bayesian approach to inference, placing variable selection priors on the different components, and developing a Markov chain Monte Carlo (MCMC) algorithm. A key component of our approach is the incorporation of a heredity constraint to only include interactions in the presence of main effects, effectively reducing the dimensionality of the model search. We adapt a projection approach developed in the spatial statistics literature to enforce identifiability in modeling the nonparametric component using a Gaussian process. We also employ a dimension reduction strategy to sample the non-linear random effects that aids the mixing of the MCMC algorithm. The proposed MixSelect framework is evaluated using a simulation study, and is illustrated using data from the National Health and Nutrition Examination Survey (NHANES). Code is available on GitHub.
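The projection idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the MixSelect implementation: the squared-exponential kernel, the simulated design matrix, and the helper names `rbf_kernel` and `orthogonal_projector` are all assumptions made for illustration. The sketch projects a Gaussian process draw onto the orthogonal complement of the column space of the exposures, so the non-linear deviation cannot absorb any linear main effect.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance matrix for the rows of X (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def orthogonal_projector(X):
    """Projector onto the orthogonal complement of the column space of X."""
    Q, _ = np.linalg.qr(X)  # reduced QR: columns of Q span col(X)
    return np.eye(X.shape[0]) - Q @ Q.T

# Simulated exposures (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))

K = rbf_kernel(X)
P = orthogonal_projector(X)
K_proj = P @ K @ P  # covariance of the projected process P f

# Draw f ~ N(0, K) via a jittered Cholesky factor, then project it
L = np.linalg.cholesky(K + 1e-6 * np.eye(n))
f = L @ rng.normal(size=n)
g = P @ f  # non-linear deviation, orthogonal to every column of X

# Largest absolute inner product between g and a column of X
max_abs_inner = np.max(np.abs(X.T @ g))
```

Because `g` has zero inner product with every column of `X` up to floating-point error, a least-squares fit of `g` on `X` returns coefficients that are numerically zero, which is the sense in which the linear and nonparametric components become separately identifiable in this sketch.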
