no code implementations • 15 Mar 2024 • Frank Nielsen
Uniparametric and biparametric statistical models always have Fisher Hessian metrics, and in general a simple test makes it possible to check whether the Fisher information matrix yields a Hessian metric.
no code implementations • 6 Feb 2024 • Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth
Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc.
no code implementations • 20 Dec 2023 • Frank Nielsen
Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning among others.
no code implementations • 22 Nov 2023 • Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities.
no code implementations • 7 Sep 2023 • Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à la Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à la Sinkhorn-Cuturi", which gets near-linear approximation algorithms but leads to maximally un-sparse plans.
no code implementations • 20 Jul 2023 • Frank Nielsen
We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance with its associated straight-line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions.
no code implementations • 10 Mar 2023 • Andrea Pasquale, Daniel Krefl, Stefano Carrazza, Frank Nielsen
The estimation of probability density functions is a nontrivial task that in recent years has been tackled with machine learning techniques.
1 code implementation • 20 Feb 2023 • Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations.
no code implementations • 16 Feb 2023 • Frank Nielsen
We consider experimentally the linear interpolation curves in the ordinary, natural and expectation parameterizations of the normal distributions, and compare these curves with a curve derived from Calvo and Oller's isometric embedding of the Fisher-Rao $d$-variate normal manifold into the cone of $(d+1)\times (d+1)$ symmetric positive-definite matrices [Journal of Multivariate Analysis 35.2 (1990): 223-242].
no code implementations • 15 Sep 2022 • Rob Brekelmans, Frank Nielsen
Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest.
no code implementations • 17 Jun 2022 • Pascal Mattia Esser, Frank Nielsen
A common way to learn and analyze statistical models is to consider operations in the model parameter space.
no code implementations • 22 Mar 2022 • Frank Nielsen, Ke Sun
A key technique of machine learning and computer vision is to embed discrete weighted graphs into continuous spaces for further downstream processing.
no code implementations • 7 Dec 2021 • Pascal Mattia Esser, Frank Nielsen
We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and therefore improve the convergence speed in learning.
no code implementations • 22 Jul 2021 • Gautier Marti, Victor Goubet, Frank Nielsen
We propose a methodology to approximate conditional distributions in the elliptope of correlation matrices based on conditional generative adversarial networks.
no code implementations • 22 Jul 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces.
no code implementations • 13 Jul 2021 • Frank Nielsen
Since the Jeffreys divergence between Gaussian mixture models is not available in closed form, various techniques with pros and cons have been proposed in the literature to estimate, approximate, or lower and upper bound this divergence.
1 code implementation • 1 Jul 2021 • Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood
Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average.
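As an illustration of the geometric averaging mentioned above, here is a minimal sketch for distributions on a finite sample space, where the intermediate density at inverse temperature `beta` is taken proportional to `p0**(1-beta) * p1**beta`. The function name `geometric_path` and the list-based representation are assumptions made for this example, not taken from the paper.

```python
def geometric_path(p0, p1, beta):
    """Normalized geometric average of two discrete distributions.

    p0, p1: lists of strictly positive probabilities of equal length.
    beta:   mixing parameter in [0, 1]; beta=0 gives p0, beta=1 gives p1.
    """
    # Pointwise weighted geometric mean (unnormalized).
    unnormalized = [a ** (1.0 - beta) * b ** beta for a, b in zip(p0, p1)]
    # Renormalize so the intermediate distribution sums to one.
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


if __name__ == "__main__":
    base = [0.5, 0.5]
    target = [0.9, 0.1]
    for beta in (0.0, 0.5, 1.0):
        print(beta, geometric_path(base, target, beta))
```

The normalizer `z` is the quantity that annealing-based estimators track along the path; here it is computed exactly only because the sample space is tiny.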
no code implementations • 19 Feb 2021 • Frank Nielsen
We generalize the Jensen-Shannon divergence by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson's information radius.
Quantization, Information Theory
no code implementations • 15 Feb 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt
Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations.
no code implementations • 29 Jan 2021 • Frank Nielsen, Kazuki Okamura
We prove that the $f$-divergences between univariate Cauchy distributions are all symmetric, and can be expressed as strictly increasing scalar functions of the symmetric chi-squared divergence.
Information Theory, Statistics Theory
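The symmetry claim can be checked numerically for one $f$-divergence, the Kullback-Leibler divergence. The sketch below assumes the closed forms commonly quoted for Cauchy location-scale densities, namely $\chi^2 = \frac{(l_1-l_2)^2+(s_1-s_2)^2}{2 s_1 s_2}$ and $\mathrm{KL} = \log(1+\chi^2/2)$, and compares the latter against a direct quadrature. All function names are hypothetical.

```python
import math


def cauchy_pdf(x, loc, scale):
    return scale / (math.pi * (scale ** 2 + (x - loc) ** 2))


def chi2_cauchy(l1, s1, l2, s2):
    # Assumed closed form of the chi-squared divergence between two Cauchy densities.
    return ((l1 - l2) ** 2 + (s1 - s2) ** 2) / (2.0 * s1 * s2)


def kl_cauchy_closed_form(l1, s1, l2, s2):
    # Assumed closed form: KL is an increasing function of the chi-squared divergence.
    return math.log1p(chi2_cauchy(l1, s1, l2, s2) / 2.0)


def kl_cauchy_numeric(l1, s1, l2, s2, n=20_000):
    """Midpoint-rule estimate of KL(p1 : p2).

    With x = l1 + s1 * tan(theta), the first density satisfies
    p1(x) dx = dtheta / pi, so KL is the average of log(p1/p2) over theta.
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2 + (i + 0.5) * h
        x = l1 + s1 * math.tan(theta)
        total += math.log(cauchy_pdf(x, l1, s1) / cauchy_pdf(x, l2, s2))
    return total * h / math.pi


if __name__ == "__main__":
    args = (0.0, 1.0, 1.0, 2.0)
    swapped = (1.0, 2.0, 0.0, 1.0)
    print(kl_cauchy_closed_form(*args), kl_cauchy_numeric(*args))
    print(kl_cauchy_closed_form(*swapped), kl_cauchy_numeric(*swapped))
```

Both argument orders should print the same value, which is what distinguishes the Cauchy case from, say, normal distributions, where the Kullback-Leibler divergence is generally asymmetric.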
no code implementations • 11 Jan 2021 • Frank Nielsen
We study information projections with respect to statistical $f$-divergences between any two location-scale families.
Information Theory
no code implementations • NeurIPS Workshop DL-IG 2020 • Rob Brekelmans, Frank Nielsen, Alireza Makhzani, Aram Galstyan, Greg Ver Steeg
The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling.
2 code implementations • NeurIPS Workshop DL-IG 2020 • Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen
Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target.
no code implementations • 12 Jun 2020 • Frank Nielsen
We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi square divergence, and the Kullback-Leibler divergences all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams.
no code implementations • 5 Mar 2020 • Frank Nielsen, Richard Nock
It is well known that the Bhattacharyya, Hellinger, Kullback-Leibler, $\alpha$-divergences, and Jeffreys' divergences between densities belonging to the same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family.
1 code implementation • 19 Feb 2020 • Gaëtan Hadjeres, Frank Nielsen
Distances between probability distributions that take into account the geometry of their sample space, like the Wasserstein or the Maximum Mean Discrepancy (MMD) distances, have received a lot of attention in machine learning as they can, for instance, be used to compare probability distributions with disjoint supports.
no code implementations • 27 Nov 2019 • Ke Sun, Frank Nielsen
This letter introduces an abstract learning problem called the "set embedding": The objective is to map sets into probability distributions so as to lose less information.
no code implementations • 9 Oct 2019 • Frank Nielsen
The dualistic structure of statistical manifolds in information geometry yields eight types of geodesic triangles passing through three given points, the triangle vertices.
no code implementations • 19 Sep 2019 • Frank Nielsen, Gaëtan Hadjeres
We then define the strictly quasiconvex Bregman divergences as the limit case of scaled and skewed quasiconvex Jensen divergences, and report a simple closed-form formula which shows that these divergences are only pseudo-divergences at countably many inflection points of the generators.
no code implementations • 27 May 2019 • Ke Sun, Frank Nielsen
Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces?
no code implementations • 8 Apr 2019 • Frank Nielsen
The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution.
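For discrete distributions the construction can be written in a few lines. This is a minimal sketch using the usual 1/2 weighting of the two Kullback-Leibler terms and natural logarithms, under which the divergence is bounded by $\log 2$; the function names are illustrative.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Average KL divergence of p and q to their mid-point mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Disjoint supports: KL(p : q) is infinite, but the JS divergence stays finite.
    print(jensen_shannon([1.0, 0.0], [0.0, 1.0]), math.log(2))
```

The mixture `m` is positive wherever `p` or `q` is, which is why the divergence stays finite even when the supports differ.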
no code implementations • 14 Mar 2019 • Frank Nielsen, Gaëtan Hadjeres
We consider both finite and infinite power chi expansions of $f$-divergences derived from Taylor expansions of smooth generators, and elaborate on cases where these expansions yield closed-form formulas, bounded approximations, or analytic divergence series expressions of $f$-divergences.
1 code implementation • ICLR 2019 • Gaëtan Hadjeres, Frank Nielsen
This paper presents the Variation Network (VarNet), a generative model providing means to manipulate the high-level attributes of a given input.
no code implementations • 9 Jan 2019 • Frank Nielsen
The traditional Minkowski distances are induced by the corresponding Minkowski norms in real-valued vector spaces.
no code implementations • 19 Dec 2018 • Frank Nielsen, Ke Sun
We experimentally evaluate our new family of distances by quantifying the upper bounds of several jointly convex distances between statistical mixtures, and by proposing a novel efficient method to learn Gaussian mixture models (GMMs) by simplifying kernel density estimators with respect to our distance.
no code implementations • 25 Oct 2018 • Erika Gomes-Gonçalves, Henryk Gzyl, Frank Nielsen
Separable Bregman divergences induce Riemannian metric spaces that are isometric to the Euclidean space after monotone embeddings.
no code implementations • 22 Oct 2018 • Frank Nielsen, Richard Nock
Distances are fundamental primitives whose choice significantly impacts the performance of algorithms in machine learning and signal processing.
2 code implementations • ICLR 2019 • Giorgio Patrini, Rianne van den Berg, Patrick Forré, Marcello Carioni, Samarth Bhargav, Max Welling, Tim Genewein, Frank Nielsen
We show that minimizing the p-Wasserstein distance between the generator and the true data distribution is equivalent to the unconstrained min-min optimization of the p-Wasserstein distance between the encoder aggregated posterior and the prior in latent space, plus a reconstruction error.
no code implementations • 17 Aug 2018 • Frank Nielsen
In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences.
no code implementations • 29 Jun 2018 • Frank Nielsen, Ke Sun
The total variation distance is a core statistical distance between probability measures that satisfies the metric axioms, with value always falling in $[0, 1]$.
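On a finite sample space the total variation distance is half the $\ell_1$ distance between the probability vectors, which makes the $[0, 1]$ range easy to see. A minimal sketch, with a hypothetical function name:

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    print(total_variation([0.5, 0.5], [0.5, 0.5]))  # identical distributions
    print(total_variation([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
```

The two calls hit the extremes of the range: zero for identical distributions and one for distributions with disjoint supports.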
1 code implementation • 1 Jun 2018 • Frank Nielsen, Ke Sun
We propose a new generic type of stochastic neurons, called $q$-neurons, that considers activation functions based on Jackson's $q$-derivatives with stochastic parameters $q$.
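Jackson's $q$-derivative of a function $f$ is the difference quotient $D_q f(x) = \frac{f(qx) - f(x)}{(q-1)x}$ for $q \neq 1$ and $x \neq 0$, which tends to the ordinary derivative as $q \to 1$ when $f$ is differentiable. The sketch below only illustrates this operator; how the paper samples $q$ and builds the neuron activation is not reproduced here, and the function name is an assumption.

```python
def q_derivative(f, x, q):
    """Jackson q-derivative of f at x (requires q != 1 and x != 0)."""
    if q == 1 or x == 0:
        raise ValueError("q-derivative needs q != 1 and x != 0")
    return (f(q * x) - f(x)) / ((q - 1) * x)


if __name__ == "__main__":
    def square(t):
        return t * t

    # For f(x) = x^2 the q-derivative is (q + 1) * x.
    print(q_derivative(square, 2.0, 1.5))
    # Close to the ordinary derivative 2x = 4 when q is near 1.
    print(q_derivative(square, 2.0, 1.0001))
```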
no code implementations • 20 Mar 2018 • Frank Nielsen, Gaëtan Hadjeres
When equipping a statistical manifold with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between distributions amounts to an equivalent Bregman divergence on their corresponding parameters.
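A concrete instance of this equivalence is the Poisson family, an exponential family with natural parameter $\theta = \log\lambda$ and cumulant function $F(\theta) = e^{\theta}$. There the Kullback-Leibler divergence equals the Bregman divergence on swapped natural parameters, $\mathrm{KL}(p_{\lambda_1} : p_{\lambda_2}) = B_F(\theta_2 : \theta_1)$. The sketch below checks this against a truncated series; the helper names are illustrative, and the Poisson family is chosen here only as an example.

```python
import math


def bregman(F, grad_F, a, b):
    """Bregman divergence B_F(a : b) for a scalar parameter."""
    return F(a) - F(b) - (a - b) * grad_F(b)


def kl_poisson_bregman(lam1, lam2):
    # Natural parameters; the cumulant function exp is its own gradient.
    theta1, theta2 = math.log(lam1), math.log(lam2)
    return bregman(math.exp, math.exp, theta2, theta1)


def kl_poisson_series(lam1, lam2, terms=200):
    """Direct evaluation of sum_k p1(k) * log(p1(k) / p2(k)), truncated."""
    total = 0.0
    log_p, log_q = -lam1, -lam2  # log-pmf values at k = 0
    for k in range(terms):
        total += math.exp(log_p) * (log_p - log_q)
        # Recurrence: pmf(k + 1) = pmf(k) * lam / (k + 1).
        log_p += math.log(lam1) - math.log(k + 1)
        log_q += math.log(lam2) - math.log(k + 1)
    return total


if __name__ == "__main__":
    print(kl_poisson_bregman(3.0, 5.0), kl_poisson_series(3.0, 5.0))
```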
no code implementations • 29 Sep 2017 • Frank Nielsen
We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties.
no code implementations • 19 Sep 2017 • Gaëtan Hadjeres, Frank Nielsen
We demonstrate its efficiency on the task of generating melodies satisfying positional constraints in the style of the soprano parts of the J. S.
no code implementations • 3 Sep 2017 • Gaëtan Hadjeres, Frank Nielsen
These musical sequences belong to a given corpus (or style) and it is obvious that a good distance on musical sequences should take this information into account; being able to define a distance ex nihilo which could be applicable to all music styles seems implausible.
Information Retrieval, Sound
no code implementations • 2 Aug 2017 • Frank Nielsen, Richard Nock
The information geometry induced by the Bregman generator set to the Shannon negentropy on this space yields a dually flat space called the mixture family manifold.
no code implementations • ICML 2017 • Ke Sun, Frank Nielsen
Fisher information and natural gradient provided deep insights and powerful tools to artificial neural networks.
no code implementations • 14 Jul 2017 • Gaëtan Hadjeres, Frank Nielsen, François Pachet
VAEs (Variational AutoEncoders) have proved to be powerful in the context of density modeling and have been used in a variety of contexts for creative purposes.
no code implementations • 10 Apr 2017 • Richard Nock, Frank Nielsen
In Valiant's model of evolution, a class of representations is evolvable iff a polynomial-time process of random mutations guided by selection converges with high probability to a representation as $\epsilon$-close as desired from the optimal one, for any required $\epsilon>0$.
no code implementations • 3 Apr 2017 • Frank Nielsen, Ke Sun
In the Hilbert simplex geometry, the distance is the non-separable Hilbert's metric distance which satisfies the property of information monotonicity with distance level set functions described by polytope boundaries.
no code implementations • 1 Mar 2017 • Gautier Marti, Frank Nielsen, Mikołaj Bińkowski, Philippe Donnat
We review the state of the art of clustering financial time series and the study of their correlations alongside other interaction networks.
no code implementations • 16 Feb 2017 • Frank Nielsen, Richard Nock
Comparative convexity is a generalization of convexity relying on abstract notions of means.
no code implementations • 14 Jan 2017 • Frank Nielsen, Ke Sun, Stéphane Marchand-Maillet
We describe a framework to build distances by measuring the tightness of inequalities, and introduce the notion of proper statistical divergences and improper pseudo-divergences.
no code implementations • 9 Dec 2016 • Frank Nielsen, Richard Nock
We present a series of closed-form maximum entropy upper bounds for the differential entropy of a continuous univariate random variable and study the properties of that series.
5 code implementations • ICML 2017 • Gaëtan Hadjeres, François Pachet, Frank Nielsen
This paper introduces DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces.
1 code implementation • 30 Oct 2016 • Gautier Marti, Sebastien Andler, Frank Nielsen, Philippe Donnat
We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset.
no code implementations • 22 Sep 2016 • Frank Nielsen, Boris Muzellec, Richard Nock
We consider the supervised classification problem of machine learning in Cayley-Klein projective geometries: We show how to learn a curved Mahalanobis metric distance corresponding to either the hyperbolic geometry or the elliptic geometry using the Large Margin Nearest Neighbor (LMNN) framework.
1 code implementation • 15 Sep 2016 • Boris Muzellec, Richard Nock, Giorgio Patrini, Frank Nielsen
We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of large interest in the social sciences.
no code implementations • 20 Jun 2016 • Ke Sun, Frank Nielsen
Fisher information and natural gradient provided deep insights and powerful tools to artificial neural networks.
no code implementations • 19 Jun 2016 • Frank Nielsen, Ke Sun
Information-theoretic measures such as the entropy, the cross-entropy and the Kullback-Leibler divergence between two mixture models are core primitives in many signal processing tasks.
no code implementations • 28 Apr 2016 • Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat
This clustering methodology leverages copulas which are distributions encoding the dependence structure between several random variables.
no code implementations • 6 Apr 2016 • Frank Nielsen, Richard Nock
Matrix data sets are now common, for example in biomedical imaging, where the Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) modality produces data sets of 3D symmetric positive definite matrices anchored at voxel positions, capturing the anisotropic diffusion properties of water molecules in biological tissues.
no code implementations • 14 Mar 2016 • Junlin Yao, Frank Nielsen
State-of-the-art methods via subspace clustering seek to solve the problem in two steps: First, an affinity matrix is built from data, with appearance features or motion patterns.
no code implementations • 13 Mar 2016 • Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat
Researchers have used from 30 days to several years of daily returns as source data for clustering financial time series based on their correlations.
no code implementations • 8 Feb 2016 • Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni
We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss.
no code implementations • 3 Feb 2016 • Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen
For either the specific frameworks considered here, or for the differential privacy setting, there are few to no prior results on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties.
no code implementations • 3 Feb 2016 • Frank Nielsen
But more precisely, what do we mean by information in images?
no code implementations • 27 Sep 2015 • Gautier Marti, Frank Nielsen, Philippe Donnat
This paper presents a new methodology for clustering multivariate time series leveraging optimal transport between copulas.
no code implementations • 23 Jun 2014 • Frank Nielsen, Richard Nock
This novel heuristic can improve Hartigan's $k$-means when it has converged to a local minimum.
no code implementations • 11 Mar 2014 • Frank Nielsen, Richard Nock
We present a generic dynamic programming method to compute the optimal clustering of $n$ scalar elements into $k$ pairwise disjoint intervals.
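The idea can be sketched for one concrete cost, the within-cluster sum of squared deviations (the $k$-means objective). After sorting, every optimal cluster is a contiguous run, so a table indexed by (number of clusters, prefix length) suffices. This is a plain $O(n^2 k)$ illustration under that assumed cost, not the paper's generic method, and all names are hypothetical.

```python
def optimal_interval_clustering(values, k):
    """Optimal 1D clustering into k contiguous groups under squared-error cost."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums give the squared-error cost of any run xs[i:j] in O(1).
    pre, pre_sq = [0.0], [0.0]
    for x in xs:
        pre.append(pre[-1] + x)
        pre_sq.append(pre_sq[-1] + x * x)

    def cost(i, j):
        s = pre[j] - pre[i]
        return (pre_sq[j] - pre_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting xs[:j] into m groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # last group is xs[i:j]
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    cut[m][j] = i

    # Backtrack through the stored cut positions to recover the groups.
    clusters, j = [], n
    for m in range(k, 0, -1):
        i = cut[m][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters


if __name__ == "__main__":
    print(optimal_interval_clustering([20, 1, 11, 2, 10], 3))
```

Swapping `cost` for another per-interval cost changes the objective without touching the recurrence, which is the sense in which such a dynamic program is generic.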
no code implementations • 20 Jan 2014 • Frank Nielsen
When no cost is incurred for correct classification and unit cost is charged for misclassification, Bayes' test reduces to the maximum a posteriori decision rule, and Bayes risk simplifies to Bayes' error, the probability of error.
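For a finite observation space, the probability of error of the maximum a posteriori rule is one minus the sum, over observations, of the largest joint probability $P(c)\,P(x \mid c)$. A minimal sketch, with hypothetical names and a made-up two-class example:

```python
def bayes_error(priors, likelihoods):
    """Probability of error of the MAP rule on a finite observation space.

    priors[c]         : prior probability of class c
    likelihoods[c][x] : probability of observation x given class c
    """
    n_classes = len(priors)
    n_outcomes = len(likelihoods[0])
    # The MAP rule picks, for each x, the class with the largest joint probability.
    prob_correct = sum(
        max(priors[c] * likelihoods[c][x] for c in range(n_classes))
        for x in range(n_outcomes)
    )
    return 1.0 - prob_correct


if __name__ == "__main__":
    priors = [0.5, 0.5]
    likelihoods = [[0.9, 0.1], [0.2, 0.8]]
    print(bayes_error(priors, likelihoods))
```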
no code implementations • 29 Mar 2013 • Frank Nielsen
Clustering histograms can be performed using the celebrated $k$-means centroid-based algorithm.
no code implementations • NeurIPS 2008 • Richard Nock, Frank Nielsen
Bartlett et al. (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties up the minimization of the surrogates and classification risks, and left as an important problem the algorithmic questions about the minimization of these surrogates.