Search Results for author: Sarthak Mittal

Found 13 papers, 10 papers with code

Does learning the right latent variables necessarily improve in-context learning?

1 code implementation • 29 May 2024 • Sarthak Mittal, Eric Elmoznino, Leo Gagnon, Sangnie Bhardwaj, Dhanya Sridhar, Guillaume Lajoie

Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

1 code implementation • 9 Feb 2024 • Tara Akhound-Sadegh, Jarrid Rector-Brooks, Avishek Joey Bose, Sarthak Mittal, Pablo Lemos, Cheng-Hao Liu, Marcin Sendera, Siamak Ravanbakhsh, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Alexander Tong

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science.

Denoising, Efficient Exploration

Improved off-policy training of diffusion samplers

1 code implementation • 7 Feb 2024 • Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function.

Benchmarking

Leveraging Synthetic Targets for Machine Translation

no code implementations • 7 May 2023 • Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev

In this work, we provide a recipe for training machine translation models in a limited resource setting by leveraging synthetic target data generated using a large pre-trained model.

Machine Translation, Translation
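As a rough illustration of this recipe, the sketch below assumes a distillation-style setup in which a large pre-trained model produces synthetic targets for monolingual source text and a smaller model is then trained on the resulting pairs; the function names and the toy stand-in teacher are hypothetical placeholders, not code or an API from the paper.

```python
from typing import Callable, Iterable, List, Tuple

def build_synthetic_corpus(
    translate: Callable[[str], str], sources: Iterable[str]
) -> List[Tuple[str, str]]:
    """Pair each monolingual source sentence with a synthetic target
    produced by a large pre-trained translation model."""
    return [(src, translate(src)) for src in sources]

# Toy usage with a stand-in "teacher"; a real setup would call an actual
# pre-trained MT model's translate/generate method here.
toy_teacher = lambda src: src.upper()
corpus = build_synthetic_corpus(toy_teacher, ["hello world", "good morning"])
# `corpus` now holds (source, synthetic target) pairs on which a smaller
# student model would be trained with a standard cross-entropy objective.
```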

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

1 code implementation • 27 Dec 2022 • Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels.

Data Augmentation
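To make the interpolation described above concrete, here is a minimal NumPy sketch of vanilla Mixup (not the MixupE variant proposed in this paper); the function name and default Beta parameter are illustrative choices, not the authors' code.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Vanilla Mixup: convex-combine random pairs of inputs and one-hot labels.

    x: array of shape (batch, ...); y: one-hot labels of shape (batch, classes).
    alpha: parameter of the Beta(alpha, alpha) distribution that sets
    how strongly pairs are interpolated.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                # interpolation coefficient in [0, 1]
    perm = rng.permutation(len(x))              # random pairing within the batch
    x_mixed = lam * x + (1.0 - lam) * x[perm]   # interpolate inputs
    y_mixed = lam * y + (1.0 - lam) * y[perm]   # interpolate labels
    return x_mixed, y_mixed
```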

From Points to Functions: Infinite-dimensional Representations in Diffusion Models

1 code implementation • 25 Oct 2022 • Sarthak Mittal, Guillaume Lajoie, Stefan Bauer, Arash Mehrjou

Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task.

Decoder

On Neural Architecture Inductive Biases for Relational Tasks

1 code implementation • 9 Jun 2022 • Giancarlo Kerg, Sarthak Mittal, David Rolnick, Yoshua Bengio, Blake Richards, Guillaume Lajoie

Recent work has explored how forcing relational representations to remain distinct from sensory representations, as seems to be the case in the brain, can help artificial systems.

Inductive Bias, Out-of-Distribution Generalization

Is a Modular Architecture Enough?

1 code implementation • 6 Jun 2022 • Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie

Inspired by human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures.

Out-of-Distribution Generalization

Compositional Attention: Disentangling Search and Retrieval

3 code implementations • ICLR 2022 • Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie

Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed.

Retrieval

Diffusion-Based Representation Learning

no code implementations • 29 May 2021 • Korbinian Abstreiter, Sarthak Mittal, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou

In contrast, the diffusion-based representation learning introduced here relies on a new formulation of the denoising score matching objective and thus encodes the information needed for denoising.

Denoising, Representation Learning, +1
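For context, the standard denoising score matching objective that such a formulation builds on can be written as below (the textbook form due to Vincent, 2011, shown for reference only; the paper's modified objective differs):

```latex
% Standard denoising score matching with a Gaussian perturbation kernel,
% q_sigma(x_tilde | x) = N(x_tilde; x, sigma^2 I). Reference form, not the
% paper's new formulation.
\mathcal{L}_{\mathrm{DSM}}(\theta)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\,
    \mathbb{E}_{\tilde{x} \sim q_\sigma(\tilde{x} \mid x)}
    \left[ \left\| s_\theta(\tilde{x}, \sigma)
      - \nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) \right\|_2^2 \right],
\qquad
\nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) = -\frac{\tilde{x} - x}{\sigma^2}.
```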

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

1 code implementation • ICML 2020 • Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow.

Language Modelling, Open-Ended Question Answering, +2

A Modern Take on the Bias-Variance Tradeoff in Neural Networks

no code implementations • 19 Oct 2018 • Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve.
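For reference, the U-shaped curve referred to above comes from the classical decomposition of expected squared error into bias, variance, and irreducible noise (a textbook identity, not a result of this paper):

```latex
% Classical bias-variance decomposition for squared error, with
% y = f(x) + eps, E[eps] = 0, Var(eps) = sigma^2, and \hat{f} the learned predictor.
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```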
