no code implementations • ICML 2020 • Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus
An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.
no code implementations • 15 Mar 2024 • Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Adhiguna Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam
Progress in machine learning (ML) has been fueled by scaling neural network models.
1 code implementation • 17 Jan 2024 • Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato
Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication.
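The local-update-then-average scheme can be sketched minimally as follows (a toy quadratic objective in plain NumPy; all names and hyperparameters here are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def local_sgd(w0, shards, lr=0.1, local_steps=5, rounds=10):
    """Toy Local-SGD: each worker runs several SGD steps on its own
    data shard, then the server averages the resulting parameters,
    so communication happens once per round instead of once per step."""
    w = w0.copy()
    for _ in range(rounds):
        local_params = []
        for data in shards:  # one entry per device
            wi = w.copy()
            for _ in range(local_steps):
                # gradient of the mean squared distance to the shard's points
                grad = 2 * (wi - data).mean(axis=0)
                wi -= lr * grad
            local_params.append(wi)
        w = np.mean(local_params, axis=0)  # the single communication per round
    return w

rng = np.random.default_rng(0)
shards = [rng.normal(loc=i, size=(32, 2)) for i in range(4)]
w = local_sgd(np.zeros(2), shards)
```

On this toy objective the averaged iterate converges to the mean of the shard minimizers, which illustrates the trade-off the paper studies: more local steps mean cheaper communication but locally drifted updates.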
no code implementations • 14 Nov 2023 • Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen
In this work, we propose a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that enables training of language models on islands of devices that are poorly connected.
1 code implementation • 14 Sep 2023 • Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam
In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent.
2 code implementations • 18 May 2023 • Shrestha Mohanty, Negar Arabzadeh, Julia Kiseleva, Artem Zholus, Milagro Teruel, Ahmed Awadallah, Yuxuan Sun, Kavya Srinet, Arthur Szlam
Human intelligence's adaptability is remarkable, allowing us to adjust to new tasks and multi-modal environments swiftly.
no code implementations • 26 Apr 2023 • Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili
We compare models trained on our new dataset to existing pairwise-trained dialogue models, as well as large language models with few-shot prompting.
no code implementations • 13 Jan 2023 • Alexander Gurung, Mojtaba Komeili, Arthur Szlam, Jason Weston, Jack Urbanek
While language models have become more capable of producing compelling language, we find there are still gaps in maintaining consistency, especially when describing events in a dynamically changing world.
2 code implementations • 12 Nov 2022 • Shrestha Mohanty, Negar Arabzadeh, Milagro Teruel, Yuxuan Sun, Artem Zholus, Alexey Skrynnik, Mikhail Burtsev, Kavya Srinet, Aleksandr Panov, Arthur Szlam, Marc-Alexandre Côté, Julia Kiseleva
Human intelligence can remarkably adapt quickly to new tasks and environments.
2 code implementations • 11 Oct 2022 • Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam
We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization.
2 code implementations • 5 Aug 2022 • Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston
We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user-defined tasks.
1 code implementation • 27 May 2022 • Julia Kiseleva, Alexey Skrynnik, Artem Zholus, Shrestha Mohanty, Negar Arabzadeh, Marc-Alexandre Côté, Mohammad Aliannejadi, Milagro Teruel, Ziming Li, Mikhail Burtsev, Maartje ter Hoeve, Zoya Volovikova, Aleksandr Panov, Yuxuan Sun, Kavya Srinet, Arthur Szlam, Ahmed Awadallah
Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions.
no code implementations • 5 May 2022 • Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Marc-Alexandre Côté, Katja Hofmann, Ahmed Awadallah, Linar Abdrazakov, Igor Churin, Putra Manggala, Kata Naszadi, Michiel van der Meer, Taewoon Kim
The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment.
no code implementations • 19 Apr 2022 • Yuxuan Sun, Ethan Carlson, Rebecca Qian, Kavya Srinet, Arthur Szlam
In this work we give a case study of an embodied machine-learning (ML) powered agent that improves itself via interactions with crowd-workers.
1 code implementation • 24 Mar 2022 • Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston
We show that, when using SeeKeR as a dialogue model, it outperforms the state-of-the-art model BlenderBot 2 (Chen et al., 2021) on open-domain knowledge-grounded conversations for the same number of parameters, in terms of consistency, knowledge and per-turn engagingness.
no code implementations • 11 Mar 2022 • Tyler L. Hayes, Maximilian Nickel, Christopher Kanan, Ludovic Denoyer, Arthur Szlam
Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
no code implementations • Findings (NAACL) 2022 • Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston
State-of-the-art dialogue models still often stumble with regards to factual accuracy and self-contradiction.
no code implementations • 9 Nov 2021 • Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston
Large language models can produce fluent dialogue but often hallucinate factual inaccuracies.
no code implementations • 13 Oct 2021 • Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Katja Hofmann, Michel Galley, Ahmed Awadallah
Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions.
no code implementations • ACL 2022 • Jing Xu, Arthur Szlam, Jason Weston
Despite recent improvements in open-domain dialogue models, state-of-the-art models are trained and evaluated on short conversations with little context.
no code implementations • NeurIPS 2021 • Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston
We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.
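The core routing idea — a fixed hash of the token id selects which expert's feed-forward parameters process that position, with no learned router — can be sketched like this (toy dimensions and the trivial modulo hash are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, d_model, d_ff, vocab = 4, 8, 16, 100

# one feed-forward block per expert
W1 = rng.normal(size=(num_experts, d_model, d_ff)) * 0.1
W2 = rng.normal(size=(num_experts, d_ff, d_model)) * 0.1

def hash_layer(token_ids, x):
    """Route each position to a single expert FFN chosen by hashing its
    token id: routing is parameter-free, deterministic, and stable."""
    experts = token_ids % num_experts  # a trivial stand-in hash
    out = np.empty_like(x)
    for e in range(num_experts):
        mask = experts == e
        h = np.maximum(x[mask] @ W1[e], 0.0)  # ReLU
        out[mask] = h @ W2[e]
    return out

tokens = rng.integers(0, vocab, size=10)
x = rng.normal(size=(10, d_model))
y = hash_layer(tokens, x)
```

Because the hash is fixed, identical token ids always reach the same expert, which sidesteps the load-balancing tricks that learned routers require.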
1 code implementation • 13 May 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan
We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.
Ranked #4 on Language Modelling on enwik8
1 code implementation • 25 Jan 2021 • Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam
In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale.
1 code implementation • 1 Jan 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan
We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state-of-the-art results on long-context language modeling, reinforcement learning, and algorithmic tasks.
no code implementations • 30 Dec 2020 • Sabrina J. Mielke, Arthur Szlam, Emily Dinan, Y-Lan Boureau
While improving neural dialogue agents' factual accuracy is the object of much research, another important aspect of communication, less studied in the setting of neural dialogue, is transparency about ignorance.
no code implementations • 17 Dec 2020 • Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam
In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.
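The idea of keeping the base model frozen and optimizing only the appended task token's embedding on a few labeled examples can be sketched with a toy linear model standing in for the network (the quadratic loss, learning rate, and all names are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
w = rng.normal(size=d)  # frozen "base model" weights

def fit_task_token(xs, ys, lr=0.1, steps=2000):
    """Optimize only a task-token embedding z on a few labeled examples,
    leaving the base weights w untouched (toy linear stand-in:
    prediction = x . (w + z), mean-squared-error loss)."""
    z = np.zeros(d)  # the task embedding, learned on the fly
    for _ in range(steps):
        pred = xs @ (w + z)
        grad = 2 * xs.T @ (pred - ys) / len(xs)  # dL/dz
        z -= lr * grad
    return z

xs = rng.normal(size=(16, d))
true_z = rng.normal(size=d)
ys = xs @ (w + true_z)  # labels generated by a hidden task shift
z = fit_task_token(xs, ys)
```

The point of the sketch is that the per-task state is a single small vector, so adapting to a new task touches none of the shared parameters.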
1 code implementation • 6 Oct 2020 • Ramakrishna Vedantam, Arthur Szlam, Maximilian Nickel, Ari Morcos, Brenden Lake
Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head").
no code implementations • NAACL 2021 • Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston
We seek to create agents that both act and communicate with other agents in pursuit of a goal.
no code implementations • 18 Aug 2020 • Kurt Shuster, Jack Urbanek, Emily Dinan, Arthur Szlam, Jason Weston
As argued in de Vries et al. (2020), crowdsourced data has the issues of lack of naturalness and relevance to real-world use cases, while the static dataset paradigm does not allow for a model to learn from its experiences of using language (Silver et al., 2013).
no code implementations • ACL 2020 • Kavya Srinet, Yacine Jernite, Jonathan Gray, Arthur Szlam
We propose a semantic parsing dataset focused on instruction-driven communication with an agent in the game Minecraft.
no code implementations • 29 Jun 2020 • Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta
This paper formulates hypothesis verification as an RL problem.
no code implementations • 22 Jun 2020 • Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson
We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet.
1 code implementation • ICLR 2020 • Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato
In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.
no code implementations • 10 Apr 2020 • Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski
Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.
no code implementations • 6 Apr 2020 • Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam
Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.
no code implementations • 7 Feb 2020 • Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam
Dialogue research tends to distinguish between chit-chat and goal-oriented tasks.
no code implementations • 20 Nov 2019 • Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, Jason Weston
We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.
no code implementations • 25 Sep 2019 • Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta
In order to train the agents, we exploit the underlying structure in the majority of hypotheses -- they can be formulated as triplets (pre-condition, action sequence, post-condition).
1 code implementation • 22 Jul 2019 • Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston
In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.
3 code implementations • 19 Jul 2019 • Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam
This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions.
no code implementations • 7 Jun 2019 • Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur Szlam
Energy-based models (EBMs), a.k.a.
no code implementations • ICLR 2019 • Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam
The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time.
no code implementations • 17 Apr 2019 • Yacine Jernite, Kavya Srinet, Jonathan Gray, Arthur Szlam
We propose a large scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft.
1 code implementation • IJCNLP 2019 • Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston
We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relate to agents that can talk and act successfully.
2 code implementations • 31 Jan 2019 • Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W. Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston
We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots.
2 code implementations • 22 Nov 2018 • Sainbayar Sukhbaatar, Emily Denton, Arthur Szlam, Rob Fergus
In hierarchical reinforcement learning a major challenge is determining appropriate low-level policies.
no code implementations • ACL 2019 • Sean Welleck, Jason Weston, Arthur Szlam, Kyunghyun Cho
Consistency is a long-standing issue faced by dialogue models.
no code implementations • 27 Sep 2018 • Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato
In this work, we aim at addressing this problem by introducing a new benchmark evaluation suite, dubbed GenEval.
no code implementations • 6 Sep 2018 • David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna
A desirable property of an intelligent agent is its ability to understand its environment to quickly generalize to novel tasks and compose simpler tasks into more complex ones.
no code implementations • 20 Apr 2018 • Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave
It is often the case that the best performing language model is an ensemble of a neural language model with n-grams.
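The standard way to ensemble a neural language model with an n-gram model is linear interpolation of their next-token distributions, with the mixing weight tuned on held-out data. A minimal sketch (the toy distributions and the value of lam are illustrative):

```python
import numpy as np

def interpolate(p_neural, p_ngram, lam=0.5):
    """Ensemble two language models by mixing their next-token
    distributions: p = lam * p_neural + (1 - lam) * p_ngram."""
    p = lam * p_neural + (1.0 - lam) * p_ngram
    return p / p.sum(axis=-1, keepdims=True)  # guard against numeric drift

# toy next-token distributions over a 5-word vocabulary
p_nn = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
p_ng = np.array([0.1, 0.6, 0.1, 0.1, 0.1])
p = interpolate(p_nn, p_ng, lam=0.7)
```

Because both inputs are valid distributions, the mixture is too; the interesting question the paper addresses is why this simple combination so often beats either model alone.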
no code implementations • ICML 2018 • Amy Zhang, Adam Lerer, Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam
The tasks that an agent will need to solve often are not known during training.
1 code implementation • ICML 2018 • Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus
We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility.
15 code implementations • ACL 2018 • Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston
Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating.
Ranked #5 on Dialogue Generation on Persona-Chat (using extra training data)
no code implementations • ICLR 2018 • Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston
Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment.
6 code implementations • ICML 2018 • Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam
Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images.
1 code implementation • CVPR 2018 • Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou
This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time.
no code implementations • CVPR 2017 • Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam
In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks.
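In a hard mixture of experts, each example is assigned to exactly one expert, and only that expert trains on it. A rough sketch of the assignment step, using nearest-cluster-center gating on toy data (the gating rule and all names are my illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, d = 3, 4
centers = rng.normal(size=(num_experts, d))  # e.g. obtained via k-means

def assign(x):
    """Hard gating: each example goes to its nearest expert center,
    so every example is processed (and trained on) by a single expert."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

x = rng.normal(size=(16, d))
groups = assign(x)
# each expert would now take gradient steps only on its own group
batches = [x[groups == e] for e in range(num_experts)]
```

The hard assignment is what makes training efficient at scale: each expert sees only its slice of the data, so the experts can be trained in parallel with no cross-talk.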
3 code implementations • ICLR 2018 • Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus
When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.
1 code implementation • 15 Feb 2017 • Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache
While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.
no code implementations • 8 Feb 2017 • W. James Murdoch, Arthur Szlam
Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear.
no code implementations • 29 Jan 2017 • Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala
In this work we propose a simple unsupervised approach for next frame prediction in video.
5 code implementations • 12 Dec 2016 • Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, Yann LeCun
The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting.
Ranked #5 on Procedural Text Understanding on ProPara
1 code implementation • NeurIPS 2016 • Thomas Laurent, James Von Brecht, Xavier Bresson, Arthur Szlam
We introduce a theoretical and algorithmic framework for multi-way graph partitioning that relies on a multiplicative cut-based objective.
no code implementations • 24 Nov 2016 • Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst
In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques.
9 code implementations • NeurIPS 2016 • Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus
Many tasks in AI require the collaboration of multiple agents.
1 code implementation • 22 Feb 2016 • Mikael Henaff, Arthur Szlam, Yann LeCun
Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research.
7 code implementations • 7 Dec 2015 • Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus
We describe a very simple bag-of-words baseline for visual question answering.
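The baseline's recipe is to concatenate a bag-of-words question feature with an image feature and score candidate answers with a single linear softmax layer. A minimal sketch with made-up dimensions and random weights (untrained, for shape illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, img_dim, n_answers = 1000, 512, 10
W = rng.normal(size=(vocab + img_dim, n_answers)) * 0.01

def bow_vqa(question_ids, img_feat):
    """Bag-of-words VQA: sum one-hot word vectors, concatenate with the
    image feature, and score answers with one linear layer + softmax."""
    bow = np.zeros(vocab)
    for t in question_ids:
        bow[t] += 1.0
    feat = np.concatenate([bow, img_feat])
    logits = feat @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()  # distribution over candidate answers

probs = bow_vqa([3, 17, 42], rng.normal(size=img_dim))
```

The whole model is one matrix multiply over a fixed feature, which is exactly what makes it a useful baseline against far more elaborate VQA architectures.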
no code implementations • NeurIPS 2015 • Emily L. Denton, Soumith Chintala, Arthur Szlam, Rob Fergus
In this paper we introduce a generative model capable of producing high quality samples of natural images.
2 code implementations • 23 Nov 2015 • Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus
This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning.
1 code implementation • 21 Nov 2015 • Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston
A long-term goal of machine learning is to build intelligent conversational agents.
no code implementations • 26 Jun 2015 • Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba
The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation.
1 code implementation • 18 Jun 2015 • Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus
In this paper we introduce a generative parametric model capable of producing high quality samples of natural images.
44 code implementations • NeurIPS 2015 • Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
For the former our approach is competitive with Memory Networks, but with less supervision.
Ranked #6 on Question Answering on bAbi
no code implementations • 11 Mar 2015 • Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert
Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.
1 code implementation • 20 Dec 2014 • Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra
We propose a strong baseline model for unsupervised feature learning using video data.
no code implementations • 15 Jun 2014 • Xavier Bresson, Huiyi Hu, Thomas Laurent, Arthur Szlam, James Von Brecht
In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning.
no code implementations • CVPR 2014 • Bryan Poling, Gilad Lerman, Arthur Szlam
Our approach does not require direct modeling of the structure or the motion of the scene, and runs in real time on a single CPU core.
4 code implementations • 21 Dec 2013 • Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun
Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain.
no code implementations • 20 Dec 2013 • Yunlong He, Koray Kavukcuoglu, Yun Wang, Arthur Szlam, Yanjun Qi
In this paper, we propose a new unsupervised feature learning framework, namely Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks.
no code implementations • 16 Nov 2013 • Joan Bruna, Arthur Szlam, Yann LeCun
In this work we compute lower Lipschitz bounds of $\ell_p$ pooling operators for $p=1, 2, \infty$ as well as $\ell_p$ pooling operators preceded by half-rectification layers.