3 code implementations • 8 Jan 2024 • Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.
Ranked #9 on Question Answering on PIQA
5 code implementations • 10 Oct 2023 • Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.
Ranked #4 on Zero-Shot Video Question Answer on NExT-GQA
44 code implementations • arXiv 2023 • Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
Ranked #3 on Question Answering on OBQA
no code implementations • 22 Feb 2023 • Pierre-Alexandre Kamienny, Guillaume Lample, Sylvain Lamprier, Marco Virgolin
Symbolic regression (SR) is the problem of learning a symbolic expression from numerical data.
3 code implementations • 21 Oct 2022 • Albert Q. Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, Guillaume Lample
In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems.
Ranked #3 on Automated Theorem Proving on miniF2F-valid (Pass@100 metric)
no code implementations • 23 May 2022 • Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurélien Rodriguez, Timothée Lacroix
With a similar computational budget, we improve the state of the art on the Lean-based miniF2F-curriculum dataset from 31% to 42% proving accuracy.
Ranked #1 on Automated Theorem Proving on Metamath set.mm (Pass@32 metric)
3 code implementations • 22 Apr 2022 • Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton
Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is a difficult task which usually involves a two-step procedure: predicting the "skeleton" of the expression up to the choice of numerical constants, then fitting the constants by optimizing a non-convex loss function.
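The second step of that two-step procedure, fitting the constants of a predicted skeleton, can be illustrated with a minimal self-contained sketch. The skeleton `f(x) = c0*sin(c1*x) + c2` and all names here are hypothetical, and the grid-plus-least-squares fit is a crude stand-in for the non-convex optimizers (e.g. BFGS) typically used:

```python
import math

# Hypothetical skeleton, as if predicted in step one: f(x) = c0*sin(c1*x) + c2.
def skeleton(x, c0, c1, c2):
    return c0 * math.sin(c1 * x) + c2

def fit_constants(xs, ys):
    """Step two: fit the skeleton's numerical constants.
    c0 and c2 enter linearly, so for each candidate c1 on a grid we
    solve them in closed form via the least-squares normal equations
    and keep the best overall fit."""
    n = len(xs)
    best = (float("inf"), (0.0, 0.0, 0.0))
    for k in range(1, 3001):            # c1 in (0, 3], step 0.001
        c1 = k / 1000.0
        s = [math.sin(c1 * x) for x in xs]
        ss = sum(v * v for v in s)
        s1 = sum(s)
        sy = sum(v * y for v, y in zip(s, ys))
        y1 = sum(ys)
        det = n * ss - s1 * s1          # normal equations for ys ~ c0*s + c2
        if abs(det) < 1e-12:
            continue
        c0 = (n * sy - s1 * y1) / det
        c2 = (ss * y1 - s1 * sy) / det
        loss = sum((c0 * v + c2 - y) ** 2 for v, y in zip(s, ys)) / n
        if loss < best[0]:
            best = (loss, (c0, c1, c2))
    return best

xs = [i / 10 for i in range(50)]
ys = [skeleton(x, 2.0, 1.5, -0.5) for x in xs]   # data from known constants
loss, (c0, c1, c2) = fit_constants(xs, ys)
```

The non-convexity lives entirely in `c1`; exploiting the linearity of the remaining constants is what keeps this toy fit reliable.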
no code implementations • 12 Jan 2022 • Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton
Symbolic regression, i.e. predicting a function from the observation of its values, is well known to be a challenging task.
1 code implementation • ICLR 2022 • Baptiste Roziere, Jie M. Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, Guillaume Lample
With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation.
2 code implementations • NeurIPS 2021 • Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample
Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Marie-Anne Lachaux, Armand Joulin, Guillaume Lample
In this paper, we propose to explicitly model this one-to-many mapping by conditioning the decoder of a NMT model on a latent variable that represents the domain of target sentences.
1 code implementation • ICLR 2021 • François Charton, Amaury Hayat, Guillaume Lample
Using transformers over large generated datasets, we train models to learn mathematical properties of differential systems, such as local stability, behavior at infinity and controllability.
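For intuition, the ground-truth label for one of these properties can be computed directly: local stability of an equilibrium follows from the Jacobian's eigenvalues there. A minimal sketch (the example systems are illustrative, not taken from the paper):

```python
import math

def jacobian_2x2(f, x, y, h=1e-6):
    """Central-difference Jacobian of f(x, y) -> (dx/dt, dy/dt)."""
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    return [[(fxp[0] - fxm[0]) / (2 * h), (fyp[0] - fym[0]) / (2 * h)],
            [(fxp[1] - fxm[1]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)]]

def locally_stable(J):
    """All eigenvalues of a 2x2 matrix have negative real part
    iff trace < 0 and det > 0 (Routh-Hurwitz criterion)."""
    trace = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return trace < 0 and det > 0

# Damped pendulum: x' = y, y' = -sin(x) - 0.5*y; the origin is stable.
def pendulum(x, y):
    return (y, -math.sin(x) - 0.5 * y)

# Saddle: x' = x, y' = -y; the origin is unstable.
def saddle(x, y):
    return (x, -y)
```

The trained models are asked to produce this stable/unstable decision directly from the system's symbolic form, without computing the Jacobian.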
9 code implementations • NeurIPS 2020 • Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample
We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy.
7 code implementations • ICLR 2020 • Guillaume Lample, François Charton
Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data.
1 code implementation • IJCNLP 2019 • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato
For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.
8 code implementations • NeurIPS 2019 • Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer in a state-of-the-art transformer-based architecture.
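The product-key trick that makes such large memories tractable can be sketched in a few lines (toy dimensions; all names are illustrative, and the real memory layers use learned sub-keys, multiple heads, and sparse value updates):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def product_key_lookup(q, subkeys1, subkeys2, values, k=2):
    """Product-key memory lookup: split the query in half, score each
    half against a small set of sub-keys, and read the top-k full keys
    from the Cartesian product of the per-half top-k lists. This
    searches |K1| * |K2| memory slots while scoring only |K1| + |K2|
    sub-keys."""
    h = len(q) // 2
    q1, q2 = q[:h], q[h:]
    top1 = sorted(range(len(subkeys1)), key=lambda i: -dot(q1, subkeys1[i]))[:k]
    top2 = sorted(range(len(subkeys2)), key=lambda j: -dot(q2, subkeys2[j]))[:k]
    cands = sorted(((dot(q1, subkeys1[i]) + dot(q2, subkeys2[j]), i, j)
                    for i in top1 for j in top2), reverse=True)[:k]
    weights = softmax([s for s, _, _ in cands])
    dim = len(values[0][0])
    out = [0.0] * dim
    for w, (_, i, j) in zip(weights, cands):
        for d in range(dim):
            out[d] += w * values[i][j][d]
    return out

# Two sub-keys per half index a 2x2 grid of value slots.
subkeys1 = [[1.0, 0.0], [0.0, 1.0]]
subkeys2 = [[1.0, 0.0], [0.0, 1.0]]
values = [[[0.0, 0.0], [0.0, 1.0]],    # values[i][j], one per (i, j) slot
          [[1.0, 0.0], [1.0, 1.0]]]
out = product_key_lookup([1.0, 0.0, 0.0, 1.0], subkeys1, subkeys2, values)
```

The per-half top-k is guaranteed to contain the overall top-k full keys, so the exact nearest slots are found at sub-key cost.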
2 code implementations • 2 Jul 2019 • Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin
More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.
Ranked #5 on Language Modelling on Text8
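A toy, single-query version of that modification (illustrative only; the paper uses multi-head attention over full sequences with learned persistent vectors):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def attend_with_persistent_memory(q, ctx_k, ctx_v, mem_k, mem_v):
    """Single-query scaled dot-product attention in which 'persistent'
    key/value vectors, independent of the input, are concatenated to
    the context keys and values, so the attention sublayer can also
    play the role of the feed-forward layer."""
    keys, vals = ctx_k + mem_k, ctx_v + mem_v
    d = math.sqrt(len(q))
    w = softmax([sum(a * b for a, b in zip(q, k)) / d for k in keys])
    return [sum(wi * v[j] for wi, v in zip(w, vals))
            for j in range(len(vals[0]))]

# One context token plus two persistent slots, 2-d toy vectors.
out = attend_with_persistent_memory(
    q=[1.0, 0.0],
    ctx_k=[[1.0, 0.0]], ctx_v=[[1.0, 0.0]],
    mem_k=[[0.0, 1.0], [-1.0, 0.0]],
    mem_v=[[0.0, 1.0], [0.0, -1.0]],
)
```

Because the persistent slots compete with context tokens inside one softmax, no separate feed-forward sublayer is needed.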
no code implementations • ICLR 2019 • Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau
The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".
2 code implementations • 4 Feb 2019 • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato
For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.
16 code implementations • NeurIPS 2019 • Guillaume Lample, Alexis Conneau
On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU.
3 code implementations • 1 Nov 2018 • Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau
The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".
no code implementations • EMNLP 2018 • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato
Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.
10 code implementations • EMNLP 2018 • Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov
State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models.
Ranked #5 on Natural Language Inference on XNLI French
Tasks: Cross-Lingual Natural Language Inference, Machine Translation, +2
no code implementations • ACL 2018 • Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing.
6 code implementations • 3 May 2018 • Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing.
15 code implementations • EMNLP 2018 • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato
Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.
Ranked #2 on Machine Translation on WMT2016 English-Russian
no code implementations • NeurIPS 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.
15 code implementations • ICLR 2018 • Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato
By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data.
Ranked #7 on Machine Translation on WMT2016 German-English
19 code implementations • ICLR 2018 • Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation.
Ranked #2 on Word Alignment on en-es
3 code implementations • 1 Jun 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.
8 code implementations • 18 Sep 2016 • Guillaume Lample, Devendra Singh Chaplot
Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions.
no code implementations • NAACL 2016 • Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer
We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.
43 code implementations • NAACL 2016 • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer
State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available.
Ranked #8 on Named Entity Recognition (NER) on CoNLL++
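The decoding half of such a tagger, CRF inference over the neural emission scores, reduces to Viterbi search. A minimal sketch with made-up scores and tag names:

```python
def viterbi(emissions, transitions, tags):
    """Viterbi decoding for a linear-chain CRF (the output layer of a
    BiLSTM-CRF tagger): returns the highest-scoring tag sequence.
    emissions[t][tag] are per-token scores from the network;
    transitions[prev][tag] are learned tag-transition scores."""
    score = {tag: emissions[0][tag] for tag in tags}
    back = []
    for t in range(1, len(emissions)):
        new_score, ptr = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: score[p] + transitions[p][tag])
            ptr[tag] = prev
            new_score[tag] = (score[prev] + transitions[prev][tag]
                              + emissions[t][tag])
        back.append(ptr)
        score = new_score
    last = max(tags, key=lambda tg: score[tg])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy 3-token sentence, two tags ("O" = outside, "E" = entity).
tags = ["O", "E"]
emissions = [{"O": 2.0, "E": 0.0},
             {"O": 0.0, "E": 3.0},
             {"O": 2.0, "E": 0.0}]
transitions = {"O": {"O": 0.0, "E": 0.0},
               "E": {"O": 0.0, "E": 0.0}}
path = viterbi(emissions, transitions, tags)
```

The transition scores are where tagging-scheme constraints (e.g. "I cannot follow O" in IOB) are learned rather than hand-coded.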
1 code implementation • 5 Feb 2016 • Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith
We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space.