no code implementations • NAACL (TeachingNLP) 2021 • David Gaddy, Daniel Fried, Nikita Kitaev, Mitchell Stern, Rodolfo Corona, John DeNero, Dan Klein
We present a set of assignments for a graduate-level NLP course.
no code implementations • EMNLP (nlpbt) 2020 • Elman Mansimov, Mitchell Stern, Mia Chen, Orhan Firat, Jakob Uszkoreit, Puneet Jain
In this paper, we offer a preliminary investigation into the task of in-image machine translation: transforming an image containing text in one language into an image containing the same text in another language.
1 code implementation • ACL 2020 • Ruiqi Zhong, Mitchell Stern, Dan Klein
We propose a method for program generation based on semantic scaffolds, lightweight structures representing the high-level semantic and syntactic composition of a program.
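To make the idea concrete, here is a minimal sketch, not the paper's algorithm, in which a scaffold is approximated by the sequence of top-level statement types of a candidate Python program, and beam candidates are grouped by scaffold so that search keeps structurally distinct hypotheses. Every name below is illustrative.

```python
import ast
from collections import defaultdict

def scaffold(program: str) -> tuple:
    """Toy scaffold: the sequence of top-level statement types.
    (The paper's scaffolds also encode semantic constraints such as
    variable scoping; this keeps only coarse syntactic shape.)"""
    return tuple(type(stmt).__name__ for stmt in ast.parse(program).body)

def best_per_scaffold(candidates):
    """Keep the highest-scoring candidate within each scaffold group,
    so the beam spreads over structurally distinct programs."""
    groups = defaultdict(list)
    for prog, score in candidates:
        groups[scaffold(prog)].append((prog, score))
    return {s: max(cs, key=lambda c: c[1]) for s, cs in groups.items()}

candidates = [
    ("x = 1\nprint(x)", -1.2),
    ("x = 2\nprint(x)", -1.5),                 # same scaffold as above
    ("for i in range(3):\n    print(i)", -2.0),
]
print(best_per_scaffold(candidates))
```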
1 code implementation • EMNLP 2020 • Eric Wallace, Mitchell Stern, Dawn Song
Black-box machine translation systems are vulnerable to imitation attacks, in which an adversary trains a copy of the model on its outputs. To mitigate these vulnerabilities, we propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models.
no code implementations • 15 Jan 2020 • Laura Ruis, Mitchell Stern, Julia Proskurnia, William Chan
We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation.
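The decoding loop this implies can be sketched as follows; the two policies are trivial stubs standing in for the trained model's insertion and deletion heads, and the loop schedule is likewise an assumption made for illustration.

```python
# A minimal sketch of alternating insertion and deletion phases.

def propose_insertions(seq):
    # stub: append an end-of-sequence token if one is missing
    return [] if seq and seq[-1] == "<eos>" else [(len(seq), "<eos>")]

def propose_deletions(seq):
    # stub: flag adjacent duplicate tokens for deletion
    return [i for i in range(1, len(seq)) if seq[i] == seq[i - 1]]

def decode(seq, max_rounds=10):
    for _ in range(max_rounds):
        inserts = sorted(propose_insertions(seq), reverse=True)
        for pos, tok in inserts:              # apply right-to-left so
            seq.insert(pos, tok)              # earlier indices stay valid
        for pos in sorted(propose_deletions(seq), reverse=True):
            del seq[pos]
        if not inserts and not propose_deletions(seq):
            break                             # converged: no edits left
    return seq

print(decode(["the", "the", "cat", "sat"]))  # ['the', 'cat', 'sat', '<eos>']
```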
no code implementations • EMNLP 2020 • William Chan, Mitchell Stern, Jamie Kiros, Jakob Uszkoreit
In this work, we present an empirical study of generation order for machine translation.
no code implementations • 4 Jun 2019 • William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit
KERMIT is an insertion-based approach to generative modeling that captures the joint distribution over sequence pairs and its decompositions with a single neural network. During training, one can feed KERMIT paired data $(x, y)$ to learn the joint distribution $p(x, y)$, and optionally mix in unpaired data $x$ or $y$ to refine the marginals $p(x)$ or $p(y)$.
Ranked #39 on Machine Translation on WMT2014 English-German
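A schematic of this data mixing, assuming a hypothetical `insertion_loss` that stands in for KERMIT's insertion-based log-likelihood over a single canvas:

```python
# Sketch of KERMIT-style batch construction: paired examples train the
# joint p(x, y) over the concatenated pair, while unpaired examples
# train a marginal. `insertion_loss` is a hypothetical stand-in for the
# model's insertion-based negative log-likelihood.

def insertion_loss(tokens):
    return float(len(tokens))  # dummy value; a real model scores insertions

def training_loss(batch):
    total = 0.0
    for ex in batch:
        if "x" in ex and "y" in ex:
            total += insertion_loss(ex["x"] + ex["y"])  # joint p(x, y)
        elif "x" in ex:
            total += insertion_loss(ex["x"])            # marginal p(x)
        else:
            total += insertion_loss(ex["y"])            # marginal p(y)
    return total / len(batch)

batch = [
    {"x": ["guten", "Tag"], "y": ["good", "day"]},  # paired
    {"x": ["hallo"]},                                # unpaired source only
    {"y": ["hello"]},                                # unpaired target only
]
print(training_loss(batch))
```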
no code implementations • 8 Feb 2019 • Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit
We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations.
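One way to see why insertion-based generation can be fast: under a balanced binary-tree order, the model can insert into every open slot in parallel, finishing an n-token sequence in roughly log2(n) passes. The sketch below uses a hard-coded oracle target in place of the model, purely to count passes.

```python
# A counting sketch of balanced binary-tree insertion order: with one
# insertion allowed per slot per pass, an n-token sequence completes in
# about log2(n) passes rather than n steps. The oracle target below
# stands in for the model's predictions; tokens are unique for simplicity.

target = ["a", "b", "c", "d", "e", "f", "g"]

def parallel_insert(hyp):
    """One pass: insert the middle missing token into every open slot."""
    bounds = [-1] + [target.index(t) for t in hyp] + [len(target)]
    out, inserted = [], 0
    for left, right in zip(bounds, bounds[1:]):
        if left >= 0:
            out.append(target[left])          # keep existing token
        if right - left > 1:                  # slot still has missing tokens
            out.append(target[(left + right) // 2])
            inserted += 1
    return out, inserted

hyp, passes = [], 0
while True:
    hyp, inserted = parallel_insert(hyp)
    if not inserted:
        break
    passes += 1
print(hyp, "-", passes, "passes")             # 7 tokens in 3 passes
```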
no code implementations • NeurIPS 2018 • Mitchell Stern, Noam Shazeer, Jakob Uszkoreit
Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years.
1 code implementation • NAACL 2018 • David Gaddy, Mitchell Stern, Dan Klein
A number of differences have emerged between modern and classic approaches to constituency parsing in recent years, with structural components like grammars and feature-rich lexicons becoming less central while recurrent neural network representations rise in popularity.
4 code implementations • ICML 2018 • Noam Shazeer, Mitchell Stern
In several recently proposed stochastic optimization methods (e.g., RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients.
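The update rule being described takes only a few lines; the sketch below is the generic family, not Adafactor itself, whose contribution is to factor the second-moment accumulator into row and column statistics to cut memory.

```python
import numpy as np

# A generic update of the family described above: keep an exponential
# moving average v of squared gradients and scale each step by the
# inverse square root of v. (Adafactor's own contribution, not shown,
# is to store factored row/column statistics instead of the full v.)

def rmsprop_style_step(w, grad, v, lr=1e-2, beta2=0.999, eps=1e-8):
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # EMA of squared gradients
    w = w - lr * grad / (np.sqrt(v) + eps)      # inverse-sqrt scaling
    return w, v

# toy check: minimize f(w) = ||w||^2
w, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    w, v = rmsprop_style_step(w, 2 * w, v)
print(w)  # driven close to zero
```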
no code implementations • NeurIPS 2018 • Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, Michael I. Jordan
This paper proposes a stochastic variant of a classic algorithm, the cubic-regularized Newton method [Nesterov and Polyak 2006].
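For reference, one deterministic cubic-regularized Newton step minimizes the model $g^\top s + \frac{1}{2} s^\top H s + \frac{\rho}{3}\lVert s\rVert^3$ and applies the resulting $s$; the paper's stochastic variant replaces the gradient and Hessian-vector products with minibatch estimates. A toy sketch with the subproblem solved by gradient descent:

```python
import numpy as np

# One cubic-regularized Newton step chooses s to minimize the model
#   m(s) = g.s + 0.5 s'Hs + (rho/3) ||s||^3,
# then updates x <- x + s. Here the subproblem is solved by plain
# gradient descent on m, and H is accessed only through Hessian-vector
# products, which is what the stochastic variant subsamples.

def cubic_step(g, hvp, rho, iters=200, lr=0.01):
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + hvp(s) + rho * np.linalg.norm(s) * s  # gradient of m
        s -= lr * grad_m
    return s

# toy objective f(x) = 0.5 x'Ax - b'x, so the gradient is Ax - b
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
x = np.zeros(2)
for _ in range(20):
    x = x + cubic_step(A @ x - b, lambda v: A @ v, rho=1.0)
print(x, np.linalg.solve(A, b))  # both close to the optimum
```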
no code implementations • EMNLP 2017 • Mitchell Stern, Daniel Fried, Dan Klein
Generative neural models have recently achieved state-of-the-art results for constituency parsing.
no code implementations • ACL 2017 • Daniel Fried, Mitchell Stern, Dan Klein
Recent work has proposed several generative neural models for constituency parsing that achieve state-of-the-art results.
Ranked #14 on Constituency Parsing on Penn Treebank
1 code implementation • NeurIPS 2017 • Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael I. Jordan
We propose a method for feature selection that employs kernel-based measures of independence to find a subset of covariates that is maximally predictive of the response.
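As a toy illustration of kernel independence measures in this role (the paper's actual criterion is based on conditional covariance operators and is optimized over feature subsets; the sketch below merely ranks features one at a time by HSIC with the response):

```python
import numpy as np

# Toy ranking of features by HSIC, a kernel measure of dependence,
# against the response. Per-feature HSIC is used here only to make the
# kernel-independence idea concrete.

def rbf_gram(z, gamma=1.0):
    return np.exp(-gamma * (z[:, None] - z[None, :]) ** 2)

def hsic(x, y):
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(rbf_gram(x) @ H @ rbf_gram(y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 2] ** 2 + 0.1 * rng.normal(size=200)

scores = [hsic(X[:, j], y) for j in range(X.shape[1])]
print(np.argsort(scores)[::-1][:2])  # recovers the informative features 0 and 2
```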
3 code implementations • NeurIPS 2017 • Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht
Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks.
no code implementations • ACL 2017 • Mitchell Stern, Jacob Andreas, Dan Klein
In this work, we present a minimal neural model for constituency parsing based on independent scoring of labels and spans.
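A toy chart decoder in that spirit is sketched below; a random score table stands in for the neural scorer, and the real model's empty label for non-constituent spans is omitted for brevity.

```python
import numpy as np

# Every span (i, j) independently takes its best label, and CKY picks
# the binary bracketing with the highest total score.

rng = np.random.default_rng(0)
n, labels = 5, ["NP", "VP", "S"]
scores = rng.normal(size=(n, n + 1, len(labels)))  # scores[i, j, label]

memo = {}
def parse(i, j):
    """Best-scoring tree over the span (i, j)."""
    if (i, j) in memo:
        return memo[(i, j)]
    lab = int(np.argmax(scores[i, j]))
    lab_score = float(scores[i, j, lab])
    if j - i == 1:                       # single word: just a label
        memo[(i, j)] = (lab_score, (labels[lab], i))
    else:
        k = max(range(i + 1, j),         # best split point
                key=lambda k: parse(i, k)[0] + parse(k, j)[0])
        score = lab_score + parse(i, k)[0] + parse(k, j)[0]
        memo[(i, j)] = (score, (labels[lab], parse(i, k)[1], parse(k, j)[1]))
    return memo[(i, j)]

print(parse(0, n))
```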
1 code implementation • ACL 2017 • Maxim Rabinovich, Mitchell Stern, Dan Klein
Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs.
Ranked #3 on Semantic Parsing on ATIS
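As a rough illustration of enforcing well-formedness (this is not the paper's abstract syntax networks, which construct the AST directly with one decoder module per construct), the sketch below only ever expands symbols according to a tiny toy grammar, so every output is guaranteed to parse:

```python
# Toy illustration of well-formed output: symbols are only expanded by
# productions of a small grammar, so the result always parses. The
# deterministic "choice" below stands in for a learned decoder.

GRAMMAR = {"EXPR": [["NUM"], ["EXPR", "+", "EXPR"]]}  # toy arithmetic

def expand(symbol, depth=0, max_depth=2):
    if symbol not in GRAMMAR:                 # terminal symbol
        return ["1"] if symbol == "NUM" else [symbol]
    # near the depth limit, allow only the non-recursive production
    options = GRAMMAR[symbol] if depth < max_depth else GRAMMAR[symbol][:1]
    production = options[-1]                  # stand-in for a model's choice
    return [tok for sym in production
            for tok in expand(sym, depth + 1, max_depth)]

print(" ".join(expand("EXPR")))               # prints: 1 + 1 + 1 + 1
```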