no code implementations • NAACL (TeachingNLP) 2021 • David Gaddy, Daniel Fried, Nikita Kitaev, Mitchell Stern, Rodolfo Corona, John DeNero, Dan Klein
We present a set of assignments for a graduate-level NLP course.
1 code implementation • ACL 2022 • Nikita Kitaev, Thomas Lu, Dan Klein
We present an incremental syntactic representation that consists of assigning a single discrete label to each word in a sentence, where the label is predicted using strictly incremental processing of a prefix of the sentence, and the sequence of labels for a sentence fully determines a parse tree.
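A minimal Python sketch of this setup; `predict_label` is a hypothetical stand-in for the paper's learned classifier:

```python
# Hedged sketch: predict one discrete label per word from the prefix only.
# `predict_label` is a hypothetical stand-in for the paper's learned model.

def incremental_labels(words, predict_label):
    labels = []
    for i in range(len(words)):
        prefix = words[: i + 1]            # strictly incremental: no lookahead
        labels.append(predict_label(prefix))
    return labels                          # this sequence fully determines the tree
```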
1 code implementation • NeurIPS 2020 • Giannis Daras, Nikita Kitaev, Augustus Odena, Alexandros G. Dimakis
We propose a novel type of balanced clustering algorithm to approximate attention.
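A toy illustration of the balanced-clustering idea, assuming the sequence length divides evenly into clusters; SMYRF's actual asymmetric LSH clustering is more sophisticated than this stand-in:

```python
import numpy as np

def clustered_attention(Q, K, V, n_clusters):
    # Toy balanced clustering: sort queries and keys by a shared random
    # projection, split into equal-size groups, and attend within groups
    # only, so cost scales with cluster size rather than sequence length.
    n, d = Q.shape
    assert n % n_clusters == 0, "toy version needs equal-size clusters"
    proj = np.random.randn(d)
    q_order = np.argsort(Q @ proj)
    k_order = np.argsort(K @ proj)
    out = np.empty_like(V)
    size = n // n_clusters
    for c in range(n_clusters):
        qi = q_order[c * size:(c + 1) * size]
        ki = k_order[c * size:(c + 1) * size]
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[qi] = w @ V[ki]
    return out
```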
1 code implementation • 11 Oct 2020 • Giannis Daras, Nikita Kitaev, Augustus Odena, Alexandros G. Dimakis
We also show that SMYRF can be used interchangeably with dense attention before and after training.
no code implementations • EMNLP 2020 • Steven Cao, Nikita Kitaev, Dan Klein
We propose a method for unsupervised parsing based on the linguistic notion of a constituency test.
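A hedged sketch of scoring a candidate span with constituency tests; `grammatical` (string to score) is a hypothetical stand-in for the paper's learned grammaticality model, and the two transformations shown are simplified versions of classic linguistic tests:

```python
# Hedged sketch: score words[i:j] as a constituent by applying simplified
# constituency tests and averaging a grammaticality judgment over the
# transformed sentences.

def span_score(words, i, j, grammatical):
    span, rest = words[i:j], words[:i] + words[j:]
    tests = [
        words[:i] + ["it"] + words[j:],    # proform substitution test
        span + [","] + rest,               # movement/fronting test (simplified)
    ]
    return sum(grammatical(" ".join(t)) for t in tests) / len(tests)
```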
no code implementations • ICLR 2020 • Steven Cao, Nikita Kitaev, Dan Klein
We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in analyzing and improving multilingual BERT.
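One standard baseline for this kind of alignment is an orthogonal (Procrustes) rotation fit on contextual embeddings of aligned word pairs; a minimal sketch follows, though the paper's proposed procedures go beyond this simple baseline:

```python
import numpy as np

def procrustes_align(X, Y):
    # Fit an orthogonal map W minimizing ||X @ W - Y||_F, where rows of X
    # and Y are contextual embeddings of word pairs aligned across languages.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt   # apply as X @ W to move source embeddings toward target
```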
15 code implementations • ICLR 2020 • Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences; a toy version of the paper's LSH bucketing step is sketched after this entry.
Ranked #2 on Question Answering on Quasar-T
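Reformer replaces dense attention with locality-sensitive-hashing attention; here is a toy version of the angular-LSH bucketing it relies on (assuming an even number of buckets):

```python
import numpy as np

def lsh_buckets(x, n_buckets, rng=np.random):
    # Angular LSH as used in Reformer-style attention (toy version):
    # project onto random directions and take the argmax over the
    # concatenation [xR; -xR], so similar vectors tend to land in the
    # same bucket and attention can be restricted to within-bucket pairs.
    d = x.shape[-1]
    R = rng.randn(d, n_buckets // 2)       # n_buckets assumed even
    h = x @ R
    return np.argmax(np.concatenate([h, -h], axis=-1), axis=-1)
```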
1 code implementation • ACL 2019 • Daniel Fried, Nikita Kitaev, Dan Klein
Neural parsers obtain state-of-the-art results on benchmark treebanks for constituency parsing -- but to what degree do they generalize to other domains?
no code implementations • 4 Jun 2019 • William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit
During training, one can feed KERMIT paired data $(x, y)$ to learn the joint distribution $p(x, y)$, and optionally mix in unpaired data $x$ or $y$ to refine the marginals $p(x)$ or $p(y)$; a minimal sketch of this data mixing follows the entry below.
Ranked #39 on Machine Translation on WMT2014 English-German
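A minimal sketch of the paired/unpaired mixing described above; `loss_fn` is a hypothetical stand-in for the insertion-based sequence model's training loss over a (possibly concatenated) token sequence:

```python
# Hedged sketch: paired examples train the joint over the concatenated
# sequence, unpaired examples train a marginal.

def batch_loss(batch, loss_fn):
    total = 0.0
    for ex in batch:
        if "x" in ex and "y" in ex:
            total += loss_fn(ex["x"] + ex["y"])   # joint p(x, y)
        elif "x" in ex:
            total += loss_fn(ex["x"])             # marginal p(x)
        else:
            total += loss_fn(ex["y"])             # marginal p(y)
    return total / len(batch)
```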
2 code implementations • ACL 2020 • Nikita Kitaev, Dan Klein
We present a constituency parsing algorithm that, like a supertagger, works by assigning labels to each word in a sentence.
Ranked #12 on Constituency Parsing on Penn Treebank
4 code implementations • ACL 2019 • Nikita Kitaev, Steven Cao, Dan Klein
We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions.
Ranked #5 on Constituency Parsing on CTB5
4 code implementations • ACL 2018 • Nikita Kitaev, Dan Klein
We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser.
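A single-head, single-layer illustration of the kind of self-attention encoder involved; the projection matrices would be learned, and multi-head attention, positional information, feed-forward sublayers, and layer normalization are all omitted:

```python
import numpy as np

def self_attention_layer(X, Wq, Wk, Wv):
    # One single-head self-attention layer over word representations X
    # (n_words x d); Wq, Wk, Wv are d x d projections.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(X.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return X + w @ V                       # residual connection
```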
Ranked #8 on Constituency Parsing on CTB5
2 code implementations • ACL 2019 • Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh
The game involves two players: a Teller and a Drawer.
1 code implementation • EMNLP 2017 • Nikita Kitaev, Dan Klein
We present a model for locating regions in space based on natural language descriptions.