no code implementations • LREC 2022 • Christopher Cieri, Mark Liberman, Sunghye Cho, Stephanie Strassel, James Fiumara, Jonathan Wright
The Linguistic Data Consortium was founded in 1992 to solve the problem that limitations in access to shareable data were impeding progress in Human Language Technology research and development.
no code implementations • LREC 2022 • Jennifer Tracey, Ann Bies, Jeremy Getman, Kira Griffitt, Stephanie Strassel
This paper describes data resources created for Phase 1 of the DARPA Active Interpretation of Disparate Alternatives (AIDA) program, which aims to develop language technology that can help humans manage large volumes of sometimes conflicting information to develop a comprehensive understanding of events around the world, even when such events are described in multiple media and languages.
no code implementations • LREC 2022 • Michael Arrigo, Stephanie Strassel, Nolan King, Thao Tran, Lisa Mason
CAMIO (Corpus of Annotated Multilingual Images for OCR) is a new corpus created by Linguistic Data Consortium to serve as a resource to support the development and evaluation of optical character recognition (OCR) and related technologies for 35 languages across 24 unique scripts.
no code implementations • LREC 2022 • Karen Jones, Kevin Walker, Christopher Caruso, Jonathan Wright, Stephanie Strassel
The WeCanTalk (WCT) Corpus is a new multi-language, multi-modal resource for speaker recognition.
no code implementations • LREC 2020 • Dana Delgado, Kevin Walker, Stephanie Strassel, Karen Jones, Christopher Caruso, David Graff
We introduce a new resource, the SAFE-T (Speech Analysis for Emergency Response Technology) Corpus, designed to simulate first-responder communications by inducing high vocal effort and urgent speech with situational background noise in a game-based collection protocol.
no code implementations • LREC 2020 • Karen Jones, Stephanie Strassel, Kevin Walker, Jonathan Wright
Speakers used a variety of handsets, including landline and mobile devices, and made VoIP calls from tablets or computers.
no code implementations • LREC 2020 • Justin Mott, Ann Bies, Stephanie Strassel, Jordan Kodner, Caitlin Richter, Hongzhi Xu, Mitchell Marcus
This paper describes a new morphology resource created by Linguistic Data Consortium and the University of Pennsylvania for the DARPA LORELEI Program.
no code implementations • LREC 2020 • Christopher Cieri, James Fiumara, Stephanie Strassel, Jonathan Wright, Denise DiPersio, Mark Liberman
This latest in a series of Linguistic Data Consortium (LDC) progress reports to the LREC community does not describe any single language resource, evaluation campaign or technology but sketches the activities, since the last report, of a data center devoted to supporting the work of LREC attendees among other research communities.
no code implementations • LREC 2020 • Jennifer Tracey, Stephanie Strassel
This paper documents and describes the thirty-one basic language resource packs created for the DARPA LORELEI program for use in development and testing of systems capable of providing language-independent situational awareness in emerging scenarios in a low resource language context.
no code implementations • LREC 2016 • Stephanie Strassel, Jennifer Tracey
In this paper, we describe the textual linguistic resources in nearly 3 dozen languages being produced by Linguistic Data Consortium for DARPA's LORELEI (Low Resource Languages for Emergent Incidents) Program.
no code implementations • LREC 2016 • Xuansong Li, Jennifer Tracey, Stephen Grimes, Stephanie Strassel
Morphologically-rich languages pose problems for machine translation (MT) systems, including word-alignment errors, data sparsity and multiple affixes.
no code implementations • LREC 2016 • Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright
The Multi-language Speech (MLS) Corpus supports NIST's Language Recognition Evaluation series by providing new conversational telephone speech and broadcast narrowband data in 20 languages/dialects.
no code implementations • LREC 2016 • Kira Griffitt, Stephanie Strassel
The DARPA BOLT Information Retrieval evaluations target open-domain natural-language queries over a large corpus of informal text in English, Chinese and Egyptian Arabic.
no code implementations • LREC 2016 • Justin Mott, Ann Bies, Zhiyi Song, Stephanie Strassel
This paper introduces the parallel Chinese-English Entities, Relations and Events (ERE) corpora developed by Linguistic Data Consortium under the DARPA Deep Exploration and Filtering of Text (DEFT) Program.
no code implementations • LREC 2016 • Christopher Cieri, Mike Maxwell, Stephanie Strassel, Jennifer Tracey
This paper documents and describes the criteria used to select languages for study within programs that include low resource languages whether given that label or another similar one.
no code implementations • LREC 2016 • Xuansong Li, Martha Palmer, Nianwen Xue, Lance Ramshaw, Mohamed Maamouri, Ann Bies, Kathryn Conger, Stephen Grimes, Stephanie Strassel
High accuracy for automated translation and information retrieval calls for linguistic annotations at various language levels.
no code implementations • SEMEVAL 2015 • Vinodkumar Prabhakaran, Tomas By, Julia Hirschberg, Owen Rambow, Samira Shaikh, Tomek Strzalkowski, Jennifer Tracey, Michael Arrigo, Rupayan Basu, Micah Clark, Adam Dalton, Mona Diab, Louise Guthrie, Anna Prokofieva, Stephanie Strassel, Gregory Werner, Yorick Wilks, Janyce Wiebe
no code implementations • WS 2014 • Ann Bies, Zhiyi Song, Mohamed Maamouri, Stephen Grimes, Haejoong Lee, Jonathan Wright, Stephanie Strassel, Nizar Habash, Ramy Eskander, Owen Rambow
no code implementations • LREC 2014 • Christopher Cieri, Denise DiPersio, Mark Liberman, Andrea Mazzucchi, Stephanie Strassel, Jonathan Wright
Despite the growth in the number of linguistic data centers around the world, their accomplishments and expansions and the advances they have helped enable, the language resources that exist are a small fraction of those required to meet the goals of Human Language Technologies (HLT) for the world's languages and the promises they offer: broad access to knowledge, direct communication across language boundaries and engagement in a global community.
no code implementations • LREC 2014 • David Graff, Kevin Walker, Stephanie Strassel, Xiaoyi Ma, Karen Jones, Ann Sawyer
The DARPA RATS program was established to foster development of language technology systems that can perform well on speaker-to-speaker communications over radio channels that evince a wide range in the type and extent of signal variability and acoustic degradation.
no code implementations • LREC 2014 • Zhiyi Song, Stephanie Strassel, Haejoong Lee, Kevin Walker, Jonathan Wright, Jennifer Garland, Dana Fore, Brian Gainor, Preston Cabe, Thomas Thomas, Brendan Callahan, Ann Sawyer
The DARPA BOLT Program develops systems capable of allowing English speakers to retrieve and understand information from informal foreign language genres.
no code implementations • LREC 2012 • Xuansong Li, Stephanie Strassel, Stephen Grimes, Safa Ismael, Mohamed Maamouri, Ann Bies, Nianwen Xue
Parallel aligned treebanks (PAT) are linguistic corpora annotated with morphological and syntactic structures that are aligned at sentence as well as sub-sentence levels.
no code implementations • LREC 2012 • Xuansong Li, Stephanie Strassel, Heng Ji, Kira Griffitt, Joe Ellis
To advance information extraction and question answering technologies toward a more realistic path, the U.S. NIST (National Institute of Standards and Technology) initiated the KBP (Knowledge Base Population) task as one of the TAC (Text Analysis Conference) evaluation tracks.
no code implementations • LREC 2012 • Stephanie Strassel, Amanda Morris, Jonathan Fiscus, Christopher Caruso, Haejoong Lee, Paul Over, James Fiumara, Barbara Shaw, Brian Antonishek, Martial Michel
Linguistic Data Consortium and the National Institute of Standards and Technology are collaborating to create a large, heterogeneous annotated multimodal corpus to support research in multimodal event detection and related technologies.
no code implementations • LREC 2012 • Jonathan Wright, Kira Griffitt, Joe Ellis, Stephanie Strassel, Brendan Callahan
In recent months, LDC has developed a web-based annotation infrastructure centered around a tree model of annotations and a Ruby on Rails application called the LDC User Interface (LUI).
no code implementations • LREC 2012 • Zhiyi Song, Safa Ismael, Stephen Grimes, David Doermann, Stephanie Strassel
LDC has developed a stable pipeline and infrastructures for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT.