no code implementations • 8 Jan 2022 • Luke Kurlandski, Michael Bloodgood
This paper shows that the choice of the stop set can have a significant impact on the performance of stopping methods, and that the impact differs between stability-based and confidence-based methods.
no code implementations • 20 Jan 2020 • Thomas Orth, Michael Bloodgood
An important capability for improving the utility of stopping methods is to effectively forecast the performance of the text classification models.
no code implementations • 26 Jan 2019 • Garrett Beatty, Ethan Kochis, Michael Bloodgood
A crucial aspect of active learning is determining when to stop labeling data.
no code implementations • 26 Jan 2019 • Michael Altschuler, Michael Bloodgood
During active learning, an effective stopping method allows users to limit the number of annotations, reducing cost.
no code implementations • 24 Jan 2018 • Michael Bloodgood
This paper investigates and evaluates support vector machine active learning algorithms for use with imbalanced datasets, which commonly arise in many applications such as information extraction.
no code implementations • 24 Jan 2018 • Garrett Beatty, Ethan Kochis, Michael Bloodgood
When using active learning, smaller batch sizes typically yield greater learning efficiency.
no code implementations • WS 2017 • Michael Bloodgood, Benjamin Strauss
With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments.
no code implementations • ACL 2017 • Michael Bloodgood, Benjamin Strauss
Global constraints and reranking have not been used in cognates detection research to date.
no code implementations • 20 Feb 2017 • Alan Mishler, Kevin Wonus, Wendy Chambers, Michael Bloodgood
Since the events of the Arab Spring, there has been increased interest in using social media to anticipate social unrest.
no code implementations • 25 Feb 2016 • Michael Bloodgood, Benjamin Strauss
Fixing these errors manually is time-consuming and expensive, especially for large amounts of data.
no code implementations • EACL 2014 • Michael Bloodgood, Benjamin Strauss
Although detailed accounts of the matching algorithms used in commercial systems cannot be found in the literature, it is widely believed that edit distance algorithms are used.
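The edit distance algorithm alluded to here is typically Levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal sketch (a standard dynamic-programming implementation, not code from the paper):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between strings a and b via dynamic programming."""
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[len(b)]
```

For example, `edit_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion).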
no code implementations • WS 2013 • Michael Bloodgood, John Grothendieck
Specifically, if the Kappa agreement between two models exceeds a threshold $T$ (where $T>0$), then the difference in F-measure performance between those models is bounded above by $\frac{4(1-T)}{T}$ in all cases.
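The stated bound is a simple function of the agreement threshold, and can be sketched directly (the function name is illustrative, not from the paper):

```python
def f_measure_bound(kappa_threshold: float) -> float:
    """Upper bound on the F-measure difference between two models
    whose Kappa agreement exceeds the threshold T > 0, per the
    stated result: 4(1 - T) / T."""
    if not 0 < kappa_threshold <= 1:
        raise ValueError("threshold must lie in (0, 1]")
    return 4 * (1 - kappa_threshold) / kappa_threshold
```

For instance, a Kappa agreement above 0.9 bounds the F-measure difference by 4(0.1)/0.9 ≈ 0.444; note the bound grows quickly as the threshold falls.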
no code implementations • WS 2012 • Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine D. Piatko, Owen Rambow, Benjamin Van Durme
We explore training an automatic modality tagger.
no code implementations • 5 Feb 2015 • Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Chris Callison-Burch, Nathaniel W. Filardo, Christine Piatko, Lori Levin, Scott Miller
We apply our MN annotation scheme to statistical machine translation using a syntactic framework that supports the inclusion of semantic annotations.
no code implementations • 13 Jan 2015 • Benjamin S. Mericli, Michael Bloodgood
Our method strives to balance the amount of research effort the annotator expends with the utility of the annotations for supporting research on improving automated translation lexicon induction.
no code implementations • 31 Oct 2014 • John E. Miller, Michael Bloodgood, Manabu Torii, K. Vijay-Shanker
Part-of-speech (POS) tagging is a fundamental component for performing natural language tasks such as parsing, information extraction, and question answering.
no code implementations • WS 2012 • Michael Bloodgood, Peng Ye, Paul Rodrigues, David Zajic, David Doermann
We investigate combining methods and show that using random forests is a promising approach.
no code implementations • 29 Oct 2014 • Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, Peng Ye
Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards.
no code implementations • 28 Oct 2014 • David Zajic, Michael Maxwell, David Doermann, Paul Rodrigues, Michael Bloodgood
We describe a paradigm for combining manual and automatic error correction of noisy structured lexicographic data.
no code implementations • 21 Oct 2014 • Michael Bloodgood, Chris Callison-Burch
We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources.
no code implementations • 20 Oct 2014 • Michael Bloodgood, Chris Callison-Burch
Building machine translation (MT) test sets is a relatively expensive task.
no code implementations • 17 Oct 2014 • Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko
Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme.
no code implementations • 24 Sep 2014 • Kathryn Baker, Michael Bloodgood, Chris Callison-Burch, Bonnie J. Dorr, Nathaniel W. Filardo, Lori Levin, Scott Miller, Christine Piatko
We describe a unified and coherent syntactic framework for supporting a semantically informed syntactic approach to statistical machine translation.
no code implementations • 17 Sep 2014 • Michael Bloodgood, K. Vijay-Shanker
Actively sampled data can have very different characteristics than passively sampled data.
no code implementations • 17 Sep 2014 • Michael Bloodgood, K. Vijay-Shanker
A survey of existing methods for stopping active learning (AL) reveals the need for methods that are: more widely applicable; more aggressive in saving annotations; and more stable across changing datasets.
no code implementations • 12 Sep 2014 • Michael Bloodgood, K. Vijay-Shanker
There is a broad range of BioNLP tasks for which active learning (AL) can significantly reduce annotation costs, and a specific AL algorithm we have developed is particularly effective for these tasks.