no code implementations • AMTA 2022 • Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzman, Mona Diab, Philipp Koehn
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations of translation quality for different language pairs.
no code implementations • AMTA 2022 • Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzman
Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus.
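The snippet does not say which sampling strategy is meant. A common choice in multilingual training is temperature-based sampling, where language i with corpus share q_i is drawn with probability proportional to q_i^(1/T). The following is a minimal sketch under that assumption; the function name `temperature_sampling_probs` and the toy corpus sizes are hypothetical.

```python
def temperature_sampling_probs(sizes: dict[str, int], temperature: float) -> dict[str, float]:
    """Hypothetical helper: per-language sampling probabilities p_i proportional to q_i ** (1 / T).

    Assumes temperature-based sampling; the paper's own strategy may differ.
    """
    total = sum(sizes.values())
    # q_i: empirical share of language i in the corpus, sharpened/flattened by 1/T
    scaled = {lang: (n / total) ** (1.0 / temperature) for lang, n in sizes.items()}
    norm = sum(scaled.values())
    # Renormalize so the probabilities sum to 1
    return {lang: s / norm for lang, s in scaled.items()}


sizes = {"en": 900, "sw": 100}  # toy counts, not data from the paper
p_t1 = temperature_sampling_probs(sizes, 1.0)  # T = 1 keeps the skewed 0.9 / 0.1 split
p_t2 = temperature_sampling_probs(sizes, 2.0)  # T = 2 flattens it to 0.75 / 0.25
```

With T = 1 the corpus distribution is reproduced unchanged; raising T moves the probabilities toward uniform, which up-samples low-resource languages.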
no code implementations • ACL 2022 • Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzman
Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when the training set is small, leading to +5 BLEU when only 5% of the total training data is accessible.
1 code implementation • Findings (ACL) 2021 • Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Benjamin I. P. Rubinstein, Trevor Cohn
Mistranslated numbers have the potential to cause serious effects, such as financial loss or medical misinformation.
1 code implementation • 12 Jul 2021 • Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Yuqing Tang, Benjamin I. P. Rubinstein, Trevor Cohn
Neural machine translation systems are known to be vulnerable to adversarial test inputs; however, as we show in this paper, these systems are also vulnerable to training attacks.
2 code implementations • 6 Jun 2021 • Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzman, Angela Fan
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks.
no code implementations • 2 Nov 2020 • Chang Xu, Jun Wang, Yuqing Tang, Francisco Guzman, Benjamin I. P. Rubinstein, Trevor Cohn
In this paper, we show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data.
no code implementations • IJCNLP 2015 • Francisco Guzman, Shafiq Joty, Lluis Marquez, Preslav Nakov
We present a novel framework for machine translation evaluation using neural networks in a pairwise setting, where the goal is to select the better translation from a pair of hypotheses, given the reference translation.
no code implementations • WS 2014 • Shafiq Joty, Francisco Guzman, Lluis Marquez, Preslav Nakov
We present novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference.
no code implementations • EMNLP 2020 • Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzman, Philipp Koehn
We mine sixty-eight snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other.
no code implementations • 18 Jun 2016 • Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash
The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech.
no code implementations • LREC 2014 • Ahmed Abdelali, Francisco Guzman, Hassan Sajjad, Stephan Vogel
This paper presents the AMARA corpus of online educational content: a new parallel corpus of educational video subtitles, multilingually aligned for 20 languages, i.e., 20 monolingual corpora and 190 parallel corpora.
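The figure of 190 parallel corpora follows from pairing each of the 20 languages with every other one exactly once: 20 × 19 / 2 = 190 unordered language pairs. The short check below illustrates the count; the placeholder language codes are hypothetical and are not the corpus's actual language list.

```python
from itertools import combinations
from math import comb

# Placeholder codes standing in for the 20 AMARA languages (hypothetical names)
langs = [f"lang{i}" for i in range(20)]

# Every unordered pair of distinct languages corresponds to one parallel corpus
pairs = list(combinations(langs, 2))
n_pairs = len(pairs)
n_closed_form = comb(20, 2)  # n * (n - 1) / 2 with n = 20
```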