Search Results for author: Francisco Guzman

Found 14 papers, 3 papers with code

Consistent Human Evaluation of Machine Translation across Language Pairs

no code implementations • AMTA 2022 • Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzman, Mona Diab, Philipp Koehn

Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs.

Machine Translation, Translation

How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

no code implementations • AMTA 2022 • Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzman

Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus.

Machine Translation, Translation

Alternative Input Signals Ease Transfer in Multilingual Machine Translation

no code implementations • ACL 2022 • Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzman

Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when training set is small, leading to +5 BLEU when only 5% of the total training data is accessible.

Machine Translation, Translation

As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical Translation

1 code implementation • Findings (ACL) 2021 • Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Benjamin I. P. Rubinstein, Trevor Cohn

Mistranslated numbers have the potential to cause serious effects, such as financial loss or medical misinformation.

Machine Translation, Misinformation, +2

Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning

1 code implementation • 12 Jul 2021 • Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Yuqing Tang, Benjamin I. P. Rubinstein, Trevor Cohn

Neural machine translation systems are known to be vulnerable to adversarial test inputs; however, as we show in this paper, these systems are also vulnerable to training attacks.

Data Poisoning, Machine Translation, +3

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

2 code implementations • 6 Jun 2021 • Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzman, Angela Fan

One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks.

Machine Translation, Translation

A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning

no code implementations • 2 Nov 2020 • Chang Xu, Jun Wang, Yuqing Tang, Francisco Guzman, Benjamin I. P. Rubinstein, Trevor Cohn

In this paper, we show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data.

Data Poisoning, Machine Translation, +2

Pairwise Neural Machine Translation Evaluation

no code implementations • IJCNLP 2015 • Francisco Guzman, Shafiq Joty, Lluis Marquez, Preslav Nakov

We present a novel framework for machine translation evaluation using neural networks in a pairwise setting, where the goal is to select the better translation from a pair of hypotheses, given the reference translation.

Machine Translation, Sentence, +2

DiscoTK: Using Discourse Structure for Machine Translation Evaluation

no code implementations • WS 2014 • Shafiq Joty, Francisco Guzman, Lluis Marquez, Preslav Nakov

We present novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference.

Machine Translation, Translation

CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs

no code implementations • EMNLP 2020 • Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzman, Philipp Koehn

We mine sixty-eight snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other.

Egyptian Arabic to English Statistical Machine Translation System for NIST OpenMT'2015

no code implementations • 18 Jun 2016 • Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash

The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech.

Language Modelling, Translation, +1

The AMARA Corpus: Building Parallel Language Resources for the Educational Domain

no code implementations • LREC 2014 • Ahmed Abdelali, Francisco Guzman, Hassan Sajjad, Stephan Vogel

This paper presents the AMARA corpus of on-line educational content: a new parallel corpus of educational video subtitles, multilingually aligned for 20 languages, i.e., 20 monolingual corpora and 190 parallel corpora.

Machine Translation, Translation

Optimizing for Sentence-Level BLEU+1 Yields Short Translations

no code implementations • COLING 2012 • Preslav Nakov, Francisco Guzman, Stephan Vogel

Machine Translation, Sentence

Understanding the Performance of Statistical MT Systems: A Linear Regression Framework

no code implementations • COLING 2012 • Francisco Guzman, Stephan Vogel

Machine Translation, Regression
