no code implementations • EAMT 2022 • Ricardo Rei, Ana C Farinha, José G.C. de Souza, Pedro G. Ramos, André F.T. Martins, Luisa Coheur, Alon Lavie
In recent years, several neural fine-tuned machine translation evaluation metrics such as COMET and BLEURT have been proposed.
no code implementations • EAMT 2022 • Miguel Menezes, Vera Cabarrão, Pedro Mota, Helena Moniz, Alon Lavie
This paper describes research developed at Unbabel, a Portuguese machine-translation start-up that combines MT with human post-edition, focusing strictly on customer service content.
no code implementations • EAMT 2022 • Madalena Gonçalves, Marianna Buchicchio, Craig Stewart, Helena Moniz, Alon Lavie
This paper illustrates a new evaluation framework developed at Unbabel for measuring the quality of source language text and its effect on both Machine Translation (MT) and Human Post-Edition (PE) performed by non-professional post-editors.
no code implementations • WMT (EMNLP) 2020 • Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie
We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics.
no code implementations • WMT (EMNLP) 2021 • Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar
Contrary to previous years’ editions, this year we acquired our own human ratings based on expert-based human evaluation via Multidimensional Quality Metrics (MQM).
1 code implementation • WMT (EMNLP) 2021 • Ricardo Rei, Ana C Farinha, Chrysoula Zerva, Daan van Stigt, Craig Stewart, Pedro Ramos, Taisiya Glushkova, André F. T. Martins, Alon Lavie
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2021 Metrics Shared Task.
1 code implementation • SIGDIAL (ACL) 2022 • John Mendonca, Alon Lavie, Isabel Trancoso
Despite considerable advances in open-domain neural dialogue systems, their evaluation remains a bottleneck.
1 code implementation • 23 Nov 2023 • John Mendonça, Patrícia Pereira, Miguel Menezes, Vera Cabarrão, Ana C. Farinha, Helena Moniz, João Paulo Carvalho, Alon Lavie, Isabel Trancoso
Task-oriented conversational datasets often lack topic variability and linguistic diversity.
1 code implementation • 31 Aug 2023 • John Mendonça, Patrícia Pereira, Helena Moniz, João Paulo Carvalho, Alon Lavie, Isabel Trancoso
Despite significant research effort in the development of automatic dialogue evaluation metrics, little thought is given to evaluating dialogues other than in English.
1 code implementation • 31 Aug 2023 • John Mendonça, Alon Lavie, Isabel Trancoso
The main limiting factors in the development of robust multilingual dialogue evaluation metrics are the lack of multilingual data and the limited availability of open-source multilingual dialogue systems.
1 code implementation • 19 May 2023 • Ricardo Rei, Nuno M. Guerreiro, Marcos Treviso, Luisa Coheur, Alon Lavie, André F. T. Martins
Neural metrics for machine translation evaluation, such as COMET, exhibit significant improvements in their correlation with human judgments, as compared to traditional metrics based on lexical overlap, such as BLEU.
no code implementations • 27 Apr 2023 • Hendrik Kempt, Alon Lavie, Saskia K. Nagel
To address this limitation, in this paper we argue for restricting chatbots to a range of topics they can chat about, according to the normative concept of appropriateness.
1 code implementation • 13 Sep 2022 • Ricardo Rei, Marcos Treviso, Nuno M. Guerreiro, Chrysoula Zerva, Ana C. Farinha, Christine Maroti, José G. C. de Souza, Taisiya Glushkova, Duarte M. Alves, Alon Lavie, Luisa Coheur, André F. T. Martins
We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE).
no code implementations • ACL 2021 • Ricardo Rei, Ana C Farinha, Craig Stewart, Luisa Coheur, Alon Lavie
We present MT-Telescope, a visualization platform designed to facilitate comparative analysis of the output quality of two Machine Translation (MT) systems.
1 code implementation • 29 Oct 2020 • Ricardo Rei, Craig Stewart, Catarina Farinha, Alon Lavie
Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.
1 code implementation • EMNLP 2020 • Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie
We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements.
no code implementations • TACL 2014 • Jonathan H. Clark, Chris Dyer, Alon Lavie
Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective.