Telemedicine as a special case of Machine Translation

15 Oct 2015  ·  Krzysztof Wołk, Krzysztof Marasek, Wojciech Glinkowski ·

Machine translation is evolving quite rapidly in terms of quality. Several machine translation systems are now available on the web and provide reasonable translations; however, these systems are not perfect, and their quality may degrade in specific domains. This paper examines the effects of different training methods on a Polish-English Statistical Machine Translation system used for medical data. Numerous elements of the EMEA parallel text corpora, together with the unrelated OPUS OpenSubtitles project, were used as the basis for creating phrase tables and different language models, as well as for the development, tuning, and testing of these translation systems. The BLEU, NIST, METEOR, and TER metrics were used to evaluate the results of the various systems. Our experiments cover systems that include POS tagging, factored phrase models, hierarchical models, syntactic taggers, and other alignment methods. We also carried out an in-depth analysis of the Polish data as preparatory work before the automated data processing phase, which included truecasing and punctuation normalization. Normalized metrics were used to compare results: scores lower than 15% mean that a machine translation engine is unable to provide satisfactory quality, scores greater than 30% mean that translations should be understandable without problems, and scores over 50% reflect adequate translations. The average scores for Polish-to-English translation across BLEU, NIST, METEOR, and TER were relatively high, ranging from 70.58 to 82.72; the lowest score was 64.38. The average ranges for English-to-Polish translation were slightly lower (67.58 - 78.97). Real-life implementations of the presented high-quality machine translation systems are anticipated in general medical practice and telemedicine.
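
The abstract evaluates systems with BLEU (alongside NIST, METEOR, and TER) and interprets the normalized scores against the quality bands quoted above. The sketch below shows how such a corpus-level BLEU score could be computed and mapped onto those bands. It assumes the sacrebleu Python library and uses made-up example sentences; neither the library nor the sentences come from the paper.

```python
# Minimal sketch of a BLEU-based evaluation step, assuming the sacrebleu
# library (the paper does not name its tooling). Scores a toy Polish-to-English
# system output against a reference and maps the result onto the quality
# bands stated in the abstract.
import sacrebleu

# Hypothetical system outputs and references (illustrative only, not from EMEA).
hypotheses = [
    "Take one tablet twice a day after meals.",
    "Do not exceed the recommended dose.",
]
references = [
    "Take one tablet twice daily after meals.",
    "Do not exceed the recommended dose.",
]

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
score = bleu.score  # already on a 0-100 scale

# Quality bands as stated in the abstract; the 15-30 range is not
# characterized there, so it is labeled explicitly.
if score < 15:
    band = "unsatisfactory quality"
elif score > 50:
    band = "adequate translation"
elif score > 30:
    band = "understandable without problems"
else:
    band = "between the stated thresholds (not characterized in the abstract)"

print(f"BLEU = {score:.2f} ({band})")
```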
