Search Results for author: Cristian García-Romero

Found 3 papers, 0 papers with code

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

No code implementations · EAMT 2022 · Marta Bañón, Miquel Esplà-Gomis, Mikel L. Forcada, Cristian García-Romero, Taja Kuzman, Nikola Ljubešić, Rik van Noord, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Peter Rupnik, Vít Suchomel, Antonio Toral, Tobias van der Werff, Jaume Zaragoza

We introduce the project “MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages”, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages.

Smart Bilingual Focused Crawling of Parallel Documents

No code implementations · 23 May 2024 · Cristian García-Romero, Miquel Esplà-Gomis, Felipe Sánchez-Martínez

Crawling parallel texts (texts that are mutual translations) from the Internet is usually done following a brute-force approach: documents are massively downloaded in an unguided process, and only a fraction of them end up leading to actual parallel content.
