Building the Macedonian-Croatian Parallel Corpus

LREC 2016 · Ines Cebovi{\'c}, Marko Tadi{\'c} ·

In this paper we present the newly created parallel corpus of two under-resourced languages, namely, Macedonian-Croatian Parallel Corpus (mk-hr{\_}pcorp) that has been collected during 2015 at the Faculty of Humanities and Social Sciences, University of Zagreb. The mk-hr{\_}pcorp is a unidirectional (mkâ†’hr) parallel corpus composed of synchronic fictional prose texts received already in digital form with over 500 Kw in each language. The corpus was sentence segmented and provides 39,735 aligned sentences. The alignment was done automatically and then post-corrected manually. The alignments order was shuffled and this enabled the corpus to be available under CC-BY license through META-SHARE. However, this prevents the research in language units over the sentence level.

PDF Abstract LREC 2016 PDF LREC 2016 Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Sentence

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

No methods listed for this paper. Add relevant methods here

Edit Social Preview

Building the Macedonian-Croatian Parallel Corpus

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove