1 code implementation • 13 Oct 2021 • Matteo Romanello, Sven Najem-Meyer, Bruce Robertson
As part of this paper, we also release GT4HistComment, a small dataset with OCR ground truth for 19th-century classical commentaries, and Pogretra, a large collection of training data and pre-trained models for a wide variety of ancient Greek typefaces.
Optical Character Recognition (OCR)
1 code implementation • 23 Sep 2021 • Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet
After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts.
1 code implementation • 25 May 2020 • Marilena Daquino, Silvio Peroni, David Shotton, Giovanni Colavizza, Behnam Ghavimi, Anne Lauscher, Philipp Mayr, Matteo Romanello, Philipp Zumstein
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations.
Digital Libraries
no code implementations • LREC 2020 • Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman
If this represents a huge step forward in terms of preservation and accessibility, the next fundamental challenge, and real promise of digitization, is to exploit the contents of these digital assets, and therefore to adapt and develop appropriate language technologies to search and retrieve information from this 'Big Data of the Past'.