Search Results for author: Thomas Proisl

Found 16 papers, 3 papers with code

A Corpus of German Reddit Exchanges (GeRedE)

no code implementations • LREC 2020 • Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Thomas Proisl

GeRedE is a 270 million token German CMC corpus containing approximately 380,000 submissions and 6,800,000 comments posted on Reddit between 2010 and 2018.

EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus

no code implementations • LREC 2020 • Thomas Proisl, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Andreas Blombach, Stefan Evert

The EmpiriST corpus (Beißwenger et al., 2016) is a manually tokenized and part-of-speech tagged corpus of approximately 23,000 tokens of German Web and CMC (computer-mediated communication) data.

Lemmatization

Efficient Dependency Graph Matching with the IMS Open Corpus Workbench

no code implementations • LREC 2012 • Thomas Proisl, Peter Uhrig

State-of-the-art dependency representations such as the Stanford Typed Dependencies may represent the grammatical relations in a sentence as directed, possibly cyclic graphs.

Dependency Parsing, Graph Matching
