Search Results for author: Jan Hajič

Found 11 papers, 3 papers with code

TectoMT – a deep linguistic core of the combined Cimera MT system

no code implementations • EAMT 2016 • Martin Popel, Roman Sudarikov, Ondřej Bojar, Rudolf Rosa, Jan Hajič

Quality and Efficiency of Manual Annotation: Pre-annotation Bias

no code implementations • LREC 2022 • Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková, Jan Hajič

This paper presents an analysis of annotation using automatic pre-annotation for a task of mid-level annotation complexity: dependency syntax annotation.
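A common way to quantify how closely a final manual annotation follows an automatic pre-annotation is to score the two dependency trees against each other with unlabeled and labeled attachment scores (UAS/LAS). The following is a minimal sketch under stated assumptions: each tree is a list of (head, relation) pairs, one per token, and the helper name `attachment_agreement` is hypothetical. It is not the measure or code used in the paper.

```python
from typing import List, Tuple

# Each token is represented as (head_index, dependency_relation);
# head index 0 denotes the artificial root, as in CoNLL-U.
Tree = List[Tuple[int, str]]


def attachment_agreement(pre: Tree, final: Tree) -> Tuple[float, float]:
    """Return (UAS, LAS) of the final annotation measured against the pre-annotation.

    Hypothetical helper for illustration: UAS counts tokens whose head is unchanged,
    LAS additionally requires an unchanged dependency relation.
    """
    if len(pre) != len(final):
        raise ValueError("both annotations must cover the same tokens")
    if not pre:
        raise ValueError("empty sentence")
    same_head = sum(p[0] == f[0] for p, f in zip(pre, final))
    same_both = sum(p == f for p, f in zip(pre, final))
    n = len(pre)
    return same_head / n, same_both / n


# Toy example: the annotator relabeled token 3 and re-attached and relabeled token 4.
pre_annotation = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
manual = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "advmod")]
uas, las = attachment_agreement(pre_annotation, manual)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # prints "UAS=0.75 LAS=0.50"
```

Keeping UAS and LAS separate shows whether annotators mostly kept the proposed attachments while still correcting relation labels.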

Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions

1 code implementation • 3 Jun 2023 • Jana Straková, Eva Fučíková, Jan Hajič, Zdeňka Urešová

We have also carefully examined the correlation of the automatic scores with the human annotation.

Descriptive
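The excerpt mentions correlating automatic scores with human annotation. As a generic illustration only (the paper may use a different coefficient, such as Spearman's rank correlation), Pearson's r can be computed as follows. The function name `pearson_r` and the score values are made up for the example and are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized (the factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative (made-up) values: automatic scores of five suggestions
# and the corresponding human ratings.
automatic = [0.91, 0.75, 0.40, 0.62, 0.18]
human = [5, 4, 2, 4, 1]
print(round(pearson_r(automatic, human), 3))
```

On Python 3.10 and later, `statistics.correlation` from the standard library computes the same coefficient.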

Prague Dependency Treebank -- Consolidated 1.0

no code implementations • 5 Jun 2020 • Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková

We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is, as has always been the case for the family of the Prague Dependency Treebanks, to serve both as training data for various types of NLP tasks and for linguistically oriented research.

Translation

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

no code implementations • LREC 2020 • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework.
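Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. The minimal reader below is a sketch, not an official UD tool: it keeps only a few fields of the basic dependency tree and skips multiword-token ranges (IDs such as `1-2`) and empty nodes (IDs such as `8.1`). The names `Token` and `parse_conllu` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str
    head: int  # 0 means the token attaches to the artificial root
    deprel: str


def parse_conllu(text: str) -> List[List[Token]]:
    """Parse CoNLL-U text into sentences of basic-tree tokens (minimal sketch)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():  # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        row = dict(zip(FIELDS, cols))
        if "-" in row["id"] or "." in row["id"]:  # multiword ranges, empty nodes
            continue
        current.append(Token(int(row["id"]), row["form"], row["lemma"],
                             row["upos"], int(row["head"]), row["deprel"]))
    if current:  # file may end without a trailing blank line
        sentences.append(current)
    return sentences


sample = (
    "# text = The cat sleeps.\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)
for tok in parse_conllu(sample)[0]:
    print(tok.id, tok.form, tok.head, tok.deprel)
```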

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe

no code implementations • LREC 2020 • Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, Walter Daelemans, Koenraad De Smedt, Radovan Garabík, Maria Gavriilidou, Dagmar Gromann, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Jan Odijk, Maciej Ogrodniczuk, Eiríkur Rögnvaldsson, Mike Rosner, Bolette Sandford Pedersen, Inguna Skadiņa, Marko Tadić, Dan Tufiş, Tamás Váradi, Kadri Vider, Andy Way, François Yvon

Multilingualism is a cultural cornerstone of Europe and is firmly anchored in the European treaties, including full language equality.

Misconceptions

Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER

no code implementations • 8 Sep 2019 • Milan Straka, Jana Straková, Jan Hajič

We evaluate two methods for precomputing such embeddings, BERT and Flair, on four Czech text processing tasks: part-of-speech (POS) tagging, lemmatization, dependency parsing and named entity recognition (NER).

Dependency Parsing, Lemmatization (+6 more tasks)

Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing

no code implementations • 20 Aug 2019 • Milan Straka, Jana Straková, Jan Hajič

We present an extensive evaluation of three recently proposed methods for contextualized embeddings on 89 corpora in 54 languages of the Universal Dependencies 2.3 in three tasks: POS tagging, lemmatization, and dependency parsing.

Ranked #1 on Dependency Parsing on Universal Dependencies

Dependency Parsing, Lemmatization (+3 more tasks)

Neural Architectures for Nested NER through Linearization

1 code implementation • ACL 2019 • Jana Straková, Milan Straka, Jan Hajič

We propose two neural network architectures for nested named entity recognition (NER), a setting in which named entities may overlap and also be labeled with more than one label.

Ranked #3 on Nested Mention Recognition on ACE 2005

Hard Attention, Named Entity Recognition (+4 more tasks)
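Linearization here means turning possibly overlapping entity annotations into a single label per token, so that a flat sequence tagger can predict them. The sketch below shows one plausible encoding and is an assumption for illustration, not necessarily the exact scheme from the paper: each token receives the BILOU tags of all entities covering it, ordered from outer to inner entity and joined with `|`, with `O` for tokens outside any entity. The names `bilou_tag` and `linearize` are hypothetical.

```python
from typing import List, Tuple

# An entity is (start, end_exclusive, label) over token indices.
Entity = Tuple[int, int, str]


def bilou_tag(pos: int, start: int, end: int) -> str:
    """BILOU prefix for token `pos` inside the span [start, end)."""
    if end - start == 1:
        return "U"  # unit-length entity
    if pos == start:
        return "B"
    if pos == end - 1:
        return "L"
    return "I"


def linearize(n_tokens: int, entities: List[Entity]) -> List[str]:
    """Encode possibly nested entities as one multilabel string per token.

    Assumed ordering: entities sorted by start, then by length (longest first),
    so an outer entity's tag precedes the tags of entities nested inside it.
    """
    ordered = sorted(entities, key=lambda e: (e[0], -(e[1] - e[0])))
    tags: List[List[str]] = [[] for _ in range(n_tokens)]
    for start, end, label in ordered:
        for pos in range(start, end):
            tags[pos].append(f"{bilou_tag(pos, start, end)}-{label}")
    return ["|".join(t) if t else "O" for t in tags]


tokens = ["Charles", "University", "in", "Prague"]
entities = [(0, 4, "ORG"), (3, 4, "LOC")]  # a location nested inside an organization
print(list(zip(tokens, linearize(len(tokens), entities))))
# [('Charles', 'B-ORG'), ('University', 'I-ORG'), ('in', 'I-ORG'), ('Prague', 'L-ORG|U-LOC')]
```

With this encoding, each distinct multilabel string simply becomes one class for an ordinary tagger, at the cost of a larger label set.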

UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging

no code implementations • WS 2019 • Milan Straka, Jana Straková, Jan Hajič

In morphological analysis, our system placed a close second: its accuracy was 93.19, compared with the winning system's 93.23.

Lemmatization, Morphological Analysis (+1 more task)

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

2 code implementations • 10 Aug 2018 • Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

Lemmatization, Part-of-Speech Tagging (+1 more task)
