no code implementations • EMNLP (BlackboxNLP) 2021 • Michael Hanna, David Mareček
The high performance of large pretrained language models such as BERT on NLP tasks has prompted questions about BERT’s linguistic capabilities, and how they differ from humans’.
no code implementations • NAACL (WNU) 2022 • Rudolf Rosa, Patrícia Schmidtová, Ondřej Dušek, Tomáš Musil, David Mareček, Saad Obaid, Marie Nováková, Klára Vosecká, Josef Doležal
We experiment with adapting generative language models for the generation of long coherent narratives in the form of theatre plays.
no code implementations • NAACL (GeBNLP) 2022 • Tomasz Limisiewicz, David Mareček
The representations in large language models contain multiple types of gender information.
1 code implementation • 29 Oct 2023 • Tomasz Limisiewicz, David Mareček, Tomáš Musil
Large language models are becoming the go-to solution for an ever-growing number of tasks.
1 code implementation • 21 Sep 2023 • Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, David Mareček
We study the effect of tokenization on gender bias in machine translation, an aspect that has been largely overlooked in previous works.
no code implementations • 20 Jun 2023 • Linus Pithan, Vladimir Starostin, David Mareček, Lukas Petersdorf, Constantin Völter, Valentin Munteanu, Maciej Jankowski, Oleg Konovalov, Alexander Gerlach, Alexander Hinderhofer, Bridget Murphy, Stefan Kowarik, Frank Schreiber
Our focus lies on the beamline integration of ML-based online data analysis and closed-loop feedback.
1 code implementation • 26 May 2023 • Tomasz Limisiewicz, Jiří Balhar, David Mareček
Multilingual language models have recently gained attention as a promising solution for representing multiple languages in a single model.
no code implementations • 19 Dec 2022 • Tomáš Musil, David Mareček
Independent Component Analysis (ICA) is an algorithm originally developed for finding separate sources in a mixed signal, such as a recording of multiple people in the same room speaking at the same time.
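The source-separation idea behind ICA can be illustrated with scikit-learn's `FastICA` (a minimal sketch with synthetic signals, not the paper's actual setup, which applies ICA to language-model representations):

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
# Two independent "speakers": a sine wave and a square wave
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.c_[s1, s2]

# Mix them linearly, as two microphones in the same room would
A = np.array([[1.0, 0.5], [0.4, 1.0]])
X = S @ A.T

# FastICA recovers the independent components (up to order, sign, scale)
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)
```

In the paper's setting, the "mixed signal" is the neural representation and the recovered components are candidate interpretable directions.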
no code implementations • 21 Jun 2022 • Tomasz Limisiewicz, David Mareček
The representations in large language models contain multiple types of gender information.
no code implementations • EMNLP 2021 • Tomasz Limisiewicz, David Mareček
The evaluated information is encoded in a shared cross-lingual embedding space.
no code implementations • 17 Feb 2021 • Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka
We present the first version of a system for interactive generation of theatre play scripts.
1 code implementation • ACL 2021 • Tomasz Limisiewicz, David Mareček
With the recent success of pre-trained models in NLP, significant focus has been put on interpreting their representations.
no code implementations • 2 Oct 2020 • Tomasz Limisiewicz, David Mareček
Neural networks trained on natural language processing tasks capture syntax even though it is not provided as a supervision signal.
no code implementations • 29 Jun 2020 • Rudolf Rosa, Tomáš Musil, David Mareček
In classical probing, a classifier is trained on the representations to extract the target linguistic information.
no code implementations • 25 Jun 2020 • Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká
We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Tomasz Limisiewicz, Rudolf Rosa, David Mareček
This work focuses on analyzing the form and extent of syntactic abstraction captured by BERT by extracting labeled dependency trees from self-attentions.
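One common building block of such extraction can be sketched in a toy example (a simplification with a made-up attention matrix, not the paper's full labeled-tree method): treat the position a token attends to most as its dependency-head candidate.

```python
import numpy as np

# Made-up self-attention weights for the sentence "the cat sleeps"
# rows = attending tokens, columns = attended positions; rows sum to 1
tokens = ["the", "cat", "sleeps"]
attn = np.array([
    [0.1, 0.8, 0.1],   # "the" attends mostly to "cat"
    [0.2, 0.1, 0.7],   # "cat" attends mostly to "sleeps"
    [0.3, 0.4, 0.3],   # "sleeps" (the root) has no clear head
])

# Naive head extraction: each token's head = its most-attended position
heads = attn.argmax(axis=1)
for i, h in enumerate(heads):
    print(f"{tokens[i]} -> {tokens[h]}")
```

Real extraction methods aggregate over heads and layers and enforce tree constraints; this only shows the per-token head-selection step.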
no code implementations • 27 Jun 2019 • Rudolf Rosa, David Mareček
We use the English model of BERT and explore how a deletion of one word in a sentence changes representations of other words.
no code implementations • 6 Jun 2019 • Tomáš Musil, Jonáš Vidra, David Mareček
Derivation is a type of word-formation process which creates new words from existing ones by adding, changing or deleting affixes.
no code implementations • WS 2019 • David Mareček, Rudolf Rosa
We inspect the multi-head self-attention in Transformer NMT encoders for three source languages, looking for patterns that could have a syntactic interpretation.
no code implementations • 12 Nov 2018 • Jindřich Libovický, Jindřich Helcl, David Mareček
In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways.
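Two of the standard ways of combining attention over multiple sources, often called "flat" and "hierarchical" combination, can be sketched with dot-product attention (an illustrative numpy sketch with random vectors; the paper studies such strategies in a Transformer decoder):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, keys):
    # Dot-product attention: weights over keys, returns one context vector
    w = softmax(keys @ q)
    return w @ keys

rng = np.random.default_rng(0)
d = 8
query = rng.normal(size=d)
src1 = rng.normal(size=(5, d))   # encoder states of source 1
src2 = rng.normal(size=(3, d))   # encoder states of source 2

# "Flat": attend over the concatenation of both sources at once
flat_ctx = attend(query, np.vstack([src1, src2]))

# "Hierarchical": attend within each source, then over the per-source contexts
c1, c2 = attend(query, src1), attend(query, src2)
hier_ctx = attend(query, np.vstack([c1, c2]))
print(flat_ctx.shape, hier_ctx.shape)
```

The flat variant lets all source positions compete directly; the hierarchical variant first summarizes each source, then weights the summaries.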