no code implementations • CODI 2021 • Frances Yung, Merel Scholman, Vera Demberg
In the current contribution, we analyse whether a sophisticated connective generation module is necessary to select a connective, or whether this can be solved with simple methods (such as random choice between connectives that are known to express a given relation, or usage of a generic language model).
no code implementations • CODI 2021 • Marian Marchal, Merel Scholman, Vera Demberg
The lexicon shows that the majority of Nigerian Pidgin connectives are borrowed from its English lexifier, but that there are also some connectives that are unique to Nigerian Pidgin.
no code implementations • CODI 2021 • Merel Scholman, Tianai Dong, Frances Yung, Vera Demberg
Existing parsing methods use varying approaches to identify explicit discourse connectives, but their performance has not been systematically compared, nor have they been consistently evaluated on text other than newspaper articles.
no code implementations • LREC 2022 • Merel Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty, Vera Demberg
The current contribution studies the effect of worker selection and training on the agreement on implicit relation labels between workers and gold labels, for both the DC and the QA method.
no code implementations • COLING (CODI, CRAC) 2022 • Frances Yung, Kaveri Anuranjana, Merel Scholman, Vera Demberg
Implicit discourse relations can convey more than one relation sense, but much of the research on discourse relations has focused on single relation senses.
Classification • Implicit Discourse Relation Classification +1
1 code implementation • LREC 2022 • Merel Scholman, Tianai Dong, Frances Yung, Vera Demberg
Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example serving as training data to improve the performance of automatic discourse relation parsers, as well as enabling research into non-connective signals of discourse relations.
no code implementations • COLING 2022 • Marian Marchal, Merel Scholman, Frances Yung, Vera Demberg
In many linguistic fields requiring annotated data, multiple interpretations of a single item are possible.
no code implementations • 28 Apr 2024 • Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg
We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis.
no code implementations • 7 Feb 2024 • Frances Yung, Mansoor Ahmad, Merel Scholman, Vera Demberg
Pre-trained large language models, such as ChatGPT, achieve outstanding performance in various reasoning tasks without supervised training and have been found to outperform crowdsourcing workers.
Classification • Implicit Discourse Relation Classification +3
1 code implementation • 1 Jul 2023 • Pin-Jie Lin, Muhammed Saeed, Ernie Chang, Merel Scholman
In this work, we aim to improve both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus. We further propose a framework of cross-lingual adaptive training that includes both continual and task-adaptive training to adapt a base pre-trained model to low-resource languages.
no code implementations • WS 2019 • Frances Yung, Vera Demberg, Merel Scholman
The prospect of being able to crowdsource coherence relations holds the promise of quickly acquiring annotations for new texts, which could increase the size and variety of discourse-annotated corpora.
no code implementations • 28 Apr 2017 • Vera Demberg, Fatemeh Torabi Asr, Merel Scholman
Discourse-annotated corpora are an important resource for the community, but they are often annotated according to different frameworks.
no code implementations • WS 2017 • Merel Scholman, Vera Demberg
In this paper, we investigate whether crowdsourcing can be used to obtain reliable discourse relation annotations.
no code implementations • LREC 2016 • Ines Rehbein, Merel Scholman, Vera Demberg
In discourse relation annotation, a variety of different frameworks are currently in use, and most of them have been developed and employed primarily on written data.