no code implementations • 2 Apr 2024 • Gaurish Thakkar, Sherzod Hakimov, Marko Tadić
In recent years, multimodal natural language processing, aimed at learning from diverse data types, has garnered significant attention.
no code implementations • 14 May 2023 • Gaurish Thakkar, Nives Mikelic Preradović, Marko Tadić
This article presents a sentence-level sentiment dataset for the Croatian news domain.
no code implementations • 14 May 2023 • Gaurish Thakkar, Nives Mikelic Preradovic, Marko Tadić
This paper introduces Cro-FiReDa, a sentiment-annotated dataset for Croatian in the domain of movie reviews.
no code implementations • 28 Feb 2023 • Simon Gottschalk, Endri Kacupaj, Sara Abdollahi, Diego Alves, Gabriel Amaral, Elisavet Koutsiana, Tin Kuculo, Daniela Major, Caio Mello, Gullal S. Cheema, Abdul Sittar, Swati, Golsa Tahmasebzadeh, Gaurish Thakkar
Accessing and understanding contemporary and historical events of global impact such as the US elections and the Olympic Games is a major prerequisite for cross-lingual event analytics that investigate event causes, perception and consequences across country borders.
no code implementations • 14 Dec 2022 • Jelena Sarajlić, Gaurish Thakkar, Diego Alves, Nives Mikelic Preradović
This paper presents a corpus annotated for the task of direct-speech extraction in Croatian.
no code implementations • 14 Dec 2022 • Diego Alves, Gaurish Thakkar, Marko Tadić
This article presents the application of the Universal Named Entity framework to generate automatically annotated corpora.
1 code implementation • 14 Dec 2022 • Gaurish Thakkar, Nives Mikelic Preradovic, Marko Tadic
This paper presents a cross-lingual sentiment analysis of news articles using zero-shot and few-shot learning.
no code implementations • 14 Dec 2022 • Diego Alves, Gaurish Thakkar, Gabriel Amaral, Tin Kuculo, Marko Tadić
With the ever-growing popularity of the field of NLP, the demand for datasets in low-resourced languages follows suit.
no code implementations • LREC 2020 • Diego Alves, Gaurish Thakkar, Marko Tadić
Because the availability of language resources differs from language to language, we have built this strategy in three steps, starting with processing chains for the well-resourced languages and finishing with the development of new modules for the under-resourced ones.
no code implementations • LREC 2020 • Diego Alves, Gaurish Thakkar, Marko Tadić
We considered the difference between reported and our tested results within a single percentage point as being within the limits of acceptable tolerance and thus consider this result as reproducible.
no code implementations • 23 Oct 2020 • Diego Alves, Tin Kuculo, Gabriel Amaral, Gaurish Thakkar, Marko Tadic
We introduce the Universal Named-Entity Recognition (UNER) framework, a 4-level classification hierarchy, and the methodology that is being adopted to create the first multilingual UNER corpus: the SETimes parallel corpus annotated for named-entities.
1 code implementation • 23 Oct 2020 • Gaurish Thakkar, Marcis Pinnis
In this paper, we present various pre-training strategies that aid in improving the accuracy of the sentiment classification task.