Search Results for author: Itziar Gonzalez-Dios

Found 23 papers, 4 papers with code

A Syntax-Aware Edit-based System for Text Simplification

no code implementations • RANLP 2021 • Oscar M. Cumbicus-Pineda, Itziar Gonzalez-Dios, Aitor Soroa

Edit-based text simplification systems have attracted much attention in recent years because they produce interpretable simplifications and require fewer training examples than traditional seq2seq systems.

Sentence Text Simplification

What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus

no code implementations • EACL (GWC) 2021 • Cecilia Domingo, Tatiana Gonzalez-Ferrero, Itziar Gonzalez-Dios

Natural Language Processing tools and resources have so far been created and trained mainly for standard varieties of language.

Textual genre based approach to use WordNet in language-for-specific-purpose classroom as dictionary

no code implementations • GWC 2019 • Itziar Gonzalez-Dios

When teaching language for specific purposes (LSP), linguistic resources are needed to help students understand and write specialised texts.

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models

1 code implementation • 24 Oct 2023 • Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau

We have used our dataset with the largest available open LLMs in a zero-shot approach to grasp their generalization and inference capability and we have also fine-tuned some of the models to assess whether the understanding of negation can be trained.

Ranked #1 on Zero-Shot Text Classification on This is not a Dataset

Descriptive Negation +2

Easy-to-Read in Germany: A Survey on its Current State and Available Resources

no code implementations • 5 Jun 2023 • Margot Madina, Itziar Gonzalez-Dios, Melanie Siegel

Plain Language (PL), on the other hand, is a variant of a given language, which aims to promote the use of simple language to communicate information.

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

no code implementations • 7 Mar 2023 • Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gerard Dupont, Stella Biderman, Anna Rogers, Loubna Ben allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, Shayne Longpre, Sebastian Nagel, Leon Weber, Manuel Muñoz, Jian Zhu, Daniel van Strien, Zaid Alyafeai, Khalid Almubarak, Minh Chien Vu, Itziar Gonzalez-Dios, Aitor Soroa, Kyle Lo, Manan Dey, Pedro Ortiz Suarez, Aaron Gokaslan, Shamik Bose, David Adelani, Long Phan, Hieu Tran, Ian Yu, Suhas Pai, Jenny Chim, Violette Lepercq, Suzana Ilic, Margaret Mitchell, Sasha Alexandra Luccioni, Yacine Jernite

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings.

Ethics Language Modelling

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

6 code implementations • 9 Nov 2022 • BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Decoder Language Modelling +1

Noisy Channel for Automatic Text Simplification

no code implementations • 6 Nov 2022 • Oscar M Cumbicus-Pineda, Iker Gutiérrez-Fandiño, Itziar Gonzalez-Dios, Aitor Soroa

In this paper we present a simple re-ranking method for Automatic Sentence Simplification based on the noisy channel scheme.

Language Modelling Re-Ranking +2

Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning

1 code implementation • Findings (NAACL) 2022 • Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre

In this work we show that entailment is also effective in Event Argument Extraction (EAE), reducing the need for manual annotation to 50% and 20% in ACE and WikiEvents, respectively, while achieving the same performance as with full training.

Ranked #1 on Event Argument Extraction on WikiEvents

Event Argument Extraction Natural Language Inference +2

MultiAzterTest: a Multilingual Analyzer on Multiple Levels of Language for Readability Assessment

1 code implementation • 10 Sep 2021 • Kepa Bengoetxea, Itziar Gonzalez-Dios

In this paper, we present the MultiAzterTest tool: (i) an open source NLP tool which analyzes texts on over 125 measures of cohesion, language, and readability for English, Spanish and Basque, but whose architecture is designed to be easily adapted to other languages; (ii) readability assessment classifiers that improve the performance of Coh-Metrix in English, Coh-Metrix-Esp in Spanish and ErreXail in Basque; (iii) a web tool.

Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets

no code implementations • 1 Jul 2021 • Xavier Gómez Guinovart, Itziar Gonzalez-Dios, Antoni Oliver, German Rigau

Language resources are necessary for language processing, but building them is costly, involves many researchers from different areas, and needs constant updating.

Exploring the Enrichment of Basque WordNet with a Sentiment Lexicon

no code implementations • LREC 2020 • Itziar Gonzalez-Dios, Jon Alkorta

Wordnets are lexical databases where the semantic relations of words and concepts are established.

Machine Translation POS +4

LagunTest: A NLP Based Application to Enhance Reading Comprehension

no code implementations • LREC 2020 • Kepa Bengoetxea, Itziar Gonzalez-Dios, Amaia Aguirregoitia

The ability to read and understand written texts plays an important role in education, above all in the last years of primary education.

Reading Comprehension

Towards modelling SUMO attributes through WordNet adjectives: a Case Study on Qualities

no code implementations • LREC 2020 • Itziar Gonzalez-Dios, Javier Alvez, German Rigau

In this context, we propose a new semi-automatic approach to model the knowledge about properties and attributes in SUMO by exploiting the information encoded in WordNet adjectives and its mapping to SUMO.

Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis

no code implementations • GWC 2019 • Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Our final objective is the extraction of some guidelines towards a better exploitation of this commonsense knowledge framework by the improvement of the included resources.

Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning

no code implementations • 14 Aug 2018 • Javier Álvez, Itziar Gonzalez-Dios, German Rigau

In this paper, we investigate the application of the Closed World Assumption (CWA) to enable a better exploitation of FOL ontologies by using state-of-the-art automated theorem provers.

Translation

Verbal Multiword Expressions in Basque Corpora

no code implementations • COLING 2018 • Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria

This paper presents a Basque corpus where Verbal Multiword Expressions (VMWEs) were annotated following universal guidelines.

Validating WordNet Meronymy Relations using Adimen-SUMO

no code implementations • 20 May 2018 • Javier Álvez, Itziar Gonzalez-Dios, German Rigau

In this paper, we report on the practical application of a novel approach for validating the knowledge of WordNet using Adimen-SUMO.

Cross-checking WordNet and SUMO Using Meronymy

no code implementations • LREC 2018 • Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Automated Theorem Proving

Framework for the Analysis of Simplified Texts Taking Discourse into Account: the Basque Causal Relations as Case Study

no code implementations • WS 2017 • Itziar Gonzalez-Dios, Arantza Diaz de Ilarraza, Mikel Iruskieta

Text Simplification

A Preliminary Study of Statistically Predictive Syntactic Complexity Features and Manual Simplifications in Basque

no code implementations • WS 2016 • Itziar Gonzalez-Dios, María Jesús Aranzabe, Arantza Díaz de Ilarraza

In this paper, we present a comparative analysis of statistically predictive syntactic features of complexity and the treatment of these features by humans when simplifying texts.

Text Simplification

Making Biographical Data in Wikipedia Readable: A Pattern-based Multilingual Approach

no code implementations • WS 2014 • Itziar Gonzalez-Dios, María Jesús Aranzabe, Arantza Díaz de Ilarraza

Text Simplification

Simple or Complex? Assessing the readability of Basque Texts

no code implementations • COLING 2014 • Itziar Gonzalez-Dios, María Jesús Aranzabe, Arantza Díaz de Ilarraza, Haritz Salaberri

Text Simplification
