1 code implementation • EMNLP (BlackboxNLP) 2021 • Radina Dobreva, Frank Keller
Pre-trained vision-and-language models have achieved impressive results on a variety of tasks, including ones that require complex reasoning beyond object recognition.
1 code implementation • 13 May 2024 • Davide Moltisanti, Hakan Bilen, Laura Sevilla-Lara, Frank Keller
We use our synthetic data to train a model based on UNet and test it on real images showing coarsely/finely cut objects.
1 code implementation • 4 Apr 2024 • Rohit Saxena, Frank Keller
Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models.
1 code implementation • 1 Mar 2024 • Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, Frank Keller
We hypothesize that this is because explicit spatial relations rarely appear in the image captions used to train these models.
no code implementations • 27 Nov 2023 • Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller
The resulting dataset is three orders of magnitude smaller than current web-scale datasets but enables efficient training of large-scale models.
1 code implementation • 20 Oct 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image.
no code implementations • 8 Oct 2023 • Danyang Liu, Mirella Lapata, Frank Keller
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.
1 code implementation • 11 Aug 2023 • Gautier Dagan, Frank Keller, Alex Lascarides
While Large Language Models (LLMs) can solve many NLP tasks in zero-shot settings, applications involving embodied agents remain problematic.
no code implementations • 24 May 2023 • Hanxu Hu, Frank Keller
Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets.
1 code implementation • 30 Mar 2023 • Danyang Liu, Frank Keller
Characters are essential to the plot of any story.
1 code implementation • CVPR 2023 • Davide Moltisanti, Frank Keller, Hakan Bilen, Laura Sevilla-Lara
The goal of this work is to understand the way actions are performed in videos.
Ranked #2 on Video-Adverb Retrieval on HowTo100M Adverbs
1 code implementation • 27 Jan 2023 • Gautier Dagan, Frank Keller, Alex Lascarides
However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal.
no code implementations • ICCV 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing.
no code implementations • 30 Sep 2022 • Anil Batra, Shreyank N Gowda, Frank Keller, Laura Sevilla-Lara
We refer to this task as Procedure Segmentation and Summarization (PSS).
no code implementations • 9 Jun 2022 • Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara
We propose to learn what makes a good video for action recognition and select only high-quality samples for augmentation.
Ranked #2 on Few Shot Action Recognition on HMDB51
no code implementations • CVPR 2022 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding.
no code implementations • 16 Nov 2021 • Pinelopi Papalampidi, Frank Keller, Mirella Lapata
Movie trailers perform multiple functions: they introduce viewers to the story, convey the mood and artistic style of the film, and encourage audiences to see the movie.
3 code implementations • 14 Sep 2021 • David Wilmot, Frank Keller
Recent language models can generate interesting and grammatically correct text in story generation but often lack plot development and long-term coherence.
1 code implementation • EMNLP 2021 • David Wilmot, Frank Keller
Measuring event salience is essential in the understanding of stories.
1 code implementation • 27 Jul 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Kiyoon Kim, Frank Keller, Marcus Rohrbach
We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) Split for UCF101 and HMDB51, with zero-shot and generalized zero-shot evaluation.
no code implementations • 18 Jan 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach
The problem can be seen as learning a function which generalizes well to instances of unseen classes without losing discrimination between classes.
Ranked #2 on Zero-Shot Action Recognition on Olympics
1 code implementation • 14 Dec 2020 • Pinelopi Papalampidi, Frank Keller, Mirella Lapata
We summarize full-length movies by creating shorter videos containing their most informative scenes.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller
Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads.
1 code implementation • ACL 2020 • David Wilmot, Frank Keller
Suspense is a crucial ingredient of narrative fiction, engaging readers and making stories compelling.
2 code implementations • ACL 2020 • Pinelopi Papalampidi, Frank Keller, Lea Frermann, Mirella Lapata
Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront.
no code implementations • IJCNLP 2019 • Pinelopi Papalampidi, Frank Keller, Mirella Lapata
According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a screenplay: they define the plot structure, determine its progression and segment the screenplay into thematic units (e.g., setup, complications, aftermath).
1 code implementation • ACL 2019 • Bowen Li, Lili Mou, Frank Keller
In our work, we propose an imitation learning approach to unsupervised parsing, where we transfer the syntactic knowledge induced by the PRPN to a Tree-LSTM model with discrete parsing actions.
1 code implementation • NAACL 2019 • Spandana Gella, Desmond Elliott, Frank Keller
We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs.
no code implementations • 2 Feb 2019 • Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov
Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones.
no code implementations • 14 Nov 2018 • Bowen Li, Jianpeng Cheng, Yang Liu, Frank Keller
Transition-based models enable faster inference with $O(n)$ time complexity, but their performance still lags behind.
no code implementations • 31 Jul 2018 • Michael Hahn, Frank Keller
Research on human reading has long documented that reading behavior shows task-specific effects, but it has been challenging to build general models predicting what reading behavior humans will show in a given task.
no code implementations • NAACL 2018 • Spandana Gella, Frank Keller
Recent research in language and vision has developed models for predicting and disambiguating verbs from images.
no code implementations • ICCV 2017 • Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari
We crowd-source extreme point annotations for PASCAL VOC 2007 and 2012 and show that (1) annotation time is only 7s per box, 5x faster than the traditional way of drawing boxes [62]; (2) the quality of the boxes is as good as the original ground-truth drawn the traditional way; (3) detectors trained on our annotations are as accurate as those trained on the original ground-truth.
no code implementations • EMNLP 2017 • Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata
In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding.
no code implementations • ACL 2017 • Spandana Gella, Frank Keller
A large amount of recent research has focused on tasks that combine language and vision, resulting in a proliferation of datasets and methods.
no code implementations • CVPR 2017 • Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari
Training object class detectors typically requires a large set of images with objects annotated by bounding boxes.
no code implementations • COLING 2016 • Maria Barrett, Frank Keller, Anders Søgaard
Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models.
no code implementations • EMNLP 2016 • Michael Hahn, Frank Keller
When humans read text, they fixate some words and skip others.
1 code implementation • NAACL 2016 • Spandana Gella, Mirella Lapata, Frank Keller
We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image.
1 code implementation • CVPR 2016 • Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari
Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes.
no code implementations • 15 Jan 2016 • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.
no code implementations • TACL 2013 • Federico Sangati, Frank Keller
In this paper, we present the first incremental parser for Tree Substitution Grammar (TSG).