Search Results for author: Frank Keller

Found 50 papers, 19 papers with code

Investigating Negation in Pre-trained Vision-and-language Models

1 code implementation • EMNLP (BlackboxNLP) 2021 • Radina Dobreva, Frank Keller

Pre-trained vision-and-language models have achieved impressive results on a variety of tasks, including ones that require complex reasoning beyond object recognition.

Negation Object Recognition

Coarse or Fine? Recognising Action End States without Labels

1 code implementation • 13 May 2024 • Davide Moltisanti, Hakan Bilen, Laura Sevilla-Lara, Frank Keller

We use our synthetic data to train a model based on UNet and test it on real images showing coarsely/finely cut objects.

Action Recognition Object

Select and Summarize: Scene Saliency for Movie Script Summarization

1 code implementation • 4 Apr 2024 • Rohit Saxena, Frank Keller

Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models.

Abstractive Text Summarization

Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset

1 code implementation • 1 Mar 2024 • Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, Frank Keller

We hypothesize that this is because explicit spatial relations rarely appear in the image captions used to train these models.

Image Captioning Text-to-Image Generation

Efficient Pre-training for Localized Instruction Generation of Videos

no code implementations • 27 Nov 2023 • Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller

The resulting dataset is three orders of magnitude smaller than current web-scale datasets but enables efficient training of large-scale models.

Semi-supervised multimodal coreference resolution in image narrations

1 code implementation • 20 Oct 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image.

coreference-resolution Descriptive

Visual Storytelling with Question-Answer Plans

no code implementations • 8 Oct 2023 • Danyang Liu, Mirella Lapata, Frank Keller

Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.

Visual Storytelling

Dynamic Planning with a LLM

1 code implementation • 11 Aug 2023 • Gautier Dagan, Frank Keller, Alex Lascarides

While Large Language Models (LLMs) can solve many NLP tasks in zero-shot settings, applications involving embodied agents remain problematic.

Meta-learning For Vision-and-language Cross-lingual Transfer

no code implementations • 24 May 2023 • Hanxu Hu, Frank Keller

Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets.

Cross-Lingual Transfer Meta-Learning

Detecting and Grounding Important Characters in Visual Stories

1 code implementation • 30 Mar 2023 • Danyang Liu, Frank Keller

Characters are essential to the plot of any story.

Visual Storytelling

Learning Action Changes by Measuring Verb-Adverb Textual Relationships

1 code implementation • CVPR 2023 • Davide Moltisanti, Frank Keller, Hakan Bilen, Laura Sevilla-Lara

The goal of this work is to understand the way actions are performed in videos.

Ranked #2 on Video-Adverb Retrieval on HowTo100M Adverbs

Video-Adverb Retrieval

Learning the Effects of Physical Actions in a Multi-modal Environment

1 code implementation • 27 Jan 2023 • Gautier Dagan, Frank Keller, Alex Lascarides

However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal.

Physical Commonsense Reasoning

Who are you referring to? Coreference resolution in image narrations

no code implementations • ICCV 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing.

coreference-resolution

A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos

no code implementations • 30 Sep 2022 • Anil Batra, Shreyank N Gowda, Frank Keller, Laura Sevilla-Lara

We refer to this task as Procedure Segmentation and Summarization (PSS).

Dense Video Captioning Segmentation

Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition

no code implementations • 9 Jun 2022 • Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara

We propose to learn what makes a good video for action recognition and select only high-quality samples for augmentation.

Ranked #2 on Few Shot Action Recognition on HMDB51

Data Augmentation Few Shot Action Recognition +1

Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation

no code implementations • CVPR 2022 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding.

Graph Generation Informativeness +2

Film Trailer Generation via Task Decomposition

no code implementations • 16 Nov 2021 • Pinelopi Papalampidi, Frank Keller, Mirella Lapata

Movie trailers perform multiple functions: they introduce viewers to the story, convey the mood and artistic style of the film, and encourage audiences to see the movie.

A Temporal Variational Model for Story Generation

3 code implementations • 14 Sep 2021 • David Wilmot, Frank Keller

Recent language models can generate interesting and grammatically correct text in story generation but often lack plot development and long-term coherence.

Story Generation

Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories

1 code implementation • EMNLP 2021 • David Wilmot, Frank Keller

Measuring event salience is essential in the understanding of stories.

Language Modelling Retrieval

A New Split for Evaluating True Zero-Shot Action Recognition

1 code implementation • 27 Jul 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Kiyoon Kim, Frank Keller, Marcus Rohrbach

We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) Split for UCF101 and HMDB51, with zero-shot and generalized zero-shot evaluation.

Few-Shot action recognition Few Shot Action Recognition +2

CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition

no code implementations • 18 Jan 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach

The problem can be seen as learning a function which generalizes well to instances of unseen classes without losing discrimination between classes.

Ranked #2 on Zero-Shot Action Recognition on Olympics

Action Recognition Clustering +4

Movie Summarization via Sparse Graph Construction

1 code implementation • 14 Dec 2020 • Pinelopi Papalampidi, Frank Keller, Mirella Lapata

We summarize full-length movies by creating shorter videos containing their most informative scenes.

graph construction Turning Point Identification +1

Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller

Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads.

Constituency Parsing

Modelling Suspense in Short Stories as Uncertainty Reduction over Neural Representation

1 code implementation • ACL 2020 • David Wilmot, Frank Keller

Suspense is a crucial ingredient of narrative fiction, engaging readers and making stories compelling.

Language Modelling

Screenplay Summarization Using Latent Narrative Structure

2 code implementations • ACL 2020 • Pinelopi Papalampidi, Frank Keller, Lea Frermann, Mirella Lapata

Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront.

Document Summarization Extractive Summarization +3

Movie Plot Analysis via Turning Point Identification

no code implementations • IJCNLP 2019 • Pinelopi Papalampidi, Frank Keller, Mirella Lapata

According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a screenplay: they define the plot structure, determine its progression and segment the screenplay into thematic units (e.g., setup, complications, aftermath).

Position Sentence +1

An Imitation Learning Approach to Unsupervised Parsing

1 code implementation • ACL 2019 • Bowen Li, Lili Mou, Frank Keller

In our work, we propose an imitation learning approach to unsupervised parsing, where we transfer the syntactic knowledge induced by the PRPN to a Tree-LSTM model with discrete parsing actions.

Imitation Learning Language Modelling +1

Cross-lingual Visual Verb Sense Disambiguation

1 code implementation • NAACL 2019 • Spandana Gella, Desmond Elliott, Frank Keller

We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs.

Machine Translation Translation

Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error

no code implementations • 2 Feb 2019 • Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones.

Dependency Grammar Induction with a Neural Variational Transition-based Parser

no code implementations • 14 Nov 2018 • Bowen Li, Jianpeng Cheng, Yang Liu, Frank Keller

Transition-based models enable faster inference with $O(n)$ time complexity, but their performance still lags behind.

Dependency Grammar Induction Variational Inference

Modeling Task Effects in Human Reading with Neural Network-based Attention

no code implementations • 31 Jul 2018 • Michael Hahn, Frank Keller

Research on human reading has long documented that reading behavior shows task-specific effects, but it has been challenging to build general models predicting what reading behavior humans will show in a given task.

Question Answering Reading Comprehension

An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data

no code implementations • NAACL 2018 • Spandana Gella, Frank Keller

Recent research in language and vision has developed models for predicting and disambiguating verbs from images.

General Classification Question Answering +2

Extreme clicking for efficient object annotation

no code implementations • ICCV 2017 • Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

We crowd-source extreme point annotations for PASCAL VOC 2007 and 2012 and show that (1) annotation time is only 7s per box, 5x faster than the traditional way of drawing boxes [62]; (2) the quality of the boxes is as good as the original ground-truth drawn the traditional way; (3) detectors trained on our annotations are as accurate as those trained on the original ground-truth.

Object

Image Pivoting for Learning Multilingual Multimodal Representations

no code implementations • EMNLP 2017 • Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding.

Image Retrieval Semantic Textual Similarity

An Analysis of Action Recognition Datasets for Language and Vision Tasks

no code implementations • ACL 2017 • Spandana Gella, Frank Keller

A large amount of recent research has focused on tasks that combine language and vision, resulting in a proliferation of datasets and methods.

Action Recognition Image Retrieval +2

Training object class detectors with click supervision

no code implementations • CVPR 2017 • Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

Training object class detectors typically requires a large set of images with objects annotated by bounding boxes.

Multiple Instance Learning Object +1

Cross-lingual Transfer of Correlations between Parts of Speech and Gaze Features

no code implementations • COLING 2016 • Maria Barrett, Frank Keller, Anders Søgaard

Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models.

Cross-Lingual Transfer POS +2

Modeling Human Reading with Neural Attention

no code implementations • EMNLP 2016 • Michael Hahn, Frank Keller

When humans read text, they fixate some words and skip others.

reinforcement-learning Reinforcement Learning (RL)

Weakly Supervised Part-of-speech Tagging Using Eye-tracking Data

no code implementations • ACL 2016 • Maria Barrett, Joachim Bingel, Frank Keller, Anders Søgaard

Part-Of-Speech Tagging

Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

1 code implementation • NAACL 2016 • Spandana Gella, Mirella Lapata, Frank Keller

We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image.

Image Retrieval Retrieval +1

We don't need no bounding-boxes: Training object class detectors using only human verification

1 code implementation • CVPR 2016 • Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes.

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

no code implementations • 15 Jan 2016 • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.

Retrieval