no code implementations • 15 Jan 2024 • Darshan Singh S, Zeeshan Khan, Makarand Tapaswi
We use the SRL and verb information to create rule-based detailed captions, making sure they capture most of the visual concepts.
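The rule-based composition from SRL output can be sketched as follows; the role labels (ARG0, V, ARG1, …) and their ordering are illustrative assumptions, not the paper's exact rules.

```python
def caption_from_srl(frame):
    """Compose a caption from one SRL frame (a verb plus labelled
    arguments) by emitting roles in a fixed agent-verb-patient order.
    The role inventory and ordering here are illustrative assumptions."""
    order = ["ARG0", "V", "ARG1", "ARG2", "ARGM-LOC", "ARGM-MNR"]
    return " ".join(frame[role] for role in order if role in frame)
```

For example, `caption_from_srl({"ARG0": "a man", "V": "opens", "ARG1": "the door"})` yields `"a man opens the door"`.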
no code implementations • 26 Nov 2023 • Prajneya Kumar, Eshika Khandelwal, Makarand Tapaswi, Vishnu Sreekumar
Understanding the factors that determine video memorability has important applications in areas such as educational technology and advertising.
no code implementations • 8 Sep 2023 • Aroof Aimen, Arsh Verma, Makarand Tapaswi, Narayanan C. Krishnan
Real-world application of chest X-ray abnormality classification requires dealing with several challenges: (i) limited training data; (ii) training and evaluation sets that are derived from different domains; and (iii) classes that appear during training may have partial overlap with classes of interest during evaluation.
1 code implementation • CVPR 2023 • Dhruv Srivastava, Aditya Kumar Singh, Makarand Tapaswi
Towards this goal, we formulate emotion understanding as predicting a diverse and multi-label set of emotions at the level of a movie scene and for each character.
no code implementations • 22 Mar 2023 • Dhaval Taunk, Lakshya Khanna, Pavan Kandru, Vasudeva Varma, Charu Sharma, Makarand Tapaswi
Commonsense question-answering (QA) methods combine the power of pre-trained Language Models (LM) with the reasoning provided by Knowledge Graphs (KG).
Ranked #8 on Question Answering on OpenBookQA
1 code implementation • CVPR 2023 • Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek
Our work serves as a first step towards probing and instilling a sense of time in existing video-language models without the need for data and compute-intense training from scratch.
Ranked #1 on Video-Text Retrieval on Test-of-Time (using extra training data)
no code implementations • 2 Dec 2022 • Jaidev Shriram, Makarand Tapaswi, Vinoo Alluri
Reading, much like music listening, is an immersive experience that transports readers while taking them on an emotional journey.
no code implementations • 23 Nov 2022 • Arsh Verma, Makarand Tapaswi
The chest radiograph (chest X-ray, CXR) is a popular medical imaging modality used by radiologists across the world to diagnose heart or lung conditions.
1 code implementation • 17 Nov 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
In this work we propose a language-conditioned transformer model for grounding 3D objects and their spatial relations.
no code implementations • 29 Oct 2022 • Darshan Singh S, Anchit Gupta, C. V. Jawahar, Makarand Tapaswi
We formulate lecture segmentation as an unsupervised task that leverages visual, textual, and OCR cues from the lecture, while clip representations are fine-tuned on a pretext self-supervised task of matching the narration with the temporally aligned visual content.
no code implementations • 19 Oct 2022 • Zeeshan Khan, C. V. Jawahar, Makarand Tapaswi
Recently, Video Situation Recognition (VidSitu) has been framed as a structured prediction task over multiple events, their relationships, and verb-role pairs attached to descriptive entities.
2 code implementations • 11 Sep 2022 • Pierre-Louis Guhur, ShiZhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid
In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions.
Ranked #2 on Robot Manipulation on RLBench (Succ. Rate (10 tasks, 100 demos/task) metric)
1 code implementation • 24 Aug 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
Our resulting HM3D-AutoVLN dataset is an order of magnitude larger than existing VLN datasets in terms of navigation environments and instructions.
Ranked #1 on Visual Navigation on SOON Test
2 code implementations • 3 Aug 2022 • Vladimir Petrik, Mohammad Nomaan Qureshi, Josef Sivic, Makarand Tapaswi
We evaluate our approach on a 3D reconstruction task that consists of 54 video demonstrations sourced from 9 actions such as "pull something from right to left" or "put something in front of something".
1 code implementation • CVPR 2022 • ShiZhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
To balance the complexity of large action space reasoning and fine-grained language grounding, we dynamically combine a fine-scale encoding over local observations and a coarse-scale encoding on a global map via graph transformers.
Ranked #4 on Visual Navigation on SOON Test
1 code implementation • 10 Nov 2021 • Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi
Oversampling instances of the tail classes attempts to solve this imbalance.
Ranked #1 on Long-tail Learning on mini-ImageNet-LT
2 code implementations • ICCV 2021 • Pierre-Louis Guhur, Makarand Tapaswi, ShiZhe Chen, Ivan Laptev, Cordelia Schmid
Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.
Ranked #3 on Vision and Language Navigation on VLN Challenge
1 code implementation • 13 Nov 2020 • Vladimír Petrík, Makarand Tapaswi, Ivan Laptev, Josef Sivic
We evaluate our method on simple single- and two-object actions from the Something-Something dataset.
no code implementations • 5 Apr 2020 • Vivek Sharma, Makarand Tapaswi, M. Saquib Sarfraz, Rainer Stiefelhagen
We demonstrate our method on the challenging task of learning representations for video face clustering.
1 code implementation • 5 Apr 2020 • Vivek Sharma, Makarand Tapaswi, Rainer Stiefelhagen
True understanding of videos comes from a joint analysis of all its modalities: the video frames, the audio track, and any accompanying text such as closed captions.
1 code implementation • CVPR 2020 • Anna Kukleva, Makarand Tapaswi, Ivan Laptev
Localizing the pair of interacting characters in video is a time-consuming process; instead, we train our model to learn from clip-level weak labels.
1 code implementation • 30 Dec 2019 • Atef Chaudhury, Makarand Tapaswi, Seung Wook Kim, Sanja Fidler
Understanding stories is a challenging reading comprehension problem for machines as it requires reading a large volume of text and following long-range dependencies.
1 code implementation • ICCV 2019 • Makarand Tapaswi, Marc T. Law, Sanja Fidler
Understanding videos such as TV series and movies requires analyzing who the characters are and what they are doing.
4 code implementations • ICCV 2019 • Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic
In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.
Ranked #4 on Temporal Action Localization on CrossTask
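Learning a joint embedding from videos paired with transcribed narrations can be illustrated with a standard contrastive objective, where matched video/narration pairs sit on the diagonal of a similarity matrix. This InfoNCE-style loss is a common choice and an assumption here, not necessarily the paper's exact objective.

```python
import numpy as np


def contrastive_loss(video_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of (video, narration) pairs.
    Row i of each matrix is one clip / its transcribed narration;
    matched pairs form the diagonal of the similarity matrix."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(probs)).mean()   # pull matched pairs together
```

The appeal of narration supervision is that these positive pairs come for free from temporal alignment, with no manual annotation.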
1 code implementation • 3 Mar 2019 • Vivek Sharma, Makarand Tapaswi, M. Saquib Sarfraz, Rainer Stiefelhagen
In this paper, we address video face clustering using unsupervised methods.
1 code implementation • ICLR 2019 • Seung Wook Kim, Makarand Tapaswi, Sanja Fidler
Thus, a module for a new task learns to query existing modules and composes their outputs in order to produce its own output.
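The compositional idea — a new task module querying existing modules and combining their outputs — can be sketched as below; the functional form is an assumption for illustration, not the paper's architecture.

```python
def new_module(existing_modules, combine):
    """Build a module for a new task that queries a list of existing
    modules on the same input and composes their outputs with a learned
    (here: user-supplied) combination function. Sketch only."""
    def forward(x):
        outputs = [module(x) for module in existing_modules]
        return combine(outputs)
    return forward
```

For instance, with two existing modules `lambda x: x + 1` and `lambda x: x * 2` and `sum` as the combiner, the new module maps 3 to 10.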
no code implementations • CVPR 2018 • Yuhao Zhou, Makarand Tapaswi, Sanja Fidler
We are interested in enabling automatic 4D cinema by parsing physical and special effects from untrimmed movies.
no code implementations • CVPR 2018 • Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler
Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed, graph-based annotations of social situations depicted in movie clips.
1 code implementation • ICCV 2017 • Ruiyu Li, Makarand Tapaswi, Renjie Liao, Jiaya Jia, Raquel Urtasun, Sanja Fidler
We address the problem of recognizing situations in images.
Ranked #9 on Situation Recognition on imSitu
no code implementations • 22 Nov 2016 • Manuel Martinez, Monica Haurilet, Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
The Earth Mover's Distance (EMD) computes the optimal cost of transforming one distribution into another, given a known transport metric between them.
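In one dimension with unit ground distance between adjacent bins, the EMD reduces to the L1 distance between the two cumulative distributions, which admits a compact sketch:

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two histograms on the same
    equally spaced bins, with unit transport cost between neighbouring
    bins. In 1D this equals the L1 distance between the CDFs."""
    assert len(p) == len(q)
    assert abs(sum(p) - sum(q)) < 1e-9  # equal total mass required
    cost, carry = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi   # mass that must still be moved rightwards
        cost += abs(carry)  # moving it one bin costs |carry|
    return cost
```

For example, moving a unit of mass two bins over, `emd_1d([1, 0, 0], [0, 0, 1])`, costs 2.0; the general (multi-dimensional) EMD instead requires solving a transportation linear program.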
no code implementations • CVPR 2016 • Ziad Al-Halah, Makarand Tapaswi, Rainer Stiefelhagen
In this work, we aim to carry out attribute-based zero-shot classification in an unsupervised manner.
1 code implementation • CVPR 2016 • Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, Sanja Fidler
We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text.
no code implementations • CVPR 2015 • Makarand Tapaswi, Martin Bauml, Rainer Stiefelhagen
Such an alignment facilitates finding differences between the adaptation and the original source, and also acts as a basis for deriving rich descriptions from the novel for the video clips.
1 code implementation • CVPR 2014 • Makarand Tapaswi, Martin Bauml, Rainer Stiefelhagen
We present a novel way to automatically summarize and represent the storyline of a TV episode by visualizing character interactions as a chart.
no code implementations • CVPR 2013 • Martin Bauml, Makarand Tapaswi, Rainer Stiefelhagen
We address the problem of person identification in TV series.