Search Results for author: Hritik Bansal

Found 23 papers, 16 papers with code

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

no code implementations • 7 May 2024 • Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang

For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., 'a red panda climbing a tree') and second scene description (e.g., 'the red panda sleeps on the top of the tree'), respectively.

GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling

no code implementations • 7 Apr 2024 • Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng

In this paper, we propose GenEARL, a training-free generative framework that harnesses the power of modern generative models to understand event task descriptions given image contexts to perform the EARL task.

Language Modelling, Large Language Model, +1

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization

1 code implementation • 31 Mar 2024 • Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover

A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context.

Improving Event Definition Following For Zero-Shot Event Detection

no code implementations • 5 Mar 2024 • Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng

We hypothesize that a diverse set of event types and definitions is the key for models to learn to follow event definitions, while existing event extraction datasets focus on annotating many high-quality examples for a few event types.

Event Detection, Event Extraction

ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

no code implementations • 24 Jan 2024 • Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang, Nanyun Peng

Our findings reveal a significant performance gap of 30.8% between the best-performing LMM, GPT-4V(ision), and human capabilities using human evaluation, indicating substantial room for improvement in context-sensitive text-rich visual reasoning.

Visual Reasoning

VideoCon: Robust Video-Language Alignment via Contrast Captions

1 code implementation • 15 Nov 2023 • Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover

Despite being (pre)trained on a massive amount of data, state-of-the-art video-language alignment models are not robust to semantically-plausible contrastive changes in the video captions.

Language Modelling, Large Language Model, +5

Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models

1 code implementation • 30 Aug 2023 • Hritik Bansal, John Dang, Aditya Grover

In particular, we find that LLMs that leverage rankings data for alignment (say model X) are preferred over those that leverage ratings data (say model Y), with a rank-based evaluation protocol (is X/Y's response better than reference response?)

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

1 code implementation • 12 Aug 2023 • Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment.

Instruction Following

ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling

1 code implementation • NeurIPS 2023 • Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover

Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts.

Benchmarking, Weather Forecasting

Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation

1 code implementation • 23 May 2023 • Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang

Instruction tuning has emerged to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses.

Continual Learning

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

1 code implementation • ICCV 2023 • Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang

Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data.

Backdoor Attack, Contrastive Learning, +1

Leaving Reality to Imagination: Robust Classification via Generated Datasets

1 code implementation • 5 Feb 2023 • Hritik Bansal, Aditya Grover

Recent research on robustness has revealed significant performance gaps between neural image classifiers trained on datasets that are similar to the test set, and those that are from a naturally shifted distribution, such as sketches, paintings, and animations of the object categories observed during training.

Classification, Robust classification

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

1 code implementation • 18 Dec 2022 • Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth

Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: ~70% of attention heads and ~20% of feed forward networks can be removed with minimal decline in task performance.

In-Context Learning, Language Modelling, +1

How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?

1 code implementation • 27 Oct 2022 • Hritik Bansal, Da Yin, Masoud Monajatipoor, Kai-Wei Chang

To this end, we introduce an Ethical NaTural Language Interventions in Text-to-Image GENeration (ENTIGEN) benchmark dataset to evaluate the change in image generations conditional on ethical interventions across three social axes -- gender, skin color, and culture.

Cultural Vocal Bursts Intensity Prediction, Text-to-Image Generation

CyCLIP: Cyclic Contrastive Language-Image Pretraining

1 code implementation • 28 May 2022 • Shashank Goel, Hritik Bansal, Sumit Bhatia, Ryan A. Rossi, Vishwa Vinay, Aditya Grover

Recent advances in contrastive representation learning over paired image-text data have led to models such as CLIP that achieve state-of-the-art performance for zero-shot classification and distributional robustness.

Representation Learning, Visual Reasoning, +1

GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models

1 code implementation • 24 May 2022 • Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, Kai-Wei Chang

In this paper, we introduce a benchmark dataset, Geo-Diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), for probing the diversity of the relational knowledge in multilingual PLMs.

Language Modelling

Systematic Generalization in Neural Networks-based Multivariate Time Series Forecasting Models

1 code implementation • 10 Feb 2021 • Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, Prathosh A. P

Systematic generalization aims to evaluate reasoning about novel combinations from known components, an intrinsic property of human cognition.

Inductive Bias, Multivariate Time Series Forecasting, +3

Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones?

1 code implementation • SCiL 2021 • Hritik Bansal, Gantavya Bhatt, Sumeet Agarwal

However, we observe that several RNN types, including the ONLSTM which has a soft structural inductive bias, surprisingly fail to perform well on sentences without attractors when trained solely on sentences with attractors.

Inductive Bias

How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?

1 code implementation • ACL 2020 • Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal

Long short-term memory (LSTM) networks and their variants are capable of encapsulating long-range dependencies, which is evident from their performance on a variety of linguistic tasks.

Ranked #35 on Language Modelling on WikiText-103 (Validation perplexity metric)

Language Modelling, Sentence

An improved sex specific and age dependent classification model for Parkinson's diagnosis using handwriting measurement

no code implementations • 21 Apr 2019 • Ujjwal Gupta, Hritik Bansal, Deepak Joshi

In this paper, we develop a sex-specific and age-dependent classification method to diagnose Parkinson's disease using online handwriting recorded from individuals with Parkinson's (n=37; m/f: 19/18; age: 69.3±10.9 years) and healthy controls (n=38; m/f: 20/18; age: 62.4±11.3 years). The sex-specific and age-dependent classifier significantly outperformed the generalized classifier.

Classification, General Classification
