Search Results for author: Hritik Bansal

Found 23 papers, 16 papers with code

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

no code implementations • 7 May 2024 • Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang

For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., 'a red panda climbing a tree') and second scene description (e.g., 'the red panda sleeps on the top of the tree'), respectively.

GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling

no code implementations • 7 Apr 2024 • Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng

In this paper, we propose GenEARL, a training-free generative framework that harnesses the power of modern generative models to understand event task descriptions given image contexts to perform the EARL task.

Language Modelling Large Language Model +1

Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

no code implementations • 1 Apr 2024 • Yixin Wan, Arjun Subramonian, Anaelia Ovalle, Zongyu Lin, Ashima Suvarna, Christina Chance, Hritik Bansal, Rebecca Pattichis, Kai-Wei Chang

In this survey, we review prior studies on dimensions of bias: Gender, Skintone, and Geo-Culture.

Text-to-Image Generation

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization

1 code implementation • 31 Mar 2024 • Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover

A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context.

Improving Event Definition Following For Zero-Shot Event Detection

no code implementations • 5 Mar 2024 • Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng

We hypothesize that a diverse set of event types and definitions is the key for models to learn to follow event definitions, while existing event extraction datasets focus on annotating many high-quality examples for a few event types.

Event Detection Event Extraction

ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

no code implementations • 24 Jan 2024 • Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang, Nanyun Peng

Our findings reveal a significant performance gap of 30.8% between the best-performing LMM, GPT-4V(ision), and human capabilities using human evaluation, indicating substantial room for improvement in context-sensitive text-rich visual reasoning.

Visual Reasoning

Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

no code implementations • 6 Dec 2023 • Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Sandeep Madireddy, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Aditya Grover

At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals.

Weather Forecasting

VideoCon: Robust Video-Language Alignment via Contrast Captions

1 code implementation • 15 Nov 2023 • Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover

Despite being (pre)trained on a massive amount of data, state-of-the-art video-language alignment models are not robust to semantically-plausible contrastive changes in the video captions.

Language Modelling Large Language Model +5

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

1 code implementation • 3 Oct 2023 • Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao

To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks.

Chatbot Image Captioning +5

Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models

1 code implementation • 30 Aug 2023 • Hritik Bansal, John Dang, Aditya Grover

In particular, we find that LLMs that leverage rankings data for alignment (say model X) are preferred over those that leverage ratings data (say model Y), with a rank-based evaluation protocol (is X/Y's response better than reference response?)

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

1 code implementation • 12 Aug 2023 • Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment.

Instruction Following

ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling

1 code implementation • NeurIPS 2023 • Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover

Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts.

Benchmarking Weather Forecasting

Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation

1 code implementation • 23 May 2023 • Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang

Instruction tuning has emerged to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses.

Continual Learning

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

1 code implementation • ICCV 2023 • Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang

Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data.

Backdoor Attack Contrastive Learning +1

Leaving Reality to Imagination: Robust Classification via Generated Datasets

1 code implementation • 5 Feb 2023 • Hritik Bansal, Aditya Grover

Recent research on robustness has revealed significant performance gaps between neural image classifiers trained on datasets that are similar to the test set, and those that are from a naturally shifted distribution, such as sketches, paintings, and animations of the object categories observed during training.

Classification Robust classification

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

1 code implementation • 18 Dec 2022 • Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth

Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed forward networks can be removed with minimal decline in task performance.

In-Context Learning Language Modelling +1

How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?

1 code implementation • 27 Oct 2022 • Hritik Bansal, Da Yin, Masoud Monajatipoor, Kai-Wei Chang

To this end, we introduce an Ethical NaTural Language Interventions in Text-to-Image GENeration (ENTIGEN) benchmark dataset to evaluate the change in image generations conditional on ethical interventions across three social axes -- gender, skin color, and culture.

Cultural Vocal Bursts Intensity Prediction Text-to-Image Generation

CyCLIP: Cyclic Contrastive Language-Image Pretraining

1 code implementation • 28 May 2022 • Shashank Goel, Hritik Bansal, Sumit Bhatia, Ryan A. Rossi, Vishwa Vinay, Aditya Grover

Recent advances in contrastive representation learning over paired image-text data have led to models such as CLIP that achieve state-of-the-art performance for zero-shot classification and distributional robustness.

Representation Learning Visual Reasoning +1

GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models

1 code implementation • 24 May 2022 • Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, Kai-Wei Chang

In this paper, we introduce a benchmark dataset, Geo-Diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), for probing the diversity of the relational knowledge in multilingual PLMs.

Language Modelling

Systematic Generalization in Neural Networks-based Multivariate Time Series Forecasting Models

1 code implementation • 10 Feb 2021 • Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, Prathosh A. P

Systematic generalization aims to evaluate reasoning about novel combinations from known components, an intrinsic property of human cognition.

Inductive Bias Multivariate Time Series Forecasting +3

Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones?

1 code implementation • SCiL 2021 • Hritik Bansal, Gantavya Bhatt, Sumeet Agarwal

However, we observe that several RNN types, including the ONLSTM which has a soft structural inductive bias, surprisingly fail to perform well on sentences without attractors when trained solely on sentences with attractors.

Inductive Bias

How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?

1 code implementation • ACL 2020 • Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal

Long short-term memory (LSTM) networks and their variants are capable of encapsulating long-range dependencies, which is evident from their performance on a variety of linguistic tasks.

Ranked #35 on Language Modelling on WikiText-103 (Validation perplexity metric)

Language Modelling Sentence

An improved sex specific and age dependent classification model for Parkinson's diagnosis using handwriting measurement

no code implementations • 21 Apr 2019 • Ujjwal Gupta, Hritik Bansal, Deepak Joshi

In this paper, we develop a sex-specific and age-dependent classification method to diagnose Parkinson's disease using online handwriting recorded from individuals with Parkinson's (n=37; m/f: 19/18; age: 69.3±10.9 years) and healthy controls (n=38; m/f: 20/18; age: 62.4±11.3 years). The sex-specific and age-dependent classifier was observed to significantly outperform the generalized classifier.

Classification General Classification
