no code implementations • 3 Apr 2024 • Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal
In this work, we present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
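For illustration, a hedged sketch of the kind of raw signal such a tool exposes: per-layer attention weights from a vision-language model served through Hugging Face transformers (the checkpoint and prompt template below are assumptions, not necessarily the application's actual backend):

```python
# Sketch: request attention maps from a VLM via output_attentions=True.
# Checkpoint and prompt template are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is in this picture? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions holds one (batch, heads, seq, seq) tensor per layer; the
# rows for text tokens against the columns for image tokens are the raw
# text-to-image attention an interactive viewer would render as heatmaps.
print(len(out.attentions), out.attentions[0].shape)
```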
no code implementations • 29 Mar 2024 • Musashi Hinck, Matthew L. Olson, David Cobbley, Shao-Yen Tseng, Vasudev Lal
We train a suite of multimodal foundation models (MMFMs) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs).
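A minimal sketch of this recipe's wiring, assuming a CLIP vision tower, a small Gemma checkpoint, and a single linear projector (the paper's exact configuration may differ):

```python
# LLaVA-style wiring sketch: a CLIP vision tower feeding a Gemma LLM through
# a learned linear projector. Checkpoint names and the single-layer projector
# are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel

vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
llm = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

# Project visual patch features into the LLM's embedding space.
projector = nn.Linear(vision_tower.config.hidden_size, llm.config.hidden_size)

pixel_values = torch.randn(1, 3, 336, 336)                   # dummy preprocessed image
patch_feats = vision_tower(pixel_values).last_hidden_state   # (1, patches+1, vis_dim)
visual_tokens = projector(patch_feats)                       # (1, patches+1, llm_dim)

# In the LLaVA recipe, these visual tokens are spliced into the text embedding
# sequence before the LLM forward pass; training follows the usual two stages
# (projector warm-up, then instruction tuning).
```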
no code implementations • 6 Nov 2023 • Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal
Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions.
1 code implementation • 31 May 2023 • Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan
With only 4M VLP data, ManagerTower achieves superior performance on various downstream VL tasks, notably 79.15% accuracy on VQAv2 Test-Std, 86.56% IR@1 and 95.64% TR@1 on Flickr30K.
2 code implementations • 18 May 2023 • Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal
This paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both an image and a depth map from a given text prompt, allowing users to create RGBD images from text.
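LDM3D checkpoints were released on the Hugging Face Hub; a usage sketch, assuming the Intel/ldm3d-4c checkpoint and the StableDiffusionLDM3DPipeline from diffusers (adjust to the checkpoint card if names differ):

```python
# Sketch of text-to-RGBD generation with a released LDM3D pipeline.
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
).to("cuda")

output = pipe("a photo of a cozy reading nook, warm light")
rgb = output.rgb[0]      # PIL image
depth = output.depth[0]  # aligned depth map
rgb.save("nook_rgb.png")
depth.save("nook_depth.png")
```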
no code implementations • 24 Aug 2022 • Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal
In this paper, we propose MuMUR, a framework that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval.
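A generic sketch of such knowledge transfer (not MuMUR's exact objective): regress the multimodal text encoder onto a frozen multilingual teacher so that non-English queries map near their English counterparts:

```python
# Generic cross-model transfer sketch: pull the multimodal (student) text
# embedding toward a frozen multilingual (teacher) embedding of the same
# query. Placeholder objective, not the paper's exact loss.
import torch.nn.functional as F

def transfer_loss(student_emb, teacher_emb):
    """student_emb, teacher_emb: (batch, dim) embeddings of the same queries."""
    return F.mse_loss(F.normalize(student_emb, dim=-1),
                      F.normalize(teacher_emb, dim=-1).detach())
```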
1 code implementation • CVPR 2022 • Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems.
no code implementations • 8 Feb 2022 • Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou
By aligning audio representations to pretrained language representations and utilizing contrastive information between acoustic inputs, CALM is able to bootstrap audio embeddings competitive with existing audio representation models in only a few hours of training time.
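The alignment objective can be sketched as a symmetric InfoNCE loss over paired audio/text embeddings (the temperature and the encoders producing the embeddings are placeholders, not the paper's architecture):

```python
# Symmetric InfoNCE sketch for audio-to-language alignment.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(audio_emb, text_emb, temperature=0.07):
    """audio_emb, text_emb: (batch, dim) embeddings of paired utterances/transcripts."""
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Matched pairs sit on the diagonal; off-diagonal entries act as negatives.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```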
1 code implementation • Findings (NAACL) 2022 • Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan
Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad range of vision-language tasks after finetuning.
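For context, one standard VLP objective, image-text matching (ITM), in sketch form (a generic illustration, not this paper's specific method):

```python
# ITM sketch: a binary head over the fused [CLS] representation decides
# whether an image and caption form a true pair. Generic illustration only.
import torch
import torch.nn as nn

class ITMHead(nn.Module):
    def __init__(self, hidden_dim=768):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, 2)  # matched vs. mismatched

    def forward(self, fused_cls):                   # (batch, hidden_dim)
        return self.classifier(fused_cls)

head = ITMHead()
logits = head(torch.randn(4, 768))  # e.g. 2 true pairs + 2 shuffled negatives
```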
1 code implementation • 10 Sep 2019 • Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan
Word embeddings such as ELMo have recently been shown to model word semantics more effectively through contextualized learning on large-scale language corpora, yielding significant improvements in the state of the art across many natural language tasks.
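The core property is easy to demonstrate: the same word receives different vectors in different contexts. A sketch using a generic transformer encoder as a stand-in for the ELMo model discussed in the paper:

```python
# Contextualized-embedding demo: "bank" gets a different vector depending on
# its sentence. BERT is used here as a stand-in for ELMo.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**inputs).last_hidden_state[0]  # (seq_len, dim)
    idx = tok.convert_ids_to_tokens(inputs["input_ids"][0]).index(word)
    return hidden[idx]

v1 = word_vector("the bank raised interest rates", "bank")
v2 = word_vector("we sat on the river bank", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1.0: context changes the vector
```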
no code implementations • 31 Aug 2019 • Prashanth Gurunath Shivakumar, Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan
In this work we draw motivation from psycholinguistics and propose incorporating behavioral information into the context of language modeling.
no code implementations • 2 Aug 2019 • Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis Georgiou
Cancer impacts the quality of life of both those diagnosed and their spouse caregivers, and can also influence their day-to-day behaviors.
no code implementations • 18 Jul 2018 • Shao-Yen Tseng, Brian Baucom, Panayiotis Georgiou
Unsupervised learning has been an attractive method for easily deriving meaningful data representations from vast amounts of unlabeled data.