Search Results for author: David M. Chan

Found 15 papers, 9 papers with code

ALOHa: A New Measure for Hallucination in Captioning Models

no code implementations • 3 Apr 2024 • Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell

Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene.

Hallucination Object +2

ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

1 code implementation • 10 Jan 2024 • Kevin Cai, Chonghua Liu, David M. Chan

The Internet's wealth of content, with up to 60% published in English, starkly contrasts the global population, where only 18.8% are English speakers, and just 5.1% consider it their native language, leading to disparities in online information access.

Video Summarization

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

1 code implementation • 4 Jan 2024 • David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19.2%.

Attribute Automatic Speech Recognition +4

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

no code implementations • 22 Dec 2023 • Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

IC3: Image Captioning by Committee Consensus

1 code implementation • 2 Feb 2023 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

If you ask a human to describe an image, they might do so in a thousand different ways.

Image Captioning

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

no code implementations • 6 Jan 2023 • David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning).

Domain Adaptation speech-recognition +1

Towards Understanding How Machines Can Learn Causal Overhypotheses

1 code implementation • 16 Jun 2022 • Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik

Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence.

BIG-bench Machine Learning Causal Inference

Content-Context Factorized Representations for Automated Speech Recognition

no code implementations • 19 May 2022 • David M. Chan, Shalini Ghosh

Deep neural networks have largely demonstrated their ability to perform automated speech recognition (ASR) by extracting meaningful features from input audio frames.

speech-recognition Speech Recognition

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

Video Description

Multi-Modal Pre-Training for Automated Speech Recognition

no code implementations • 12 Oct 2021 • David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.

Language Modelling Masked Language Modeling +3

A Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery

1 code implementation • 2 Apr 2021 • Aatif Jiwani, Shubhrakanti Ganguly, Chao Ding, Nan Zhou, David M. Chan

Urban areas consume over two-thirds of the world's energy and account for more than 70 percent of global CO2 emissions.

Decision Making Semantic Segmentation

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.

Active Learning Video Captioning +1

Exploring Exploration: Comparing Children with RL Agents in Unified Environments

1 code implementation • 6 May 2020 • Eliza Kosoy, Jasmine Collins, David M. Chan, Sandy Huang, Deepak Pathak, Pulkit Agrawal, John Canny, Alison Gopnik, Jessica B. Hamrick

Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn.

Diagnostic Visualization for Deep Neural Networks Using Stochastic Gradient Langevin Dynamics

1 code implementation • 11 Dec 2018 • Biye Jiang, David M. Chan, Tianhao Zhang, John F. Canny

Finally, we show that diagnostic visualization using LDAM leads to a novel insight into the parameter averaging method for deep net training.

913

t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data

1 code implementation • 31 Jul 2018 • David M. Chan, Roshan Rao, Forrest Huang, John F. Canny

Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples.

Dimensionality Reduction

1,729