no code implementations • 3 Apr 2024 • Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell
Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene.
1 code implementation • 10 Jan 2024 • Kevin Cai, Chonghua Liu, David M. Chan
The Internet's wealth of content, with up to 60% published in English, starkly contrasts the global population, where only 18. 8% are English speakers, and just 5. 1% consider it their native language, leading to disparities in online information access.
1 code implementation • 4 Jan 2024 • David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister
We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19. 2%.
no code implementations • 22 Dec 2023 • Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2. 90% relative reduction in WER for ASR and 18. 42% relative reduction in AEC compared to fine-tuning.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 2 Feb 2023 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny
If you ask a human to describe an image, they might do so in a thousand different ways.
no code implementations • 6 Jan 2023 • David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister
Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning).
1 code implementation • 16 Jun 2022 • Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik
Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence.
no code implementations • 19 May 2022 • David M. Chan, Shalini Ghosh
Deep neural networks have largely demonstrated their ability to perform automated speech recognition (ASR) by extracting meaningful features from input audio frames.
1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny
While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.
no code implementations • 12 Oct 2021 • David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister
Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.
1 code implementation • 2 Apr 2021 • Aatif Jiwani, Shubhrakanti Ganguly, Chao Ding, Nan Zhou, David M. Chan
Urban areas consume over two-thirds of the world's energy and account for more than 70 percent of global CO2 emissions.
no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny
Automatic video captioning aims to train models to generate text descriptions for all segments in a video, however, the most effective approaches require large amounts of manual annotation which is slow and expensive.
1 code implementation • 6 May 2020 • Eliza Kosoy, Jasmine Collins, David M. Chan, Sandy Huang, Deepak Pathak, Pulkit Agrawal, John Canny, Alison Gopnik, Jessica B. Hamrick
Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn.
1 code implementation • 11 Dec 2018 • Biye Jiang, David M. Chan, Tianhao Zhang, John F. Canny
Finally we show that diagnostic visualization using LDAM leads to a novel insight into the parameter averaging method for deep net training.
1 code implementation • 31 Jul 2018 • David M. Chan, Roshan Rao, Forrest Huang, John F. Canny
Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples.