1 code implementation • 18 Aug 2023 • Miguel Sarabia, Elena Menyaylenko, Alessandro Toso, Skyler Seto, Zakaria Aldeneh, Shadi Pirhosseinloo, Luca Zappella, Barry-John Theobald, Nicholas Apostoloff, Jonathan Sheaffer
We present Spatial LibriSpeech, a spatial audio dataset with over 650 hours of 19-channel audio, first-order ambisonics, and optional distractor noise.
no code implementations • 18 Mar 2022 • Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald
Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality.
no code implementations • 18 Feb 2022 • Andrew Silva, Katherine Metcalf, Nicholas Apostoloff, Barry-John Theobald
Federated learning enables the deployment of machine learning to problems for which centralized data collection is impractical.
no code implementations • 8 Feb 2022 • Aparna R. Joshi, Xavier Suau, Nivedha Sivakumar, Luca Zappella, Nicholas Apostoloff
One such high-impact domain is that of face recognition, with real-world applications involving images affected by various degradations, such as motion blur or high exposure.
no code implementations • 3 Feb 2022 • Bobby Yan, Skyler Seto, Nicholas Apostoloff
Machine learning models are trained to minimize the mean loss for a single metric, and thus typically do not consider fairness and robustness.
no code implementations • NeurIPS Workshop ICBINB 2021 • Arno Blaas, Xavier Suau, Jason Ramapuram, Nicholas Apostoloff, Luca Zappella
Image augmentations applied during training are crucial for the generalization performance of image classifiers.
1 code implementation • 30 Sep 2021 • Xavier Suau, Luca Zappella, Nicholas Apostoloff
We compare our method with FUDGE and PPLM-BoW, and show that our approach is able to achieve gender parity at a lower perplexity.
no code implementations • 12 Feb 2021 • Andrew Silva, Barry-John Theobald, Nicholas Apostoloff
Automatic speech recognition (ASR) is widely used in consumer electronics.
no code implementations • 9 Dec 2020 • Nataniel Ruiz, Barry-John Theobald, Anurag Ranjan, Ahmed Hussein Abdelaziz, Nicholas Apostoloff
Images generated using MorphGAN conserve the identity of the person in the original image, and the provided control over head pose and facial expression allows test sets to be created to identify robustness issues of a facial recognition deep network with respect to pose and expression.
no code implementations • 27 May 2020 • Ahmed Hussen Abdelaziz, Barry-John Theobald, Paul Dixon, Reinhard Knothe, Nicholas Apostoloff, Sachin Kajareker
We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent video-only approach, and 2) the improvement in the animation of speech-related facial movements after introducing modality dropout.
no code implementations • 15 May 2020 • Xavier Suau, Luca Zappella, Nicholas Apostoloff
We show that expert units are important in several ways: (1) The presence of expert units is correlated ($r^2=0.833$) with the generalization power of TM, which allows ranking TM without requiring fine-tuning on suites of downstream tasks.
no code implementations • 15 May 2019 • Ahmed Hussen Abdelaziz, Barry-John Theobald, Justin Binder, Gabriele Fanelli, Paul Dixon, Nicholas Apostoloff, Thibaut Weise, Sachin Kajareker
We conclude that visual speech synthesis can significantly benefit from the powerful representation of speech in the ASR acoustic models.
no code implementations • ICLR 2019 • Xavier Suau, Luca Zappella, Nicholas Apostoloff
Principal Filter Analysis (PFA) is an easy-to-implement yet effective method for neural network compression.
no code implementations • 2 Apr 2019 • Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff
We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user.
no code implementations • 10 Dec 2018 • Katherine Metcalf, Barry-John Theobald, Nicholas Apostoloff
We model the individual behavior for each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group to render the model independent of the number of agents.
no code implementations • ICLR 2019 • Xavier Suau, Luca Zappella, Nicholas Apostoloff
We propose two algorithms. The first allows users to target compression to a specific network property, such as the number of trainable variables (footprint), and produces a compressed model that satisfies the requested property while preserving the maximum amount of spectral energy in the responses of each layer. The second is a parameter-free heuristic that selects the compression used at each layer by trying to mimic an ideal set of uncorrelated responses.
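The idea of preserving spectral energy in a layer's responses can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filters_to_keep`, the `energy` parameter, and the use of a covariance matrix over a `(num_samples, num_filters)` array of responses are all assumptions made for illustration. The sketch only estimates how many filters a layer might need so that the leading eigenvalues of the response covariance retain a requested fraction of the total.

```python
import numpy as np


def filters_to_keep(responses: np.ndarray, energy: float = 0.95) -> int:
    """Estimate how many filters retain `energy` of the spectral energy.

    responses: hypothetical array of shape (num_samples, num_filters),
    for example one pooled activation per filter per input sample.
    """
    if not 0.0 < energy <= 1.0:
        raise ValueError("energy must be in (0, 1]")

    num_samples, num_filters = responses.shape

    # Covariance of the per-filter responses (an assumption of this sketch).
    centered = responses - responses.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(num_samples - 1, 1)

    # Eigenvalues of a symmetric matrix, sorted from largest to smallest.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off

    total = eigvals.sum()
    if total == 0.0:
        return 1  # all responses are constant; keep a single filter

    # Smallest k whose leading eigenvalues reach the requested energy.
    cumulative = np.cumsum(eigvals) / total
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, num_filters)
```

When two filters produce identical responses, the covariance matrix loses rank and the estimate drops accordingly, which is the kind of redundancy a correlation-based compression method aims to exploit. Choosing which filters to remove, and retraining the smaller layer, are outside the scope of this sketch.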