1 code implementation • 2 Apr 2024 • Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic
In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data.
1 code implementation • 15 May 2023 • Antoni Bigata Casademunt, Rodrigo Mira, Nikita Drobyshev, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
Speech-driven animation has gained significant traction in recent years, with current methods achieving near-photorealistic results.
1 code implementation • 12 Dec 2022 • Alexandros Haliassos, Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Maja Pantic
We observe strong results in both low- and high-resource labelled-data settings when fine-tuning the visual and auditory encoders produced by a single pre-training stage, in which the two encoders are trained jointly.
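The idea of jointly pre-training a visual and an auditory encoder so that paired inputs map to aligned representations can be sketched in a toy form. Everything below — the one-layer "encoders", the dimensions, and the simple cosine-alignment objective — is a hypothetical illustration, not the architecture or loss used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Stand-in encoder: one linear layer with L2-normalised output."""
    z = np.tanh(x @ w)
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def alignment_loss(z_audio, z_video):
    """Pull paired audio/visual embeddings together.

    Cosine similarity of matched pairs is in [-1, 1], so this loss
    lies in [0, 2]; lower means better-aligned encoders.
    """
    return float(1.0 - np.mean(np.sum(z_audio * z_video, axis=-1)))

B, Da, Dv, H = 8, 80, 96, 64                   # batch, feature dims, embed dim (illustrative)
audio = rng.standard_normal((B, Da))            # stand-in audio features
video = rng.standard_normal((B, Dv))            # stand-in lip-video features
w_audio = rng.standard_normal((Da, H)) * 0.1    # auditory encoder weights
w_video = rng.standard_normal((Dv, H)) * 0.1    # visual encoder weights

loss = alignment_loss(encode(audio, w_audio), encode(video, w_video))
```

After such a joint pre-training stage, each encoder could be fine-tuned separately on labelled data for its own modality, which is the setting the excerpt above describes.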
Ranked #1 on Speech Recognition on LRS2 (using extra training data)
no code implementations • 20 Nov 2022 • Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic
Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements.
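The core mechanic described above — fusing noisy-audio features with lip-movement features to recover clean speech — can be illustrated with a minimal masking sketch. The fusion weights, dimensions, and sigmoid spectral mask here are all hypothetical stand-ins, not the model proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_mask(audio_feats, lip_feats, w_a, w_v, w_out):
    """Fuse audio and visual features, then predict a spectral mask.

    The mask is sigmoid-bounded in (0, 1) and is multiplied with the
    noisy spectrogram to suppress non-target energy.
    """
    fused = np.tanh(audio_feats @ w_a + lip_feats @ w_v)   # joint embedding
    return 1.0 / (1.0 + np.exp(-(fused @ w_out)))          # per-bin mask

T, F, V, H = 50, 80, 32, 64                 # frames, freq bins, lip dim, hidden dim
audio = rng.standard_normal((T, F))          # noisy log-spectrogram (stand-in)
lips = rng.standard_normal((T, V))           # lip-movement embeddings (stand-in)
w_a = rng.standard_normal((F, H)) * 0.1
w_v = rng.standard_normal((V, H)) * 0.1
w_out = rng.standard_normal((H, F)) * 0.1

mask = fuse_and_mask(audio, lips, w_a, w_v, w_out)
enhanced = mask * audio                      # masked (enhanced) spectrogram
```

The point of the sketch is the fusion step: because the lip features enter the mask computation, the model can suppress interfering speakers whose voices overlap the target in the audio alone.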
2 code implementations • 4 May 2022 • Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic
Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.
1 code implementation • CVPR 2022 • Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, Maja Pantic
One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression.
Ranked #2 on DeepFake Detection on FakeAVCeleb
no code implementations • 16 Jun 2021 • Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic
The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.
no code implementations • 27 Apr 2021 • Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic
In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs), which translates spoken video directly into a waveform without using any intermediate representation or separate waveform synthesis algorithm.
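The defining property of this design — the generator emits raw waveform samples from video features, with no spectrogram or vocoder in between, while a discriminator scores the result — can be sketched minimally. The shapes, weights, and single-layer networks below are illustrative assumptions, not the paper's actual GAN.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(video_feats, w_g):
    """Map per-frame video features directly to waveform samples.

    Each video frame yields a fixed chunk of audio samples; the chunks are
    concatenated into one waveform, with no intermediate representation.
    """
    chunks = np.tanh(video_feats @ w_g)      # (frames, samples_per_frame)
    return chunks.reshape(-1)                # one continuous waveform

def discriminator(waveform, w_d):
    """Score a waveform as real (toward 1) or generated (toward 0)."""
    return 1.0 / (1.0 + np.exp(-(waveform @ w_d)))

frames, feat_dim, hop = 25, 128, 640         # e.g. 25 fps video, 16 kHz audio
video = rng.standard_normal((frames, feat_dim))
w_g = rng.standard_normal((feat_dim, hop)) * 0.1
w_d = rng.standard_normal(frames * hop) * 0.01

wav = generator(video, w_g)                  # 25 * 640 = 16000 samples (1 s)
realness = discriminator(wav, w_d)
```

In adversarial training the generator would be updated to push `realness` toward 1 on its own outputs, while the discriminator learns to separate generated waveforms from real recordings.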