no code implementations • 26 Apr 2024 • Abhishek Kumar Singh, Ioannis Patras
The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI.
2 code implementations • 10 Apr 2024 • Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos
In the first stage, we propose prompting VLLMs to generate descriptions in natural language of the subject's apparent emotion relative to the visual context.
Ranked #1 on Emotion Recognition in Context on EMOTIC
no code implementations • 25 Mar 2024 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
To this end, in this paper we present DiffusionAct, a novel method that leverages the photo-realistic image generation of diffusion models to perform neural face reenactment.
3 code implementations • 13 Mar 2024 • Zhonglin Sun, Chen Feng, Ioannis Patras, Georgios Tzimiropoulos
This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition.
1 code implementation • 11 Mar 2024 • Omnia Alwazzan, Abbas Khan, Ioannis Patras, Gregory Slabaugh
We propose a novel Multi-modal Outer Arithmetic Block (MOAB) based on arithmetic operations to combine latent representations of the different modalities for predicting the tumor grade (Grade II, III and IV).
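The core idea of outer-arithmetic fusion can be sketched as follows. This is a minimal illustration, not the paper's MOAB: the choice of operations (outer product, outer sum, outer difference), the bias-augmentation trick, and all dimensions here are assumptions for demonstration.

```python
import numpy as np

def outer_fusion(a, b):
    """Fuse two modality embeddings with outer arithmetic operations.

    Stacks the outer product, outer sum, and outer difference of the
    bias-augmented vectors into a multi-channel 2-D map that a small
    CNN classifier could then consume.
    """
    a = np.append(a, 1.0)  # append 1 so each vector's unary terms survive the product
    b = np.append(b, 1.0)
    product = np.outer(a, b)             # a_i * b_j
    addition = a[:, None] + b[None, :]   # a_i + b_j
    difference = a[:, None] - b[None, :] # a_i - b_j
    return np.stack([product, addition, difference])  # (3, len(a)+1, len(b)+1)

# Example: a 4-d embedding from one modality fused with a 3-d one from another
fused = outer_fusion(np.random.randn(4), np.random.randn(3))
print(fused.shape)  # (3, 5, 4)
```

The appeal of the outer operations is that every pairwise interaction between the two modalities' features appears explicitly in the fused map, rather than being collapsed by concatenation or averaging.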
1 code implementation • 10 Mar 2024 • Omnia Alwazzan, Ioannis Patras, Gregory Slabaugh
Fusion of multimodal healthcare data holds great promise to provide a holistic view of a patient's health, taking advantage of the complementarity of different modalities while leveraging their correlation.
no code implementations • 4 Mar 2024 • Zheng Gao, Ioannis Patras
Recent efforts toward this goal are limited to treating each face image as a whole, i.e., learning consistent facial representations at the image level, which overlooks the consistency of local facial representations (i.e., facial regions like the eyes, nose, etc.).
1 code implementation • 19 Feb 2024 • James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Jiankang Deng, Ioannis Patras
The Mixture of Experts (MoE) paradigm provides a powerful way to decompose inscrutable dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability.
no code implementations • 5 Feb 2024 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
Moreover, we show that by embedding real images in the GAN latent space, our method can be successfully used for the reenactment of real-world faces.
1 code implementation • 2 Nov 2023 • Moreno D'Incà, Christos Tzelepis, Ioannis Patras, Nicu Sebe
These paths are then applied to augment images to improve the fairness of a given dataset.
1 code implementation • 25 Oct 2023 • Niki Maria Foteinopoulou, Ioannis Patras
To test this, we evaluate using zero-shot classification of the model trained on sample-level descriptions on four popular dynamic FER datasets.
Ranked #1 on Zero-Shot Facial Expression Recognition on MAFW
no code implementations • 20 Oct 2023 • Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA).
Ranked #5 on Visual Question Answering (VQA) on A-OKVQA (DA VQA Score metric)
1 code implementation • 25 Aug 2023 • Zengqun Zhao, Ioannis Patras
For the visual part, based on the CLIP image encoder, a temporal model consisting of several Transformer encoders is introduced for extracting temporal facial expression features, and the final feature embedding is obtained as a learnable "class" token.
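The class-token readout described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the embedding size, layer count and head count are placeholder values, and the frame features are assumed to come from a frozen CLIP image encoder.

```python
import torch
import torch.nn as nn

class TemporalHead(nn.Module):
    """Temporal model over per-frame features with a learnable "class" token.

    Frame embeddings (e.g. from a frozen CLIP image encoder) are prepended
    with a learnable token; after a few Transformer encoder layers, the
    token position is read out as the video-level expression embedding.
    """
    def __init__(self, dim=512, num_layers=2, num_heads=8):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, frame_feats):              # (batch, frames, dim)
        cls = self.cls_token.expand(frame_feats.size(0), -1, -1)
        x = torch.cat([cls, frame_feats], dim=1)  # prepend the token
        return self.encoder(x)[:, 0]              # video embedding at the token

video_emb = TemporalHead()(torch.randn(2, 16, 512))
print(video_emb.shape)  # torch.Size([2, 512])
```

Because attention lets the class token aggregate from every frame, the readout can weight expressive frames more heavily than neutral ones, which a plain temporal average cannot do.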
no code implementations • 25 Aug 2023 • Zheng Gao, Chen Feng, Ioannis Patras
Inspired by cross-modality learning, we extend this existing framework that only learns from global features by encouraging the global features and intermediate layer features to learn from each other.
no code implementations • 28 Jul 2023 • Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos
DETR-based object detectors have achieved remarkable performance but are sample-inefficient and exhibit slow convergence.
1 code implementation • ICCV 2023 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose.
2 code implementations • 23 May 2023 • James Oldfield, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras
Latent image representations arising from vision-language models have proved immensely useful for a variety of downstream tasks.
1 code implementation • 6 Apr 2023 • Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos
We introduce S$^2$VS, a video similarity learning approach with self-supervision.
Ranked #1 on Video Retrieval on FIVR-200K
1 code implementation • CVPR 2023 • Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras
Clustering has been a major research topic in the field of machine learning, one to which Deep Learning has recently been applied with significant success.
1 code implementation • CVPR 2023 • Chen Feng, Ioannis Patras
More specifically, within the contrastive learning framework, our method generates soft labels for each sample, with the aid of the coarse labels, against both the other samples and another augmented view of the sample in question.
Ranked #1 on Learning with coarse labels on cifar100
1 code implementation • CVPR 2023 • Simone Barattin, Christos Tzelepis, Ioannis Patras, Nicu Sebe
By optimizing the latent codes directly, we ensure that the identity is kept at a desired distance from the original (with an identity obfuscation loss), whilst preserving the facial attributes (using a novel feature-matching loss in FaRL's deep feature space).
1 code implementation • 21 Nov 2022 • Georgios Zoumpourlis, Ioannis Patras
The first loss applies curriculum learning, forcing each feature extractor to specialize to a subset of the training subjects and promoting feature diversity.
1 code implementation • 27 Sep 2022 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
In this paper we address the problem of neural face reenactment, where, given a pair of a source and a target facial image, we need to transfer the target's pose (defined as the head pose and its facial expressions) to the source image, by preserving at the same time the source's identity characteristics (e.g., facial shape, hair style, etc.), even in the challenging case where the source and the target faces belong to different identities.
1 code implementation • 22 Sep 2022 • Harsh Panwar, Ioannis Patras
Capsule Networks have shown tremendous advancement in the past decade, outperforming traditional CNNs in various tasks due to their equivariant properties.
1 code implementation • 22 Jul 2022 • Chen Feng, Ioannis Patras
Self-supervised learning has recently achieved great success in representation learning without human annotations.
1 code implementation • 12 Jul 2022 • Niki Maria Foteinopoulou, Ioannis Patras
In the case of affect recognition, we outperform previous vision-based methods in terms of CCC on both the OMG and the AMIGOS datasets.
Ranked #1 on Continuous Affect Estimation on AMIGOS
1 code implementation • ACM ICMR 2022 • Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris, Ioannis Patras
Instead of simply modeling the frames' dependencies based on global attention, our method integrates a concentrated attention mechanism that is able to focus on non-overlapping blocks in the main diagonal of the attention matrix, and to enrich the existing information by extracting and exploiting knowledge about the uniqueness and diversity of the associated frames of the video.
Ranked #1 on Unsupervised Video Summarization on TvSum
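The block-diagonal structure of such a concentrated attention mechanism can be sketched with a simple mask. This is an illustration of the general idea only; the block size and how the paper combines the blocks with the uniqueness/diversity information are not reproduced here.

```python
import numpy as np

def block_diagonal_mask(n_frames, block_size):
    """Boolean mask keeping only non-overlapping blocks on the main
    diagonal of an (n_frames, n_frames) attention matrix, so that each
    frame attends only to the frames in its own temporal block.
    The block size here is an illustrative hyper-parameter.
    """
    blocks = np.arange(n_frames) // block_size
    return blocks[:, None] == blocks[None, :]

mask = block_diagonal_mask(6, 2)
# Attention scores outside the diagonal blocks would be set to -inf
# before the softmax, concentrating attention within each block.
print(mask.astype(int))
```

Restricting attention to local blocks reflects the observation that a frame's most informative dependencies for summarization are usually its temporal neighbours, while global context is supplied separately.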
1 code implementation • 5 Jun 2022 • Christos Tzelepis, James Oldfield, Georgios Tzimiropoulos, Ioannis Patras
This work addresses the problem of discovering non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner.
1 code implementation • 31 May 2022 • James Oldfield, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras
Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs.
1 code implementation • IEEE International Symposium on Multimedia (ISM) 2021 • Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris, Ioannis Patras
This paper presents a new method for supervised video summarization.
Ranked #1 on Video Summarization on SumMe
no code implementations • 23 Nov 2021 • James Oldfield, Markos Georgopoulos, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras
This paper addresses the problem of finding interpretable directions in the latent space of pre-trained Generative Adversarial Networks (GANs) to facilitate controllable image synthesis.
1 code implementation • 22 Nov 2021 • Chen Feng, Georgios Tzimiropoulos, Ioannis Patras
Under this setting, unlike previous methods that often introduce multiple assumptions and lead to complex solutions, we propose a simple, efficient and robust framework named Sample Selection and Relabelling (SSR) that achieves SOTA results in various conditions with a minimal number of hyperparameters.
Ranked #1 on Image Classification on CIFAR-10 (with noisy labels)
1 code implementation • ICCV 2021 • Christos Tzelepis, Georgios Tzimiropoulos, Ioannis Patras
This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors.
1 code implementation • 24 Jun 2021 • Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras
In this work, we propose a Knowledge Distillation framework, called Distill-and-Select (DnS), that starting from a well-performing fine-grained Teacher Network learns: a) Student Networks at different retrieval performance and computational efficiency trade-offs and b) a Selector Network that at test time rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency.
Ranked #2 on Video Retrieval on FIVR-200K
1 code implementation • 8 Jun 2021 • Ting-Ting Xie, Christos Tzelepis, Fan Fu, Ioannis Patras
Learning to localize actions in long, cluttered, and untrimmed videos is a hard task, that in the literature has typically been addressed assuming the availability of large amounts of annotated training samples for each class -- either in a fully-supervised setting, where action boundaries are known, or in a weakly-supervised setting, where only class labels are known for each video.
no code implementations • 8 Mar 2021 • Fan Fu, TingTing Xie, Ioannis Patras, Sepehr Jalali
Understanding interactions between objects in an image is an important element for generating captions.
2 code implementations • 11 Feb 2021 • Christos Tzelepis, Ioannis Patras
In this technical report we study the problem of propagation of uncertainty (in terms of variances of given uni-variate normal random variables) through typical building blocks of a Convolutional Neural Network (CNN).
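For the simplest such building block, a linear (fully-connected or convolutional) layer, the variance propagation has a closed form when the inputs are independent. The sketch below shows only that linear case; how the report treats nonlinearities such as ReLU (which require the moments of a truncated/folded normal) is not reproduced here.

```python
import numpy as np

def linear_variance(W, var_x):
    """Propagate per-element variances of independent normal inputs
    through a linear layer y = W x + b:
        Var(y_i) = sum_j W_ij^2 * Var(x_j)
    (the bias b is deterministic, so it does not affect the variance)."""
    return (W ** 2) @ var_x

W = np.array([[1.0, 2.0],
              [0.5, -1.0]])
var_x = np.array([0.1, 0.4])
print(linear_variance(W, var_x))  # [1*0.1 + 4*0.4, 0.25*0.1 + 1*0.4] = [1.7, 0.425]
```

The same rule covers convolutions, since a convolution is a linear map; the squared-weight matrix simply becomes the squared kernel applied to the variance map.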
no code implementations • 15 Jan 2021 • Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2020 • Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras
This paper presents a new method for unsupervised video summarization.
Ranked #3 on Unsupervised Video Summarization on TvSum
no code implementations • 25 Aug 2020 • Ting-Ting Xie, Christos Tzelepis, Ioannis Patras
Results in the action localization problem show that the incorporation of second order statistics improves over the baseline network, and that VANp surpasses the accuracy of virtually all other two-stage networks without involving any additional parameters.
no code implementations • 25 Aug 2020 • Ting-Ting Xie, Christos Tzelepis, Ioannis Patras
We use two uncertainty-aware boundary regression losses: first, the Kullback-Leibler divergence between the ground truth location of the boundary and the Gaussian modeling the prediction of the boundary and second, the expectation of the $\ell_1$ loss under the same Gaussian.
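Both losses admit closed forms. The expectation of the $\ell_1$ loss under a Gaussian is the mean of a folded normal, and, if one models the ground-truth boundary as a (narrow) Gaussian, the KL term is the standard Gaussian-Gaussian divergence. The sketch below states those closed forms; treating the ground truth as a Gaussian with a chosen width is an assumption of this illustration, not necessarily the paper's exact formulation.

```python
import math

def expected_l1(mu, sigma, g):
    """E|X - g| for X ~ N(mu, sigma^2), via the folded-normal mean:
    with d = mu - g,
        E|X - g| = sigma*sqrt(2/pi)*exp(-d^2/(2 sigma^2)) + d*erf(d/(sigma*sqrt(2)))."""
    d = mu - g
    return (sigma * math.sqrt(2.0 / math.pi) * math.exp(-d * d / (2.0 * sigma ** 2))
            + d * math.erf(d / (sigma * math.sqrt(2.0))))

def gaussian_kl(g, sigma_g, mu, sigma):
    """KL( N(g, sigma_g^2) || N(mu, sigma^2) ) in closed form."""
    return (math.log(sigma / sigma_g)
            + (sigma_g ** 2 + (g - mu) ** 2) / (2.0 * sigma ** 2) - 0.5)

# A very confident prediction (small sigma) 2 units off the true boundary:
# the expected l1 collapses to (almost) the plain l1 distance.
print(expected_l1(mu=10.0, sigma=0.1, g=12.0))
```

Note the appealing limiting behaviour: as `sigma` shrinks, `expected_l1` approaches the ordinary $\ell_1$ distance, while with `mu == g` it equals `sigma*sqrt(2/pi)`, so residual uncertainty is still penalized.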
1 code implementation • MultiMedia Modeling (MMM) 2019 • Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras
Experimental evaluation on two datasets (SumMe and TVSum) documents the contribution of the attention auto-encoder to faster and more stable training of the model, resulting in a significant performance improvement with respect to the original model and demonstrating the competitiveness of the proposed SUM-GAN-AAE against the state of the art.
Ranked #6 on Unsupervised Video Summarization on SumMe
1 code implementation • AI4TV 2019 • Evlampios Apostolidis, Alexandros I. Metsai, Eleni Adamantidou, Vasileios Mezaris, Ioannis Patras
In this paper we present our work on improving the efficiency of adversarial training for unsupervised video summarization.
Ranked #5 on Unsupervised Video Summarization on TvSum
1 code implementation • ICCV 2019 • Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
Subsequently, the similarity matrix between all video frames is fed to a four-layer CNN, and then summarized using Chamfer Similarity (CS) into a video-to-video similarity score -- this avoids feature aggregation before the similarity calculation between videos and captures the temporal similarity patterns between matching frame sequences.
Ranked #5 on Video Retrieval on FIVR-200K
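The Chamfer Similarity reduction itself is simple to sketch: for each query frame take its best-matching target frame, then average. Note that in the method described above the similarity matrix first passes through the four-layer CNN; the sketch below shows only the final CS reduction.

```python
import numpy as np

def chamfer_similarity(sim):
    """Summarize a frame-to-frame similarity matrix into a single
    video-to-video score: average, over query frames, of the best
    match among target frames. Tolerant to temporal misalignment,
    since matching frames need not be at the same index."""
    return float(np.mean(np.max(sim, axis=1)))

sim = np.array([[0.9, 0.2, 0.1],
                [0.1, 0.8, 0.3],
                [0.2, 0.1, 0.7]])
print(chamfer_similarity(sim))  # 0.8 = mean of the row maxima (0.9, 0.8, 0.7)
```

Because the max runs over target frames independently per query frame, the score is unaffected by shifts or re-orderings of the matching segments, which is what makes it suitable for near-duplicate video retrieval.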
no code implementations • 21 Jul 2019 • Mina Bishay, Georgios Zoumpourlis, Ioannis Patras
At the heart of our network is a meta-learning approach that learns to compare representations of variable temporal length, that is, either two videos of different length (in the case of few-shot action recognition) or a video and a semantic representation such as a word vector (in the case of zero-shot action recognition).
Ranked #7 on Few Shot Action Recognition on Kinetics-100
no code implementations • 25 May 2019 • Ting-Ting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras
Temporal action localization has recently attracted significant interest in the Computer Vision community.
no code implementations • 11 Feb 2019 • Youngkyoon Jang, Hatice Gunes, Ioannis Patras
In this paper, we present a novel single shot face-related task analysis method, called Face-SSD, for detecting faces and for performing various face-related (classification/regression) tasks including smile recognition, face attribute prediction and valence-arousal estimation in the wild.
1 code implementation • 11 Sep 2018 • Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
To create the dataset, we devise a process for the collection of YouTube videos based on major news events from recent years crawled from Wikipedia and deploy a retrieval pipeline for the automatic selection of query videos based on their estimated suitability as benchmarks.
no code implementations • 7 Aug 2018 • Mina Bishay, Petar Palasek, Stefan Priebe, Ioannis Patras
Patients with schizophrenia often display impairments in the expression of emotion and speech, which are observed in their facial behaviour.
no code implementations • 13 Jan 2018 • Petar Palasek, Ioannis Patras
In this work we explore how the architecture proposed in [8], which expresses the processing steps of the classical Fisher vector pipeline, i.e., dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) fitting and Fisher vector descriptor extraction, as network layers, can be modified into a hybrid network that combines the benefits of both unsupervised and supervised training methods, resulting in a model that learns a semi-supervised Fisher vector descriptor of the input data.
no code implementations • ICCV 2017 • Ioannis Marras, Petar Palasek, Ioannis Patras
We overcome this by introducing a Markov Random Field (MRF)-based spatial model network between the coarse and the refinement model that introduces geometric constraints on the relative locations of the body joints.
no code implementations • 19 Jul 2017 • Petar Palasek, Ioannis Patras
In this work we propose a novel neural network architecture for the problem of human action recognition in videos.
no code implementations • 22 Jan 2016 • Aria Ahmadi, Ioannis Patras
In this paper, we propose a direct method: we train a Convolutional Neural Network (CNN) that, given a pair of images as input at test time, produces a dense motion field F at its output layer.
no code implementations • 25 Nov 2015 • Christos Tzelepis, Damianos Galanopoulos, Vasileios Mezaris, Ioannis Patras
In this work we deal with the problem of high-level event detection in video.
1 code implementation • 11 Jul 2015 • Heng Yang, Wenxuan Mou, Yichi Zhang, Ioannis Patras, Hatice Gunes, Peter Robinson
In this paper we propose a supervised initialization scheme for cascaded face alignment based on explicit head pose estimation.
1 code implementation • 15 Apr 2015 • Christos Tzelepis, Vasileios Mezaris, Ioannis Patras
In this paper, we propose a maximum margin classifier that deals with uncertainty in data input.
no code implementations • CVPR 2015 • Heng Yang, Ioannis Patras
Our experiments lead to several interesting findings: 1) Surprisingly, most state-of-the-art methods struggle to preserve the mirror symmetry, despite the fact that they have very similar overall performance on the original and mirror images; 2) the low mirrorability is not caused by training or testing sample bias, as all algorithms are trained on both the original images and their mirrored versions; 3) the mirror error is strongly correlated to the localization/alignment error (with correlation coefficients around 0.7).