no code implementations • 4 Mar 2024 • Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen, Ishita Dasgupta, Saayan Mitra, Minh Hoai
To generate images with realistic hands, we propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings into the generative process.
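HanDiffuser's actual architecture is not reproduced here; purely as a rough illustration of conditioning a diffusion denoising step on an injected embedding, a minimal NumPy sketch (all shapes, weights, and the additive-conditioning choice are hypothetical, not from the paper):

```python
import numpy as np

def denoise_step(x, t, hand_embedding, w_x, w_c):
    """One toy denoising step: noisy features x are combined with a
    conditioning hand embedding before the noise prediction."""
    cond = hand_embedding @ w_c      # project embedding into feature space
    h = np.tanh(x @ w_x + cond)      # conditioning by addition (illustrative)
    noise_pred = h                   # stand-in for a learned noise head
    alpha = 1.0 - 0.02 * t           # toy noise schedule
    return (x - (1.0 - alpha) * noise_pred) / np.sqrt(alpha)

rng = np.random.default_rng(0)
x = rng.normal(size=4)               # toy noisy latent
emb = rng.normal(size=3)             # toy hand embedding
w_x = rng.normal(size=(4, 4))
w_c = rng.normal(size=(3, 4))
out = denoise_step(x, 10, emb, w_x, w_c)
print(out.shape)
```

In a real diffusion model the conditioning would typically enter through cross-attention or adaptive normalization inside a U-Net, not a single linear projection.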
no code implementations • 4 Dec 2023 • Yizhou Wang, Ruiyi Zhang, Haoliang Wang, Uttaran Bhattacharya, Yun Fu, Gang Wu
Recent advancements in language-model-based video understanding have been progressing at a remarkable pace, spurred by the introduction of Large Language Models (LLMs).
no code implementations • 1 Sep 2023 • Ashmit Khandelwal, Aditya Agrawal, Aanisha Bhattacharyya, Yaman K Singla, Somesh Singh, Uttaran Bhattacharya, Ishita Dasgupta, Stefano Petrangeli, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy
We call these models Large Content and Behavior Models (LCBMs).
no code implementations • 18 Jul 2022 • Uttaran Bhattacharya, Gang Wu, Stefano Petrangeli, Viswanathan Swaminathan, Dinesh Manocha
We propose a method to detect individualized highlights for users on given target videos based on their preferred highlight clips marked on previous videos they have watched.
no code implementations • ICCV 2021 • Uttaran Bhattacharya, Gang Wu, Stefano Petrangeli, Viswanathan Swaminathan, Dinesh Manocha
We train our network to map the activity- and interaction-based latent structural representations of the different modalities to per-frame highlight scores based on the representativeness of the frames.
1 code implementation • 31 Jul 2021 • Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, Dinesh Manocha
Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences.
Ranked #4 on Gesture Generation on TED Gesture Dataset
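The entry above describes a generator–discriminator pair over a joint speech/seed-pose embedding. As a minimal NumPy sketch of that data flow only (all dimensions, weights, and function names are hypothetical; the paper's network is far richer):

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(speech_feats, seed_poses, w_s, w_p):
    """Joint embedding: concatenate encoded speech and seed-pose features."""
    return np.concatenate([np.tanh(speech_feats @ w_s),
                           np.tanh(seed_poses @ w_p)])

def generator(z, w_g):
    """Map the joint embedding to a flat synthesized pose sequence."""
    return z @ w_g

def discriminator(pose_seq, w_d):
    """Score in (0, 1): estimated probability the pose sequence is real."""
    return 1.0 / (1.0 + np.exp(-(pose_seq @ w_d)))

speech = rng.normal(size=8)           # toy speech features
seed = rng.normal(size=6)             # toy seed poses
w_s = rng.normal(size=(8, 4))
w_p = rng.normal(size=(6, 4))
w_g = rng.normal(size=(8, 12))
w_d = rng.normal(size=12)

z = encode(speech, seed, w_s, w_p)    # joint embedding
fake = generator(z, w_g)              # synthesized pose sequence
score = discriminator(fake, w_d)      # adversarial realism score
print(fake.shape, score)
```

Training would alternate discriminator and generator updates on real versus synthesized sequences, which this static sketch omits.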
no code implementations • 18 Sep 2020 • Abhishek Banerjee, Uttaran Bhattacharya, Aniket Bera
Our task is to map gestures to novel emotion categories not encountered in training.
no code implementations • 14 Mar 2020 • Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
Additionally, we extract and compare affective cues corresponding to perceived emotion from the two modalities within a video to infer whether the input video is "real" or "fake".
no code implementations • CVPR 2020 • Trisha Mittal, Pooja Guhan, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
We report an AP of 65.83 across 4 categories on GroupWalk, which is also an improvement over prior methods.
Ranked #2 on Emotion Recognition in Context on CAER
Emotion Recognition in Context • Multimodal Emotion Recognition
no code implementations • 14 Dec 2019 • Tanmay Randhavane, Uttaran Bhattacharya, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh Manocha
We present a data-driven deep neural algorithm for detecting deceptive walking behavior using nonverbal cues like gaits and gestures.
no code implementations • arXiv 2019 • Rohan Chandra, Tianrui Guan, Srujan Panuganti, Trisha Mittal, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha
In practice, our approach reduces the average prediction error by more than 54% over prior algorithms and achieves a weighted average accuracy of 91.2% for behavior prediction.
Ranked #1 on Trajectory Prediction on ApolloScape
Robotics
no code implementations • ECCV 2020 • Uttaran Bhattacharya, Christian Roncal, Trisha Mittal, Rohan Chandra, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh Manocha
For the annotated data, we also train a classifier to map the latent embeddings to emotion labels.
no code implementations • 9 Nov 2019 • Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is also more robust than other methods to sensor noise in any of the individual modalities.
1 code implementation • 28 Oct 2019 • Uttaran Bhattacharya, Trisha Mittal, Rohan Chandra, Tanmay Randhavane, Aniket Bera, Dinesh Manocha
We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN based Conditional Variational Autoencoder (CVAE).
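STEP-Gen itself is an ST-GCN-based CVAE and is not reproduced here; as a toy illustration of the augmentation idea alone (decode many latent samples conditioned on an emotion label), a minimal NumPy sketch with entirely hypothetical shapes and weights:

```python
import numpy as np

rng = np.random.default_rng(2)

def cvae_decode(z, label_onehot, w_z, w_y):
    """Toy CVAE decoder: map a latent sample plus a one-hot emotion
    label to a flat synthetic gait vector."""
    return np.tanh(z @ w_z + label_onehot @ w_y)

num_labels, latent_dim, gait_dim = 4, 5, 16   # hypothetical sizes
w_z = rng.normal(size=(latent_dim, gait_dim))
w_y = rng.normal(size=(num_labels, gait_dim))

# Augmentation loop: draw fresh latents for one emotion label
# (index 0 here is an arbitrary placeholder, not a label from the paper).
label = np.eye(num_labels)[0]
synthetic = [cvae_decode(rng.normal(size=latent_dim), label, w_z, w_y)
             for _ in range(1000)]
print(len(synthetic), synthetic[0].shape)
```

A trained CVAE would use a learned decoder and a spatio-temporal graph representation of the skeleton; the point of the sketch is only that label-conditioned sampling yields arbitrarily many synthetic examples per class.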
1 code implementation • 20 Jul 2019 • Rohan Chandra, Uttaran Bhattacharya, Christian Roncal, Aniket Bera, Dinesh Manocha
RobustTP first computes trajectories using a combination of a non-linear motion model and a deep-learning-based instance segmentation algorithm.
Robotics
1 code implementation • 25 Jun 2019 • Rohan Chandra, Uttaran Bhattacharya, Tanmay Randhavane, Aniket Bera, Dinesh Manocha
We present a realtime tracking algorithm, RoadTrack, to track heterogeneous road-agents in dense traffic videos.
Robotics
no code implementations • 14 Jun 2019 • Tanmay Randhavane, Uttaran Bhattacharya, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh Manocha
We also present EWalk (Emotion Walk), a dataset consisting of videos of walking individuals, with extracted gaits and labeled emotions.
no code implementations • ICCV 2019 • Uttaran Bhattacharya, Venu Madhav Govindu
Our approach significantly outperforms the state-of-the-art robust 3D registration method based on a line process in terms of both speed and accuracy.
2 code implementations • CVPR 2019 • Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha
We evaluate the performance of our prediction algorithm, TraPHic, on the standard datasets and also introduce a new dense, heterogeneous traffic dataset corresponding to urban Asian videos and agent trajectories.
Ranked #1 on Trajectory Prediction on TRAF
Trajectory Prediction • Robotics