no code implementations • 25 Apr 2024 • Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang
AudioScenic exploits inherent properties of audio, namely magnitude and frequency, to guide the editing process, aiming to control temporal dynamics and enhance temporal consistency.
no code implementations • 25 Apr 2024 • Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang
In this study, we introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE).
no code implementations • 22 Apr 2024 • Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Hao Jiang, Fei Wu, Linchao Zhu
The rapidly developing Large Vision-Language Models (LVLMs) have shown notable capabilities on a range of multi-modal tasks, but still suffer from hallucination, where the generated text does not align with the given context, significantly restricting the use of LVLMs.
no code implementations • 24 Mar 2024 • Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang
We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage.
no code implementations • 24 Mar 2024 • Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang
The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space.
no code implementations • 23 Mar 2024 • Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang
These concealed passphrases in user documents are referred to as ghost sentences; once they are identified in the generated content of LLMs, users can be sure that their data was used for training.
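The idea can be illustrated with a minimal sketch. This is not the paper's actual detection procedure, just an assumed toy version: a user plants a unique, randomly generated passphrase in their documents and later checks whether it surfaces verbatim in LLM output. The word list and helper names here are hypothetical.

```python
# Toy illustration of the "ghost sentence" idea (assumed, not the paper's
# exact method): plant an unlikely passphrase, then test for verbatim recall.
import secrets

def make_ghost_sentence(wordlist, length=8):
    """Sample a random, highly unlikely word sequence to embed in documents."""
    return " ".join(secrets.choice(wordlist) for _ in range(length))

def appears_in_generation(ghost, generated_text):
    """A verbatim match in LLM output suggests the documents were trained on."""
    return ghost.lower() in generated_text.lower()

words = ["quartz", "lantern", "meadow", "cipher", "violet", "harbor", "ember", "sonnet"]
ghost = make_ghost_sentence(words)
print(appears_in_generation(ghost, f"some model output ... {ghost} ... more text"))  # True
```

In practice the check would run over large volumes of generated text, and the passphrase must be long enough that a verbatim match is vanishingly unlikely by chance.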
1 code implementation • 1 Feb 2024 • Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang
Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.
1 code implementation • 19 Jan 2024 • Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang
(2) Equipping the visual and text encoders with separate prompts fails to mitigate the visual-text modality gap.
no code implementations • 12 Jan 2024 • Yuanzhi Liang, Linchao Zhu, Yi Yang
To address this challenge, we introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
no code implementations • 27 Nov 2023 • Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang
Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.
no code implementations • 27 Oct 2023 • Yucheng Suo, Linchao Zhu, Yi Yang
This task aims to identify the instance mask that is most related to a referring expression without training on pixel-level annotations.
no code implementations • 16 Oct 2023 • Chao Liang, Linchao Zhu, Humphrey Shi, Yi Yang
Sample selection is an effective way to deal with label noise.
no code implementations • IEEE Transactions on Multimedia 2023 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang
Video captioning is a more challenging task compared to image captioning, primarily due to differences in content density.
Ranked #5 on Video Captioning on VATEX (using extra training data)
1 code implementation • 4 Sep 2023 • Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity.
Ranked #3 on Motion Synthesis on HumanML3D (using extra training data)
1 code implementation • 24 Jul 2023 • Yuanzhi Liang, Linchao Zhu, Yi Yang
MOE challenges models to understand characters' intentions and accurately determine their actions within intricate contexts involving multi-character and novel object interactions.
1 code implementation • 3 Jul 2023 • Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang
In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.
1 code implementation • 29 May 2023 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang
Given a single test sample, the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.
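The description above suggests a reward-guided test-time tuning loop. The following is a toy sketch under assumptions (random vectors stand in for real CLIP embeddings, and the update rule is a simplified REINFORCE-style step, not the paper's exact objective): candidates are sampled from the model's output distribution, scored with a cosine-similarity reward, and the distribution is pushed toward high-reward samples.

```python
# Toy sketch of reward-guided test-time tuning. The embeddings are random
# stand-ins for CLIP features (assumption), not outputs of the real model.
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity, used here as a CLIP-style reward."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

image_emb = rng.normal(size=16)                    # stand-in input embedding
candidate_embs = [rng.normal(size=16) for _ in range(4)]  # 4 sampled outputs

logits = np.zeros(4)                               # output distribution (uniform)
for _ in range(50):                                # simplified policy-gradient steps
    rewards = np.array([cosine(image_emb, c) for c in candidate_embs])
    advantage = rewards - rewards.mean()           # baseline-subtracted reward
    logits += 0.5 * advantage                      # favor high-reward candidates

best = int(np.argmax(logits))                      # distribution now peaks here
```

After a few updates the distribution concentrates on the candidate whose embedding best matches the input, which is the intuition behind maximizing the CLIP reward at test time.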
1 code implementation • 28 May 2023 • Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang
Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment.
1 code implementation • 23 May 2023 • Shuai Zhao, Ruijie Quan, Linchao Zhu, Yi Yang
With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.
Ranked #1 on Scene Text Recognition on Uber-Text
1 code implementation • 22 May 2023 • Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang
In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations.
no code implementations • CVPR 2023 • Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.
1 code implementation • 6 Mar 2023 • Wei Li, Linchao Zhu, Longyin Wen, Yi Yang
This decoder is both data-efficient and computation-efficient: 1) it only requires text data for training, easing the burden of collecting paired data.
no code implementations • 22 Jan 2023 • Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang
To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization, which matches a subset of texts with the video features.
1 code implementation • 1 Jan 2023 • Zenan Huang, Jun Wen, Siheng Chen, Linchao Zhu, Nenggan Zheng
Domain adaptation methods reduce domain shift typically by learning domain-invariant features.
1 code implementation • ICCV 2023 • Yuanzhi Liang, Xiaohan Wang, Linchao Zhu, Yi Yang
Experimental results and visualizations, based on the large-scale PartNet-Mobility dataset, show the effectiveness of MAAL in learning multi-modal data and solving the 3D articulated object affordance problem.
no code implementations • CVPR 2023 • Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli
Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have achieved tremendous success.
1 code implementation • CVPR 2023 • Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou
To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must.
Ranked #2 on Video Question Answering on AGQA 2.0 balanced
1 code implementation • IEEE Transactions on Neural Networks and Learning Systems 2022 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang
Second, we instantiate the loss function and provide a strong baseline for FGVC, where the performance of a naive backbone can be boosted and be comparable with recent methods.
Ranked #28 on Fine-Grained Image Classification on CUB-200-2011
Fine-Grained Image Classification Fine-Grained Visual Recognition
no code implementations • 30 Sep 2022 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang
In this work, we present a one-stage solution to obtain pre-trained small models without the need for extra teachers, namely, slimmable networks for contrastive self-supervised learning (SlimCLR).
no code implementations • 6 Aug 2022 • Shannan Guan, Haiyan Lu, Linchao Zhu, Gengfa Fang
Existing 3D skeleton-based action recognition approaches reach impressive performance by encoding handcrafted action features into an image format and decoding them with CNNs.
1 code implementation • 4 Aug 2022 • Juncheng Li, Xin He, Longhui Wei, Long Qian, Linchao Zhu, Lingxi Xie, Yueting Zhuang, Qi Tian, Siliang Tang
Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks.
1 code implementation • 3 Aug 2022 • Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang
In this paper, we introduce a new task, named Temporal Emotion Localization in videos~(TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.
no code implementations • 7 Jul 2022 • Shannan Guan, Haiyan Lu, Linchao Zhu, Gengfa Fang
3D pose estimation has recently gained substantial interest in the computer vision domain.
Ranked #35 on 3D Human Pose Estimation on MPI-INF-3DHP
1 code implementation • 2 May 2022 • Shuai Zhao, Linchao Zhu, Xiaohan Wang, Yi Yang
In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.
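A minimal version of the per-segment clustering idea can be sketched as follows. This is an illustrative assumption, not the paper's exact algorithm: video tokens are split into temporal segments, a tiny k-means runs within each segment, and only the token nearest each cluster center is kept as a representative while the rest are dropped.

```python
# Illustrative per-segment token clustering (assumed simplification of the
# paper's multi-segment clustering): keep one real token per cluster center.
import numpy as np

def cluster_tokens(tokens, n_segments=2, k=2, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    kept = []
    for seg in np.array_split(tokens, n_segments):        # temporal segments
        centers = seg[rng.choice(len(seg), size=k, replace=False)].copy()
        for _ in range(iters):                            # plain k-means
            dists = np.linalg.norm(seg[:, None] - centers[None], axis=-1)
            assign = dists.argmin(axis=1)
            for c in range(k):
                if (assign == c).any():
                    centers[c] = seg[assign == c].mean(axis=0)
        # keep the actual token closest to each center as the representative
        dists = np.linalg.norm(seg[:, None] - centers[None], axis=-1)
        kept.extend(seg[dists.argmin(axis=0)])
    return np.array(kept)

tokens = np.random.default_rng(1).normal(size=(32, 8))    # 32 tokens, dim 8
reduced = cluster_tokens(tokens)
print(reduced.shape)  # (4, 8): 2 segments x 2 representatives each
```

Dropping non-representative tokens this way shrinks the sequence length, which is where the computational savings for video retrieval would come from.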
Ranked #11 on Video Retrieval on MSVD (using extra training data)
1 code implementation • CVPR 2022 • Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan
Although UniTrack (Wang et al., 2021) demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.
1 code implementation • CVPR 2022 • Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang
To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
no code implementations • CVPR 2022 • Yang Jin, Linchao Zhu, Yadong Mu
The main contributions of this work are two-fold: 1) Different from existing black-box models, the proposed model simultaneously implements the localization of temporal boundaries and the recognition of action categories by grounding the logical rules of MLN in videos.
1 code implementation • CVPR 2022 • Yuanzhi Liang, Qianyu Feng, Linchao Zhu, Li Hu, Pan Pan, Yi Yang
Talking gesture generation is a practical yet challenging task which aims to synthesize gestures in line with speech.
Ranked #6 on Gesture Generation on TED Gesture Dataset
2 code implementations • CVPR 2022 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang
In this paper, we propose an episodic linear probing (ELP) classifier to reflect the generalization of visual representations in an online manner.
Ranked #13 on Fine-Grained Image Classification on CUB-200-2011
1 code implementation • ICCV 2021 • Aming Wu, Rui Liu, Yahong Han, Linchao Zhu, Yi Yang
Secondly, domain-specific representations are introduced as the differences between the input and domain-invariant representations.
no code implementations • ICCV 2021 • Juncheng Li, Siliang Tang, Linchao Zhu, Haochen Shi, Xuanwen Huang, Fei Wu, Yi Yang, Yueting Zhuang
Secondly, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network from three hierarchies.
no code implementations • 3 Jun 2021 • Kezhou Lin, Xiaohan Wang, Zhedong Zheng, Linchao Zhu, Yi Yang
Obtaining viewer responses from videos can be useful for creators and streaming platforms to analyze the video performance and improve the future user experience.
no code implementations • 2 May 2021 • Qianyu Feng, Linchao Zhu, Bang Zhang, Pan Pan, Yi Yang
Specifically, we expect to approximate the real joint distribution over the partial observations and latent variables, and thus infer the unseen targets.
1 code implementation • 30 Apr 2021 • Youjiang Xu, Linchao Zhu, Lu Jiang, Yi Yang
It has been shown that deep neural networks are prone to overfitting on biased training data.
1 code implementation • CVPR 2021 • Xiaohan Wang, Linchao Zhu, Yi Yang
Moreover, a global alignment method is proposed to provide a global cross-modal measurement that is complementary to the local perspective.
1 code implementation • ICCV 2021 • Aming Wu, Yahong Han, Linchao Zhu, Yi Yang
Thus, we develop a new framework of few-shot object detection with universal prototypes (FSOD^up) that owns the merit of feature generalization towards novel objects.
Ranked #23 on Few-Shot Object Detection on MS-COCO (10-shot)
no code implementations • 13 Jan 2021 • Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu
We further improve ImagineRNN by residual anticipation, i.e., changing its target to predicting the feature difference of adjacent frames instead of the frame content.
no code implementations • 1 Jan 2021 • Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada
To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence.
no code implementations • ICCV 2021 • Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang
To avoid these additional costs, we propose an end-to-end Interactive Prototype Learning (IPL) framework to learn better active object representations by leveraging the motion cues from the actor.
1 code implementation • ICCV 2021 • Yanbin Liu, Juho Lee, Linchao Zhu, Ling Chen, Humphrey Shi, Yi Yang
Most existing few-shot classification methods only consider generalization on one dataset (i.e., single-domain), failing to transfer across various seen and unseen domains.
no code implementations • 1 Jan 2021 • Hu Zhang, Linchao Zhu, Yi Yang
Motivated by such phenomenon, we propose to disentangle the distinctive effects of data-rich and data-poor gradient and asynchronously train a model via a dual-phase learning process.
3 code implementations • 30 Dec 2020 • Leilei Gan, Zhiyang Teng, Yue Zhang, Linchao Zhu, Fei Wu, Yi Yang
In this paper, we propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
1 code implementation • CVPR 2020 • Linchao Zhu, Yi Yang
In this paper, we introduce ActBERT for self-supervised learning of joint video-text representations from unlabeled data.
Ranked #8 on Action Segmentation on COIN
no code implementations • CVPR 2020 • Linchao Zhu, Yi Yang
It is beneficial to incorporate more discriminative features to improve generalization on tail classes.
Ranked #16 on Long-tail Learning on Places-LT
1 code implementation • 25 May 2020 • Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada
To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence.
no code implementations • CVPR 2021 • Zhun Zhong, Linchao Zhu, Zhiming Luo, Shaozi Li, Yi Yang, Nicu Sebe
In this paper, we tackle the problem of discovering new classes in unlabeled visual data given labeled data from disjoint classes.
1 code implementation • ECCV 2020 • Hu Zhang, Linchao Zhu, Yi Zhu, Yi Yang
Most of previous work on adversarial attack mainly focus on image models, while the vulnerability of video models is less explored.
1 code implementation • ECCV 2020 • Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.
Ranked #5 on Weakly Supervised Action Localization on BEOID
no code implementations • 8 Feb 2020 • Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang
Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other for noun classification.
Ranked #4 on Egocentric Activity Recognition on EGTEA
1 code implementation • NeurIPS 2019 • Aming Wu, Linchao Zhu, Yahong Han, Yi Yang
Inspired by this idea, towards VCR, we propose a connective cognition network (CCN) to dynamically reorganize the visual neuron connectivity that is contextualized by the meaning of questions and answers.
no code implementations • 20 Nov 2019 • Aming Wu, Yahong Han, Linchao Zhu, Yi Yang
Most state-of-the-art methods of object detection suffer from poor generalization ability when the training and test data are from different domains, e.g., with different styles.
3 code implementations • CVPR 2020 • Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang
This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level without a significant increase in parameters.
no code implementations • ECCV 2020 • Linchao Zhu, Sercan O. Arik, Yi Yang, Tomas Pfister
We propose a novel adaptive transfer learning framework, learning to transfer learn (L2TL), to improve performance on a target dataset by careful extraction of the related information from a source dataset.
no code implementations • 22 Jun 2019 • Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang
In this report, we present the Baidu-UTS submission to the EPIC-Kitchens Action Recognition Challenge in CVPR 2019.
no code implementations • 10 Jun 2019 • Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang
FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.
Ranked #26 on Action Recognition on UCF101
no code implementations • 20 Apr 2019 • Hehe Fan, Linchao Zhu, Yi Yang
Predicting future frames in videos has become a promising direction of research for both computer vision and robot learning communities.
no code implementations • CVPR 2019 • Fengda Zhu, Linchao Zhu, Yi Yang
Specifically, our method employs an adversarial feature adaptation model for visual representation transfer and a policy mimic strategy for policy behavior imitation.
no code implementations • 8 Apr 2019 • Yang He, Ping Liu, Linchao Zhu, Yi Yang
In addition, when evaluating filter importance, only the magnitude information of the filters is considered.
3 code implementations • ICCV 2019 • Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang
We propose to automatically search for a CNN architecture that is specifically suitable for the reID task.
Ranked #9 on Person Re-Identification on CUHK03 detected
no code implementations • ECCV 2018 • Linchao Zhu, Yi Yang
In this paper, we propose a new memory network structure for few-shot video classification by making the following contributions.
no code implementations • 27 Aug 2018 • Ke Ning, Linchao Zhu, Ming Cai, Yi Yang, Di Xie, Fei Wu
We validate the effectiveness of our ASST on two large-scale datasets.
1 code implementation • 11 Apr 2018 • Yu Wu, Linchao Zhu, Lu Jiang, Yi Yang
Thus, the sequence model can be decoupled from the novel object descriptions.
1 code implementation • 13 Jul 2017 • Linchao Zhu, Yanbin Liu, Yi Yang
In this paper, we present our solution to Google YouTube-8M Video Classification Challenge 2017.
no code implementations • CVPR 2017 • Zhongwen Xu, Linchao Zhu, Yi Yang
Then, we demonstrate that with our model, machine-labeled image annotations are very effective and abundant resources to perform object recognition on novel categories.
no code implementations • CVPR 2017 • Linchao Zhu, Zhongwen Xu, Yi Yang
This learning process makes the learned model more capable of dealing with motion speed variance.
no code implementations • 15 Nov 2015 • Linchao Zhu, Zhongwen Xu, Yi Yang, Alexander G. Hauptmann
In this work, we introduce Video Question Answering in temporal domain to infer the past, describe the present and predict the future.