1 code implementation • ECCV 2020 • Guangrui Li, Guoliang Kang, Wu Liu, Yunchao Wei, Yi Yang
The goal of CCM is to acquire synthetic images whose distribution is similar to that of the real images in the target domain, so that the domain gap can be naturally alleviated by training on content-consistent synthetic images.
Ranked #12 on Semantic Segmentation on GTAV-to-Cityscapes Labels
1 code implementation • NAACL 2022 • John Lalor, Yi Yang, Kendall Smith, Nicole Forsgren, Ahmed Abbasi
While much work has highlighted biases embedded in state-of-the-art language models, and more recent efforts have focused on how to debias, research assessing the fairness and performance of biased/debiased models on downstream prediction tasks has been limited.
1 code implementation • EMNLP 2021 • Ahmed Abbasi, David Dobolyi, John P. Lalor, Richard G. Netemeyer, Kendall Smith, Yi Yang
We also discuss the important implications of our work and resulting testbed for future NLP research on psychometrics and fairness.
no code implementations • ACL 2022 • Yue Guo, Yi Yang, Ahmed Abbasi
Specifically, we propose a variant of the beam search method to automatically search for biased prompts such that the cloze-style completions are the most different with respect to different demographic groups.
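The idea of searching prompt space with a beam can be illustrated with a toy sketch. Here `disparity` is a made-up stand-in for querying a language model with each candidate prompt and measuring how far apart the cloze completions are across demographic groups; the vocabulary and scores are hypothetical, not from the paper.

```python
# Beam-search-style search for bias-revealing prompts (toy sketch).
VOCAB = ["the", "doctor", "nurse", "brilliant", "emotional", "works"]

def disparity(prompt_tokens):
    # Toy score: pretend stereotype-loaded words widen the gap between
    # groups; a real system would compare LM completion distributions.
    loaded = {"nurse": 0.9, "emotional": 0.8, "brilliant": 0.5}
    return sum(loaded.get(t, 0.1) for t in prompt_tokens)

def beam_search_biased_prompts(length=3, beam_width=2):
    beams = [([], 0.0)]
    for _ in range(length):
        candidates = []
        for tokens, _ in beams:
            for w in VOCAB:
                new = tokens + [w]
                candidates.append((new, disparity(new)))
        # Keep the prompts whose completions differ most across groups.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

best_prompt, score = beam_search_biased_prompts()[0]
print(best_prompt, round(score, 2))
```

With the toy scorer, the beam converges on the most stereotype-loaded token sequence, which is exactly the behavior the search exploits at scale.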
no code implementations • ACL 2022 • Chengyu Chuang, Yi Yang
Given the prevalence of NLP models in financial decision making systems, this work raises the awareness of their potential implicit preferences in the stock markets.
1 code implementation • Findings (EMNLP) 2021 • Hanyu Duan, Yi Yang, Kar Yan Tam
Numeracy plays a key role in natural language understanding.
no code implementations • 30 Apr 2024 • Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang
Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes.
no code implementations • 28 Apr 2024 • Yunbing Jia, Xiaoyu Kong, Fan Tang, Yixing Gao, WeiMing Dong, Yi Yang
In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition.
no code implementations • 25 Apr 2024 • Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen
Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations.
no code implementations • 25 Apr 2024 • Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang
AudioScenic exploits the inherent properties of audio, namely, audio magnitude and frequency, to guide the editing process, aiming to control the temporal dynamics and enhance the temporal consistency.
no code implementations • 25 Apr 2024 • Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang
In this study, we introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE).
2 code implementations • 21 Apr 2024 • Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang
To accommodate the "seen → unseen" generalization scenario, we construct the first large-scale pattern dataset named AnyPattern, which has the largest number of tamper patterns (90 for training and 10 for testing) among all the existing ones.
no code implementations • 20 Apr 2024 • Ben Eisner, Yi Yang, Todor Davchev, Mel Vecerik, Jonathan Scholz, David Held
In this work, we propose a method for precise relative pose prediction which is provably SE(3)-equivariant, can be learned from only a few demonstrations, and can generalize across variations in a class of objects.
no code implementations • 11 Apr 2024 • Yufeng Yue, Meng Yu, Luojie Yang, Yi Yang
Image restoration is rather challenging in adverse weather conditions, especially when multiple degradations occur simultaneously.
no code implementations • 10 Apr 2024 • Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng
Fast-growing large-scale language models are delivering unprecedented performance on almost all natural language processing tasks.
2 code implementations • 8 Apr 2024 • Yufeng Yue, Yinan Deng, Jiahui Wang, Yi Yang
Implicit reconstruction of ESDF (Euclidean Signed Distance Field) involves training a neural network to regress the signed distance from any point to the nearest obstacle, which has the advantages of lightweight storage and continuous querying.
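The regression target such a network learns can be made concrete with a toy 2-D example: the signed distance from a query point to the nearest obstacle boundary (negative inside). The circle obstacles below are made up for illustration; they are not from the paper.

```python
import math

# Ground-truth Euclidean signed distance a network would regress:
# positive outside an obstacle, negative inside, zero on the surface.
# Obstacles here are toy circles (center_x, center_y, radius).
OBSTACLES = [(0.0, 0.0, 1.0), (3.0, 0.0, 0.5)]

def esdf(x, y):
    # Distance to the nearest obstacle boundary, signed by containment.
    return min(math.hypot(x - cx, y - cy) - r for cx, cy, r in OBSTACLES)

# Training pairs for an implicit network f_theta(x, y) ≈ esdf(x, y);
# the network then supports continuous querying at arbitrary points.
samples = [((x * 0.5, 0.0), esdf(x * 0.5, 0.0)) for x in range(-4, 9)]
print(esdf(2.0, 0.0))  # 0.5: half a unit from the second circle
```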
no code implementations • 5 Apr 2024 • Wenguan Wang, Yi Yang, Yunhe Pan
Visual knowledge is a new form of knowledge representation that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner, with a deep root in cognitive psychology.
no code implementations • 4 Apr 2024 • Lei Zhang, YuHang Zhou, Yi Yang, Xinbo Gao
Despite providing high-performance solutions for computer vision tasks, deep neural network (DNN) models have been proven extremely vulnerable to adversarial attacks.
no code implementations • 2 Apr 2024 • Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin
In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps given corrupted noisy latent representation, which urges the decoder to learn fundamental knowledge of typical BEV patterns.
no code implementations • 30 Mar 2024 • Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang
We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.
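The cluster-then-keep-medoids loop can be sketched in one dimension. Everything here is a hypothetical simplification: the seeding, distance, and data are toys standing in for the paper's protein-graph clustering.

```python
def medoid(cluster):
    # The member minimizing total distance to the rest of its cluster.
    return min(cluster, key=lambda p: sum(abs(p - q) for q in cluster))

def cluster_round(points, k):
    # One toy 1-D clustering pass: assign each point to its nearest of
    # k seeds drawn from the sorted data, then keep each cluster's medoid.
    seeds = sorted(points)[:: max(1, len(points) // k)][:k]
    clusters = {s: [] for s in seeds}
    for p in points:
        nearest = min(seeds, key=lambda s: abs(p - s))
        clusters[nearest].append(p)
    return [medoid(c) for c in clusters.values() if c]

def hierarchical_medoids(points, k, rounds):
    # Repeatedly cluster and keep only medoids, coarsening each round
    # into a smaller, more informative representation.
    for _ in range(rounds):
        if len(points) <= k:
            break
        points = cluster_round(points, k)
    return sorted(points)

nodes = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
print(hierarchical_medoids(nodes, 3, 2))  # [0.2, 5.1, 9.9]
```

Each round replaces a cluster by its most central member, so the representation shrinks while staying anchored to actual data points.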
no code implementations • 29 Mar 2024 • Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface.
1 code implementation • 26 Mar 2024 • Guikun Chen, Xia Li, Yi Yang, Wenguan Wang
In this work, we propose feature extraction with clustering (FEC), a conceptually elegant yet surprisingly ad-hoc interpretable neural clustering framework, which views feature extraction as a process of selecting representatives from data and thus automatically captures the underlying data distribution.
1 code implementation • 25 Mar 2024 • Yuhang Ding, Liulei Li, Wenguan Wang, Yi Yang
This enables knowledge acquired from prior slices to assist in segmenting the current slice, efficiently bridging communication between remote slices using merely 2D networks.
no code implementations • 24 Mar 2024 • Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang
The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space.
no code implementations • 24 Mar 2024 • Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang
We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage.
no code implementations • 24 Mar 2024 • Zhuoyi Peng, Yi Yang
We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases.
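The similarity inference task reduces to scoring phrase pairs in an embedding space. The bag-of-words embedding below is a toy stand-in for a learned patent-domain encoder; the phrases are invented examples.

```python
from collections import Counter
import math

def embed(phrase):
    # Bag-of-words vector; a real system would use a patent-domain
    # encoder so that synonyms like "unit" and "assembly" score closer.
    return Counter(phrase.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

sim = cosine(embed("heat exchanger assembly"), embed("heat exchanger unit"))
print(round(sim, 3))  # 0.667
```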
no code implementations • 23 Mar 2024 • Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang
Once these concealed passphrases in user documents, referred to as ghost sentences, are identified in the generated content of LLMs, users can be certain that their data was used for training.
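The user-side check is conceptually a membership test over generated text. The passphrases and generated output below are invented for illustration; a real deployment would sample many generations rather than scan one string.

```python
# Toy detection loop: a user checks whether any of their planted
# "ghost sentences" surfaces verbatim in text generated by an LLM,
# which would indicate the user's documents were in the training data.
ghost_sentences = [
    "the violet walrus hums at half past never",
    "quartz rivers fold twice under paper moons",
]

def data_was_used(generated_text, passphrases):
    # Return every planted passphrase found verbatim in the output.
    return [p for p in passphrases if p in generated_text.lower()]

output = "... the violet walrus hums at half past never, as they say ..."
print(data_was_used(output, ghost_sentences))
```

Because the passphrases are nonsensical and unique, a verbatim match is overwhelming evidence of memorization rather than coincidence.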
1 code implementation • 22 Mar 2024 • Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang
Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.
1 code implementation • 22 Mar 2024 • Lei Zhang, Xiaowei Fu, Fuxiang Huang, Yi Yang, Xinbo Gao
Person re-identification (ReID) has made great strides thanks to data-driven deep learning techniques.
no code implementations • 21 Mar 2024 • Jiaxin Liu, Yi Yang, Kar Yan Tam
In this paper, we introduce the Financial-STS task, a financial domain-specific NLP task designed to measure the nuanced semantic similarity between pairs of financial narratives.
1 code implementation • 21 Mar 2024 • Rui Liu, Wenguan Wang, Yi Yang
To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
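The voxelization step behind such a representation can be sketched simply: map continuous 3-D points to discrete cell indices at a fixed cell size, the precursor to learning per-cell features. The cell size and points are arbitrary illustration values.

```python
# Toy voxelization: bucket 3-D points into structured cells, the first
# step of building a volumetric environment representation.
def voxelize(points, cell=0.5):
    cells = {}
    for x, y, z in points:
        idx = (int(x // cell), int(y // cell), int(z // cell))
        cells.setdefault(idx, []).append((x, y, z))
    return cells

pts = [(0.1, 0.2, 0.0), (0.3, 0.4, 0.1), (1.2, 0.0, 0.0)]
grid = voxelize(pts)
print(sorted(grid))  # [(0, 0, 0), (2, 0, 0)]
```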
1 code implementation • 14 Mar 2024 • Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue
In this work, we propose OpenGraph, the first open-vocabulary hierarchical graph representation designed for large-scale outdoor environments.
1 code implementation • 10 Mar 2024 • Wenhao Wang, Yi Yang
However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts.
no code implementations • 6 Mar 2024 • Xiangquan Gui, Binxuan Zhang, Li Li, Yi Yang
To solve such problems, in this paper, we (1) propose DLP-GAN (Draw Modern Chinese Landscape Photos with Generative Adversarial Network), an unsupervised cross-domain image translation framework with a novel asymmetric cycle mapping, and (2) introduce a generator based on a dense-fusion module to match different translation directions.
1 code implementation • 5 Mar 2024 • Miaomiao Li, Jiaqi Zhu, Yang Wang, Yi Yang, Yilin Li, Hongan Wang
Weakly supervised text classification (WSTC), also called zero-shot or dataless text classification, has attracted increasing attention due to its applicability to classifying large volumes of text in the dynamic and open Web environment, since it requires only a limited set of seed words (label names) for each category instead of labeled data.
no code implementations • 15 Feb 2024 • Hanyu Duan, Yi Yang, Kar Yan Tam
More specifically, we check whether and how an LLM reacts differently in its hidden states when it answers a question right versus when it hallucinates.
no code implementations • 15 Feb 2024 • Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang
The protein first passes through the protein encoders and the PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM.
no code implementations • 9 Feb 2024 • Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang
Specifically, we incorporate the FLAME into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh.
1 code implementation • 8 Feb 2024 • Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang
Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD).
Ranked #1 on Conditional Text-to-Image Synthesis on COCO-MIG
1 code implementation • 5 Feb 2024 • Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, Ling Shao, Yi Yang, Bojun Gao, Qun Li, Guobin Wu
Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems.
1 code implementation • 1 Feb 2024 • Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang
Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.
2 code implementations • 1 Feb 2024 • Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.
no code implementations • 31 Jan 2024 • Xu Zhang, Yiming Mo, Wenguan Wang, Yi Yang
In response, we exploit easy-to-access unpaired data (i.e., one component of a product-reactant(s) pair) to generate in-silico paired data and facilitate model training.
2 code implementations • 29 Jan 2024 • Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt
Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving.
Ranked #1 on Scene Flow Estimation on Argoverse 2
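A minimal baseline makes the task definition concrete: the flow for each point can be approximated as its offset to the nearest neighbour in the next frame. This nearest-neighbour matcher and the two tiny point clouds are illustrative stand-ins, not the paper's learned method.

```python
import math

# Toy scene flow: per-point 3-D motion estimated by nearest-neighbour
# matching between two consecutive LiDAR-like frames.
def nearest_neighbor_flow(frame_t, frame_t1):
    flow = []
    for p in frame_t:
        match = min(frame_t1, key=lambda q: math.dist(p, q))
        flow.append(tuple(m - c for c, m in zip(p, match)))
    return flow

frame_t = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_t1 = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]
print(nearest_neighbor_flow(frame_t, frame_t1))
```

Learned estimators improve on this baseline chiefly by resolving ambiguous matches and handling points that appear or disappear between frames.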
1 code implementation • 27 Jan 2024 • Yixuan Tang, Yi Yang
We hope MultiHop-RAG will be a valuable resource for the community in developing effective RAG systems, thereby facilitating greater adoption of LLMs in practice.
no code implementations • 23 Jan 2024 • Kexin Li, Tao Jiang, Zongxin Yang, Yi Yang, Yueting Zhuang, Jun Xiao
Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction.
no code implementations • 20 Jan 2024 • Yanlong Zang, Han Yang, Jiaxu Miao, Yi Yang
Image-based virtual try-on systems, which fit new garments onto human portraits, are gaining research attention. An ideal pipeline should preserve the static features of clothes (like textures and logos) while also generating dynamic elements (e.g., shadows, folds) that adapt to the model's pose and environment. Previous works fail specifically at generating dynamic features, as they trivially preserve the warped in-shop clothes by compositing them with a predicted alpha mask. To break the dilemma between over-preservation and texture loss, we propose a novel diffusion-based product-level virtual try-on pipeline, i.e., PLTON, which can preserve the fine details of logos and embroideries while producing realistic clothes shading and wrinkles. The main insights are three-fold: 1) Adaptive Dynamic Rendering: we take a pre-trained diffusion model as a generative prior and tame it with image features, training a dynamic extractor from scratch to generate dynamic tokens that preserve high-fidelity semantic information.
1 code implementation • 19 Jan 2024 • Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang
(2) Equipping the visual and text encoders with separate prompts fails to mitigate the visual-text modality gap.
1 code implementation • 16 Jan 2024 • Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang
Considering the video modality better reflects the ever-changing nature of real-world scenarios, we exemplify DoraemonGPT as a video agent.
no code implementations • 12 Jan 2024 • Yuanzhi Liang, Linchao Zhu, Yi Yang
To address this challenge, we introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
1 code implementation • 8 Jan 2024 • Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang
The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.
no code implementations • 1 Jan 2024 • Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang
Targeting these issues, we propose GD²-NeRF, a Generative Detail compensation framework via GAN and Diffusion that is inference-time finetuning-free and produces vivid, plausible details.
1 code implementation • 23 Dec 2023 • MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang
In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS.
no code implementations • 18 Dec 2023 • Zhihao Zhu, Rui Fan, Chenwang Wu, Yi Yang, Defu Lian, Enhong Chen
Some adversarial attacks have achieved model stealing against recommender systems, to some extent, by collecting abundant training data of the target model (target data) or issuing a mass of queries.
no code implementations • 18 Dec 2023 • Zhihao Zhu, Chenwang Wu, Rui Fan, Yi Yang, Defu Lian, Enhong Chen
Recent research demonstrates that GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
no code implementations • 13 Dec 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
For geometry, we propose to constrain the optimized avatar in a decent global shape with a template avatar.
no code implementations • 12 Dec 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.
Ranked #6 on Zero-Shot Video Question Answer on MSRVTT-QA
1 code implementation • 11 Dec 2023 • Sarin Chandy, Varun Gangal, Yi Yang, Gabriel Maggiotti
DYAD is based on a bespoke near-sparse matrix structure which approximates the dense "weight" matrix W that matrix-multiplies the input in the typical realization of such a layer, a.k.a. DENSE.
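One simple instance of a near-sparse replacement for a dense matmul is a block-diagonal weight, shown below; note this is a generic illustration of the idea, not DYAD's actual (more elaborate) structure.

```python
# Sketch: replace a dense y = W x with a block-diagonal W, so each
# square block multiplies only its own slice of the input. Parameter
# count drops from n*n to the sum of the block sizes squared.
def block_diag_matvec(blocks, x):
    # blocks: list of square matrices applied to consecutive slices of x.
    y, i = [], 0
    for B in blocks:
        n = len(B)
        seg = x[i:i + n]
        y.extend(sum(B[r][c] * seg[c] for c in range(n)) for r in range(n))
        i += n
    return y

blocks = [[[1, 2], [3, 4]], [[5, 0], [0, 5]]]
print(block_diag_matvec(blocks, [1, 1, 2, 3]))  # [3, 7, 10, 15]
```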
no code implementations • 10 Dec 2023 • Zechuan Zhang, Zongxin Yang, Yi Yang
A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction.
no code implementations • 1 Dec 2023 • João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman
We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling.
no code implementations • 29 Nov 2023 • Jianfeng Zhang, Xuanmeng Zhang, Huichao Zhang, Jun Hao Liew, Chenxu Zhang, Yi Yang, Jiashi Feng
We study the problem of creating high-fidelity and animatable 3D avatars from only textual descriptions.
no code implementations • 27 Nov 2023 • Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang
Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.
no code implementations • 23 Nov 2023 • Hao Feng, Yi Yang, Zhu Han
Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.
no code implementations • 21 Nov 2023 • Mu Chen, Zhedong Zheng, Yi Yang
Based on this observation, we propose a depth-aware framework that explicitly leverages depth estimation to mix the categories and facilitate the two complementary tasks, i.e., segmentation and depth learning, in an end-to-end manner.
no code implementations • 20 Nov 2023 • Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang
This paper presents a novel framework termed Cut-and-Paste for real-world semantic video editing under the guidance of a text prompt and an additional reference image.
1 code implementation • 20 Nov 2023 • Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang
Different from existing methods that consider cross-view and along-epipolar information independently, EVE-NeRF conducts view-epipolar feature aggregation in an entangled manner by injecting scene-invariant appearance-continuity and geometry-consistency priors into the aggregation process.
Ranked #1 on Generalizable Novel View Synthesis on Shiny dataset
no code implementations • 20 Nov 2023 • Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang
The generalization capability of existing image restoration and enhancement (IRE) methods is constrained by the limited pre-trained datasets, making it difficult to handle agnostic inputs such as different degradation levels and scenarios beyond their design scopes.
no code implementations • 17 Nov 2023 • Yi Yang, Hanyu Duan, Ahmed Abbasi, John P. Lalor, Kar Yan Tam
Although a burgeoning literature has emerged on stereotypical bias mitigation in PLMs, such as work on debiasing gender and racial stereotyping, how such biases manifest and behave internally within PLMs remains largely unknown.
no code implementations • 17 Nov 2023 • Hanyu Duan, Yixuan Tang, Yi Yang, Ahmed Abbasi, Kar Yan Tam
In this work, we explore the relationship between ICL and IT by examining how the hidden states of LLMs change in these two paradigms.
1 code implementation • 14 Nov 2023 • Yi Yang, Qingwen Zhang, Ci Li, Daniel Simões Marta, Nazre Batool, John Folkesson
Autonomous driving has made remarkable advancements in recent years, evolving into a tangible reality.
no code implementations • 27 Oct 2023 • Yucheng Suo, Linchao Zhu, Yi Yang
This task aims to identify the instance mask that is most related to a referring expression without training on pixel-level annotations.
no code implementations • 25 Oct 2023 • Zizhao Zhang, Yi Yang, Lutong Zou, He Wen, Tao Feng, Jiaxuan You
Benefiting from high-quality datasets and standardized evaluation metrics, machine learning (ML) has achieved sustained progress and widespread applications.
no code implementations • 22 Oct 2023 • Hao Di, Yi Yang, Haishan Ye, Xiangyu Chang
Personalization aims to characterize individual preferences and is widely applied across many fields.
1 code implementation • 19 Oct 2023 • Yixuan Tang, Yi Yang, Allen H Huang, Andy Tam, Justin Z Tang
In this work, we introduce an entity-level sentiment classification dataset, called FinEntity, that annotates financial entity spans and their sentiment (positive, neutral, and negative) in financial news.
no code implementations • 19 Oct 2023 • Yue Guo, Chenxi Hu, Yi Yang
Temporal data distribution shift is prevalent in the financial text.
no code implementations • 19 Oct 2023 • Yue Guo, Zian Xu, Yi Yang
This study compares the performance of encoder-only and decoder-only language models.
1 code implementation • 19 Oct 2023 • Barrett Martin Lattimer, Patrick Chen, Xinyuan Zhang, Yi Yang
We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy.
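The chunking strategy can be sketched as: split the long source into chunks, score the claim against each chunk, and keep the maximum, so support found anywhere in the source counts. The lexical-overlap scorer here is a toy stand-in for SCALE's learned consistency model, and the example texts are invented.

```python
# Chunk-and-max sketch of large-scale factual inconsistency checking.
def chunks(words, size):
    return [words[i:i + size] for i in range(0, len(words), size)]

def support(claim_words, chunk_words):
    # Toy score: fraction of claim words present in the chunk.
    c, k = set(claim_words), set(chunk_words)
    return len(c & k) / len(c)

def consistency(source, claim, size=8):
    cw = claim.lower().split()
    return max(support(cw, ch) for ch in chunks(source.lower().split(), size))

src = ("the merger closed in june raising revenue sharply "
       "analysts expect margins to recover next quarter")
print(consistency(src, "the merger closed in june"))  # 1.0
```

Chunking keeps each scoring call small regardless of source length, which is what makes the approach tractable for long documents.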
no code implementations • 16 Oct 2023 • Chao Liang, Linchao Zhu, Humphrey Shi, Yi Yang
Sample selection is an effective way to deal with label noise.
no code implementations • 16 Oct 2023 • Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls
The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC.
no code implementations • IEEE Transactions on Multimedia 2023 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang
Video captioning is a more challenging task compared to image captioning, primarily due to differences in content density.
Ranked #5 on Video Captioning on VATEX (using extra training data)
no code implementations • ICCV 2023 • Xuanmeng Zhang, Jianfeng Zhang, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng
We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries.
no code implementations • ICCV 2023 • Liulei Li, Wenguan Wang, Yi Yang
Current high-performance semantic segmentation models are purely data-driven sub-symbolic approaches and blind to the structured nature of the visual world.
1 code implementation • NeurIPS 2023 • Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang
Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.
1 code implementation • 18 Sep 2023 • Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao
However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.
1 code implementation • 16 Sep 2023 • Yi Yang, Qingwen Zhang, Thomas Gilles, Nazre Batool, John Folkesson
Although pretraining techniques are growing in popularity, little work has been done on pretrained learning-based motion prediction methods in autonomous driving.
1 code implementation • 15 Sep 2023 • Yi Yang, Yixuan Tang, Kar Yan Tam
We present a new financial domain large language model, InvestLM, tuned on LLaMA-65B (Touvron et al., 2023), using a carefully curated instruction dataset related to financial investment.
no code implementations • 14 Sep 2023 • Yu Gao, Lutong Su, Hao Liang, Yufeng Yue, Yi Yang, Mengyin Fu
In this paper, we propose MC-NeRF, a method that enables joint optimization of both intrinsic and extrinsic parameters alongside NeRF.
1 code implementation • 13 Sep 2023 • Dongwei Ren, Wei Shang, Yi Yang, WangMeng Zuo
To aggregate long-term sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability.
1 code implementation • ICCV 2023 • Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang
Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications.
no code implementations • 10 Sep 2023 • Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang
To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining.
1 code implementation • 4 Sep 2023 • Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity.
Ranked #3 on Motion Synthesis on HumanML3D (using extra training data)
no code implementations • 30 Aug 2023 • Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz
For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly.
no code implementations • 29 Aug 2023 • Yukun Su, Yi Yang
With the development of information technology, robot technology has made great progress in various fields.
1 code implementation • ICCV 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).
Ranked #11 on Visual Object Tracking on LaSOT
no code implementations • ICCV 2023 • Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang
Recent advances in semi-supervised semantic segmentation have been heavily reliant on pseudo labeling to compensate for limited labeled data, disregarding the valuable relational knowledge among semantic concepts.
no code implementations • ICCV 2023 • Jinyu Chen, Wenguan Wang, Si Liu, Hongsheng Li, Yi Yang
CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.
1 code implementation • ICCV 2023 • Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen
Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively.
1 code implementation • ICCV 2023 • Rui Liu, Xiaohan Wang, Wenguan Wang, Yi Yang
Vision-language navigation (VLN), which entails an agent to navigate 3D environments following human instructions, has shown great advances.
no code implementations • 31 Jul 2023 • Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli
The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved the first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.
1 code implementation • ICCV 2023 • Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
Our method includes an encoder-decoder transformer architecture that fuses 2D and 3D representations to achieve 2D&3D-aligned results in a coarse-to-fine manner, and a novel 3D joint contrastive learning approach that explicitly adds global supervision for the 3D feature space.
1 code implementation • ICCV 2023 • Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng
The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.
1 code implementation • 25 Jul 2023 • Haitian Zeng, Xiaohan Wang, Wenguan Wang, Yi Yang
We introduce a novel speaker model, KEFA, for navigation instruction generation.
1 code implementation • 24 Jul 2023 • Yuanzhi Liang, Linchao Zhu, Yi Yang
MOE challenges models to understand characters' intentions and accurately determine their actions within intricate contexts involving multi-character and novel object interactions.
no code implementations • ICCV 2023 • Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang
However, such SPC-based representation i) optimizes under a volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL.
no code implementations • 13 Jul 2023 • Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia
Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.
1 code implementation • 10 Jul 2023 • Meng Li, Yahan Yu, Yi Yang, Guanghao Ren, Jian Wang
In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration.
no code implementations • 5 Jul 2023 • Jiahao Li, Yuanyou Xu, Zongxin Yang, Yi Yang, Yueting Zhuang
The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object segmentation.
no code implementations • 5 Jul 2023 • Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang
MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8.
1 code implementation • 3 Jul 2023 • Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang
In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.
no code implementations • 25 Jun 2023 • Zhoufutu Wen, Xinyu Zhao, Zhipeng Jin, Yi Yang, Wei Jia, Xiaodong Chen, Shuanglong Li, Lin Liu
The core of DIA is a query-image matching module performing ad image retrieval and relevance modeling.
1 code implementation • 15 Jun 2023 • Jiayi Shao, Xiaohan Wang, Ruijie Quan, Yi Yang
This report presents ReLER submission to two tracks in the Ego4D Episodic Memory Benchmark in CVPR 2023, including Natural Language Queries and Moment Queries.
Ranked #1 on Moment Queries on Ego4D
1 code implementation • ICCV 2023 • Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Ranked #1 on Visual Tracking on Kinetics
no code implementations • 10 Jun 2023 • Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing
However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.
no code implementations • 3 Jun 2023 • Xu Zhang, Zhedong Zheng, Xiaohan Wang, Yi Yang
We propose a novel Consensus Network (Css-Net) that self-adaptively learns from noisy triplets to minimize the negative effects of triplet ambiguity.
1 code implementation • 2 Jun 2023 • Xiaoyong Mei, Yi Yang, Ming Li, Changqin Huang, Kai Zhang, Pietro Lió
In this study, we propose a feature reuse framework that guides the step-by-step texture reconstruction process through different stages, reducing the negative impacts of perceptual and adversarial loss.
1 code implementation • 29 May 2023 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang
Given a single test sample, the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.
1 code implementation • 28 May 2023 • Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang
Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment.
no code implementations • ICCV 2023 • Jiayi Shao, Xiaohan Wang, Ruijie Quan, Junjun Zheng, Jiang Yang, Yi Yang
Temporal action localization (TAL), which involves recognizing and locating action instances, is a challenging task in video understanding.
Ranked #9 on Temporal Action Localization on THUMOS’14
no code implementations • 24 May 2023 • Feifei Shao, Yawei Luo, Lei Chen, Ping Liu, Wei Yang, Yi Yang, Jun Xiao
In this paper, we conduct a thorough causal analysis to investigate the origins of biased activation.
2 code implementations • NeurIPS 2023 • Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira
We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g., Flamingo, SeViLA, or GPT-4).
1 code implementation • 23 May 2023 • Shuai Zhao, Ruijie Quan, Linchao Zhu, Yi Yang
With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.
Ranked #1 on Scene Text Recognition on Uber-Text
1 code implementation • 22 May 2023 • Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang
In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations.
1 code implementation • 22 May 2023 • Jinliang Deng, Xiusi Chen, Renhe Jiang, Du Yin, Yi Yang, Xuan Song, Ivor W. Tsang
The core issue in MTS forecasting is how to effectively model complex spatial-temporal patterns.
Ranked #1 on Time Series Forecasting on Weather (96)
no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)
1 code implementation • 20 May 2023 • Yi Yang, Hejie Cui, Carl Yang
The human brain is the central hub of the neurobiological system, controlling behavior and cognition in complex ways.
1 code implementation • NeurIPS 2023 • Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue
Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.
Ranked #3 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)
1 code implementation • 17 May 2023 • Dewei Zhou, Zongxin Yang, Yi Yang
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.
Ranked #6 on Low-Light Image Enhancement on LOL
1 code implementation • 11 May 2023 • Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang
This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.
2 code implementations • 8 May 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.
1 code implementation • 26 Apr 2023 • Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu, Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen, Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang
In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process.
2 code implementations • 20 Apr 2023 • Wenhao Wang, Yifan Sun, Yi Yang
Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.
no code implementations • CVPR 2023 • Zongheng Tang, Yifan Sun, Si Liu, Yi Yang
Second, through our design, the object queries and the foreground query in the decoder share consensus on the class semantics, therefore making the strong and weak supervision mutually benefit each other for domain alignment.
no code implementations • CVPR 2023 • Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.
1 code implementation • NeurIPS 2023 • Wenhao Wang, Yifan Sun, Wei Li, Yi Yang
This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.
1 code implementation • 8 Apr 2023 • Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou
To address this limitation and maximize the potential of each architecture, we propose Progressive Volume Distillation with Active Learning (PVD-AL), a systematic distillation method that enables any-to-any conversions between different architectures.
1 code implementation • 6 Apr 2023 • Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He
Then, we identify the crux of why traditional influence functions fail for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data.
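The influence-function machinery that GIF adapts to graph unlearning can be sketched on a problem where everything is closed-form: estimating how ridge-regression parameters shift when one training point is removed, via the classic approximation Δθ ≈ (1/n) H⁻¹ ∇ℓ(z_removed, θ̂). This is a generic illustration under stated assumptions, not the paper's method or API; all names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    # Minimizer of (1/n) * sum of squared errors / 2 + (lam/2) * ||theta||^2
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

lam = 0.1
theta = ridge_fit(X, y, lam)
n, d = X.shape
H = X.T @ X / n + lam * np.eye(d)             # Hessian of the regularized risk

i = 7                                          # index of the point to "unlearn"
grad_i = (X[i] @ theta - y[i]) * X[i]          # gradient of the removed point's loss
theta_if = theta + np.linalg.solve(H, grad_i) / n   # influence-function estimate

# Compare against an exact retrain without point i.
theta_exact = ridge_fit(np.delete(X, i, 0), np.delete(y, i, 0), lam)
print(np.linalg.norm(theta_if - theta_exact))  # much closer than doing nothing
```

The point of the estimate is that it avoids retraining: one Hessian solve replaces a full re-fit of the model on the remaining data.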
2 code implementations • CVPR 2023 • Wei Shang, Dongwei Ren, Yi Yang, Hongzhi Zhang, Kede Ma, WangMeng Zuo
Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.
1 code implementation • CVPR 2023 • Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
However, using a single kind of modeling structure is difficult to balance the learning of short-term and long-term temporal correlations, and may bias the network to one of them, leading to undesirable predictions like global location shift, temporal inconsistency, and insufficient local details.
Ranked #46 on 3D Human Pose Estimation on 3DPW
no code implementations • 26 Mar 2023 • Xihan Wang, Xi Xu, Yu Gao, Yi Yang, Yufeng Yue, Mengyin Fu
Compared with previous work on multi-point representation, the experiments show that CRRS improves training performance in both accuracy and stability.
no code implementations • 26 Mar 2023 • Dianyi Yang, Jiadong Tang, Yu Gao, Yi Yang, Mengyin Fu
This leads to poor performance on some fisheye vision tasks.
1 code implementation • 23 Mar 2023 • Wenqing Wang, Yawei Luo, Zhiqing Chen, Tao Jiang, Lei Chen, Yi Yang, Jun Xiao
Specifically, DLL decouples the predicate labels and adopts separate classifiers to learn actional and spatial patterns respectively.
Ranked #1 on Video scene graph generation on ImageNet-VidVRD
no code implementations • 20 Mar 2023 • Xingchen Li, Long Chen, Guikun Chen, Yinfu Feng, Yi Yang, Jun Xiao
To this end, we propose a novel Decomposed Prototype Learning (DPL).
1 code implementation • 18 Mar 2023 • Fanglei Xue, Yifan Sun, Yi Yang
This paper explores an expression-related self-supervised learning (SSL) method (ContraWarping) to perform expression classification in the 5th Affective Behavior Analysis in-the-wild (ABAW) competition.
1 code implementation • CVPR 2023 • Liulei Li, Wenguan Wang, Tianfei Zhou, Jianwu Li, Yi Yang
The objective of this paper is self-supervised learning of video object segmentation.
1 code implementation • 16 Mar 2023 • Fanglei Xue, Yifan Sun, Yi Yang
Therefore, given a facial image, ContraWarping employs some global transformations and local warping to generate its positive and negative samples and sets up a novel contrastive learning framework.
1 code implementation • CVPR 2023 • Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang
Recently, visual-language navigation (VLN) -- which requires robot agents to follow navigation instructions -- has made great progress.
1 code implementation • 6 Mar 2023 • Wei Li, Linchao Zhu, Longyin Wen, Yi Yang
This decoder is both data-efficient and computation-efficient: 1) it only requires the text data for training, easing the burden on the collection of paired data.
no code implementations • 1 Mar 2023 • Jingli Shi, Weihua Li, Quan Bai, Yi Yang, Jianhua Jiang
Aspect term extraction is a fundamental task in fine-grained sentiment analysis, which aims at detecting customers' opinion targets from reviews of products or services.
no code implementations • 22 Jan 2023 • Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang
To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization, which matches the subset of texts with the video features.
no code implementations • 17 Jan 2023 • Yu Gao, Xi Xu, Tianji Jiang, Siyuan Chen, Yi Yang, Yufeng Yue, Mengyin Fu
For example, 2D object detection usually requires a large amount of 2D annotation data with high cost.
1 code implementation • 3 Jan 2023 • Zhen Yao, Wen Zhang, Mingyang Chen, Yufeng Huang, Yi Yang, Huajun Chen
And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding.
1 code implementation • 3 Jan 2023 • Feifei Shao, Yawei Luo, Fei Gao, Yi Yang, Jun Xiao
Previous weakly-supervised object localization (WSOL) methods aim to expand activation map discriminative areas to cover the whole objects, yet neglect two inherent challenges when relying solely on image-level labels.
no code implementations • CVPR 2023 • Jiaxu Miao, Zongxin Yang, Leilei Fan, Yi Yang
In this work, we propose FedSeg, a basic federated learning approach for class-heterogeneous semantic segmentation.
no code implementations • CVPR 2023 • Guangrui Li, Guoliang Kang, Xiaohan Wang, Yunchao Wei, Yi Yang
With the help of adversarial training, the masking module can learn to generate source masks to mimic the pattern of irregular target noise, thereby narrowing the domain gap.
no code implementations • CVPR 2023 • Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli
Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have made tremendous achievements.
1 code implementation • ICCV 2023 • Guangyan Chen, Meiling Wang, Li Yuan, Yi Yang, Yufeng Yue
In this paper, a critical observation is made that the invisible parts of each point cloud can be directly utilized as inherent masks, and the aligned point cloud pair can be regarded as the reconstruction target.
1 code implementation • ICCV 2023 • Liangqi Li, Jiaxu Miao, Dahu Shi, Wenming Tan, Ye Ren, Yi Yang, ShiLiang Pu
Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability.
1 code implementation • ICCV 2023 • Heng Zhao, Shenxing Wei, Dahu Shi, Wenming Tan, Zheyang Li, Ye Ren, Xing Wei, Yi Yang, ShiLiang Pu
Taking the symmetry properties of objects into consideration, we design a symmetry-aware matching loss to facilitate the learning of dense point-wise geometry features and improve the performance considerably.
no code implementations • CVPR 2023 • Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang
Based on these two common practices, the key point of ProD is using the prompting mechanism in the transformer to disentangle the domain-general (DG) and domain-specific (DS) knowledge from the backbone feature.
1 code implementation • CVPR 2023 • Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun, Yi Yang
(2) The conventional paradigm usually focuses on mining the abnormal pattern of a superimposed image to separate the noise, which de facto conflicts with the primary image restoration task.
1 code implementation • ICCV 2023 • Yuanzhi Liang, Xiaohan Wang, Linchao Zhu, Yi Yang
Experimental results and visualizations, based on a large-scale dataset PartNet-Mobility, show the effectiveness of MAAL in learning multi-modal data and solving the 3D articulated object affordance problem.
5 code implementations • CVPR 2023 • Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.
Ranked #1 on Zero-Shot Action Recognition on ActivityNet
no code implementations • 25 Dec 2022 • Xiaolong Shen, Zhedong Zheng, Yi Yang
As its name suggests, it is made up of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling.
1 code implementation • CVPR 2023 • Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou
To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must.
Ranked #2 on Video Question Answering on AGQA 2.0 balanced
1 code implementation • 29 Nov 2022 • Shuangkang Fang, Weixin Xu, Heng Wang, Yi Yang, Yufeng Wang, Shuchang Zhou
In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions.
Ranked #1 on Novel View Synthesis on NeRF (Average PSNR metric)
1 code implementation • IEEE Transactions on Neural Networks and Learning Systems 2022 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang
Second, we instantiate the loss function and provide a strong baseline for FGVC, where the performance of a naive backbone can be boosted and be comparable with recent methods.
Ranked #28 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • 19 Nov 2022 • Yi Yang, Zhong-Qiu Zhao, Quan Bai, Qing Liu, Weihua Li
Due to the dynamic nature, the proposed algorithms can also estimate true labels online without re-visiting historical data.
no code implementations • 18 Nov 2022 • Yanyan Wei, Zhao Zhang, ZhongQiu Zhao, Yang Zhao, Richang Hong, Yi Yang
Stereo images, containing left- and right-view images with disparity, have recently been utilized to solve low-level vision tasks, e.g., rain removal and super-resolution.
1 code implementation • 17 Nov 2022 • Jiayi Shao, Xiaohan Wang, Yi Yang
Moreover, in order to better capture the long-term temporal dependencies in the long videos, we propose a segment-level recurrence mechanism.
1 code implementation • 15 Nov 2022 • Leilei Gan, Baokui Li, Kun Kuang, Yating Zhang, Lei Wang, Luu Anh Tuan, Yi Yang, Fei Wu
Given the fact description text of a legal case, legal judgment prediction (LJP) aims to predict the case's charge, law article and penalty term.
1 code implementation • 14 Nov 2022 • Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua
In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts.
Ranked #1 on Semantic Segmentation on SYNTHIA-to-Cityscapes
no code implementations • 11 Nov 2022 • Yong Hong, Deren Li, Shupei Luo, Xin Chen, Yi Yang, Mi Wang
This study proposes an improved end-to-end multi-target tracking algorithm that adapts to multi-view, multi-scale scenes based on the self-attention mechanism of the transformer's encoder-decoder structure.
no code implementations • 10 Nov 2022 • Tingyu Wang, Zhedong Zheng, Zunjie Zhu, Yuhan Gao, Yi Yang, Chenggang Yan
Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2022 • Chuchu Han, Zhedong Zheng, Kai Su, Dongdong Yu, Zehuan Yuan, Changxin Gao, Nong Sang, Yi Yang
Person search aims at localizing and recognizing query persons from raw video frames, which is a combination of two sub-tasks, i.e., pedestrian detection and person re-identification.
Ranked #3 on Person Search on PRW
no code implementations • 9 Nov 2022 • Zhao Zhang, Suiyi Zhao, Xiaojie Jin, Mingliang Xu, Yi Yang, Shuicheng Yan
In this paper, we present an embarrassingly simple yet effective solution to a seemingly impossible mission, low-light image enhancement (LLIE) without access to any task-related data.
3 code implementations • 7 Nov 2022 • Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.
no code implementations • 5 Nov 2022 • Zhe Liu, Yun Li, Lina Yao, Xiaojun Chang, Wei Fang, XiaoJun Wu, Yi Yang
We design Semantic Attention (SA) and generative Knowledge Disentanglement (KD) to learn the dependence of feasibility and contextuality, respectively.
no code implementations • 2 Nov 2022 • Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng Yan
Specifically, we present a decoupled interaction module (DIM) that aims for sufficient dual-view information interaction.
no code implementations • 28 Oct 2022 • Wenguan Wang, Yi Yang, Fei Wu
Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years.
1 code implementation • 20 Oct 2022 • Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song, Xinliang Wu, Yi Yang, Mingyi Chen, Zhaoyang Lian, YingYing Li, Lei Cheng, Huajun Chen
In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents.
1 code implementation • DeepMind 2022 • Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Skanda Koppula, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman and João Carreira
We propose a novel multimodal benchmark – the Perception Test – that aims to extensively evaluate perception and reasoning skills of multimodal models.
no code implementations • 18 Oct 2022 • Ruijun Li, Weihua Li, Yi Yang, Hanyu Wei, Jianhua Jiang, Quan Bai
Recently, diffusion models have been proven to perform remarkably well in text-to-image synthesis tasks in a number of studies, immediately presenting new research opportunities for image generation.
Ranked #1 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
2 code implementations • 18 Oct 2022 • Zongxin Yang, Yi Yang
To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.
Ranked #1 on Semi-Supervised Video Object Segmentation on VOT2020
2 code implementations • 13 Oct 2022 • Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen
With a rethink of recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: Given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while the supervised segmentation methods use a simple linear classification head.
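The "simple linear classification head" that this entry contrasts with intricate FSS decoders can be made concrete: per-pixel logits are just a linear map of the deep feature at each location, i.e., a 1x1 convolution. A minimal numpy sketch; all shapes and names are illustrative assumptions, not any particular model's layout.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 8, 8))      # C x H x W deep feature map
W = rng.normal(size=(21, 64)) * 0.01     # num_classes x C linear head (1x1 conv)
b = np.zeros(21)                         # per-class bias

# Per-pixel logits: apply the same linear map at every spatial location.
logits = np.einsum('kc,chw->khw', W, feats) + b[:, None, None]
pred = logits.argmax(0)                  # H x W per-pixel class map
print(pred.shape)                        # (8, 8)
```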
1 code implementation • 8 Oct 2022 • Yi Yang, Chen Zhang, Dawei Song
Recent advances in distilling pretrained language models have discovered that, besides the expressiveness of knowledge, student-friendliness should be taken into consideration to realize a truly knowledgeable teacher.
3 code implementations • 5 Oct 2022 • Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang
Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).
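The dense generative-classifier idea behind an entry like GMMSeg can be sketched generically: fit a small Gaussian mixture per class over features, then label each feature by the class with the highest likelihood (uniform class prior). The toy below uses a few plain EM steps with isotropic components on 2-D features; it is an illustrative sketch under these assumptions, not the paper's implementation, and all function names are hypothetical.

```python
import numpy as np

def fit_gmm(X, k=2, iters=20, seed=0):
    # Fit a k-component isotropic Gaussian mixture to X (n x d) with EM.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]          # component means
    var = np.full(k, X.var() + 1e-6)                 # isotropic variances
    pi = np.full(k, 1.0 / k)                         # mixing weights
    for _ in range(iters):
        # E-step: responsibilities under each component
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        logp = -0.5 * d2 / var - 0.5 * d * np.log(2 * np.pi * var) + np.log(pi)
        logp -= logp.max(1, keepdims=True)
        r = np.exp(logp); r /= r.sum(1, keepdims=True)
        # M-step: re-estimate parameters from responsibilities
        nk = r.sum(0) + 1e-9
        mu = (r.T @ X) / nk[:, None]
        var = (r * d2).sum(0) / (nk * d) + 1e-6
        pi = nk / n
    return mu, var, pi

def gmm_loglik(X, params):
    # Log-likelihood of each row of X under the fitted mixture.
    mu, var, pi = params
    d = X.shape[1]
    d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
    logp = -0.5 * d2 / var - 0.5 * d * np.log(2 * np.pi * var) + np.log(pi)
    m = logp.max(1, keepdims=True)
    return (m + np.log(np.exp(logp - m).sum(1, keepdims=True))).ravel()

# Two toy "classes" of 2-D features, one mixture model per class.
rng = np.random.default_rng(1)
X0 = rng.normal([0, 0], 0.3, (200, 2))
X1 = rng.normal([3, 3], 0.3, (200, 2))
models = [fit_gmm(X0), fit_gmm(X1)]

def classify(X):
    # Bayes rule with a uniform class prior: argmax of per-class likelihood.
    scores = np.stack([gmm_loglik(X, m) for m in models], 1)
    return scores.argmax(1)

print(classify(np.array([[0.1, -0.2], [2.9, 3.1]])))  # → [0 1]
```

The design point is that such a classifier models the feature distribution itself, so unlikely inputs get low likelihood under every class, which a purely discriminative softmax head cannot express.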
no code implementations • 2 Oct 2022 • Jiahuan Ren, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang, Shuicheng Yan
Low-light image enhancement (LLIE) aims at improving the illumination and visibility of dark images with lighting noise.
no code implementations • 30 Sep 2022 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang
In this work, we present a one-stage solution to obtain pre-trained small models without the need for extra teachers, namely, slimmable networks for contrastive self-supervised learning (\emph{SlimCLR}).
no code implementations • 23 Sep 2022 • Tan Yu, Zhipeng Jin, Jie Liu, Yi Yang, Hongliang Fei, Ping Li
To overcome the limitations of behavior ID features in modeling new ads, we exploit the visual content in ads to boost the performance of CTR prediction models.
no code implementations • 19 Sep 2022 • Tan Yu, Jie Liu, Yi Yang, Yi Li, Hongliang Fei, Ping Li
How to pair the video ads with the user search is the core task of Baidu video advertising.
no code implementations • 7 Aug 2022 • Lin Li, Long Chen, Hanrong Shi, Wenxiao Wang, Jian Shao, Yi Yang, Jun Xiao
To this end, we propose a novel model-agnostic Label Semantic Knowledge Distillation (LS-KD) for unbiased SGG.
1 code implementation • 5 Aug 2022 • Feng Zhu, Zongxin Yang, Xin Yu, Yi Yang, Yunchao Wei
In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way.
1 code implementation • 3 Aug 2022 • Xingchen Li, Long Chen, Wenbo Ma, Yi Yang, Jun Xiao
However, we argue that most existing WSSGG works only focus on object-consistency, which means the grounded regions should have the same object category label as text entities.
no code implementations • 3 Aug 2022 • Benyuan Sun, Jin Dai, Zihao Liang, Congying Liu, Yi Yang, Bo Bai
SIMT lays the foundation of pre-training with large-scale multi-task multi-domain datasets and is proved essential for stable training in our GPPF experiments.
no code implementations • 27 Jul 2022 • Lin Li, Long Chen, Hanrong Shi, Hanwang Zhang, Yi Yang, Wei Liu, Jun Xiao
To this end, we propose a novel NoIsy label CorrEction and Sample Training strategy for SGG: NICEST.
1 code implementation • 26 Jul 2022 • Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang
While model ensemble is common, we show that combining the vision models and vision-language models brings particular benefits from their complementarity and is a key factor to our superiority.
1 code implementation • 20 Jul 2022 • Yi Yang, Chen Zhang, Benyou Wang, Dawei Song
To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets).
1 code implementation • 19 Jul 2022 • Haitian Zeng, Xin Yu, Jiaxu Miao, Yi Yang
We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from Motion (NRSfM).
no code implementations • 8 Jul 2022 • Yucheng Suo, Zhedong Zheng, Xiaohan Wang, Bang Zhang, Yi Yang
We optimize the two losses and keypoint detector network in an end-to-end manner.