Search Results for author: Rui Zhao

Found 185 papers, 76 papers with code

RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax

1 code implementation • ECCV 2020 • Xiao Zhang, Rui Zhao, Yu Qiao, Hongsheng Li

To address this problem, this paper introduces novel Radial Basis Function (RBF) distances to replace the commonly used inner products in the softmax loss function, such that it can adaptively assign losses to regularize the intra-class and inter-class distances by reshaping the relative differences, thus creating more representative class prototypes to improve optimization.

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

no code implementations • 29 May 2024 • Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

To boost the learning loop, we propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.

A Vlogger-augmented Graph Neural Network Model for Micro-video Recommendation

no code implementations • 28 May 2024 • Weijiang Lai, Beihong Jin, Beibei Li, Yiyuan Zheng, Rui Zhao

Moreover, we conduct cross-view contrastive learning to keep the consistency between node embeddings from the two different views.

CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

1 code implementation • 27 May 2024 • Jingqing Ruan, Ziyue Li, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao, Rui Zhao

Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion.

What Makes Good Few-shot Examples for Vision-Language Models?

no code implementations • 22 May 2024 • Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, Zhenxing Qian, Fei Tan

Despite the notable advancements achieved by leveraging pre-trained vision-language (VL) models through few-shot tuning for downstream tasks, our detailed empirical study highlights a significant dependence of few-shot learning outcomes on the careful selection of training examples - a facet that has been previously overlooked in research.

SQL-to-Schema Enhances Schema Linking in Text-to-SQL

no code implementations • 15 May 2024 • Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao

Consequently, there is a critical need to filter out unnecessary tables and columns, directing the language model's attention to relevant tables and columns with schema-linking, to reduce errors during SQL generation.

Text-To-SQL

Resilient control of networked switched systems subject to deception attack and DoS attack

no code implementations • 10 May 2024 • Rui Zhao, Zhiqiang Zuo, Ying Tan, Yijing Wang, Wentao Zhang

In this paper, the resilient control for switched systems in the presence of deception attack and denial-of-service (DoS) attack is addressed.

Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

1 code implementation • 8 May 2024 • Rui Zhao, Bin Shi, Jianfei Ruan, Tianze Pan, Bo Dong

Utilizing this framework with part-level labels, we can learn the noisy class posteriors more precisely by guiding the model to integrate information from various parts, ultimately improving the classification performance.

Causal Evaluation of Language Models

1 code implementation • 1 May 2024 • Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning.

Causal Discovery Causal Inference +1

X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner

1 code implementation • 18 Apr 2024 • Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao

The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights.

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

1 code implementation • 16 Apr 2024 • Hengyuan Zhang, Yanru Wu, Dawei Li, Zacc Yang, Rui Zhao, Yong Jiang, Fei Tan

In an overall evaluation of both speciality and versatility, CoFiTune consistently outperforms baseline methods across diverse tasks and model scales.

Language Modelling Large Language Model

Sparse Global Matching for Video Frame Interpolation with Large Motion

no code implementations • 10 Apr 2024 • Chunxu Liu, Guozhen Zhang, Rui Zhao, LiMin Wang

Large motion poses a critical challenge in the Video Frame Interpolation (VFI) task.

Video Frame Interpolation

Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics

1 code implementation • 8 Apr 2024 • Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, Bo Huang, Jiameng Zhao, Yipu Liao, Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan

To address this challenge, a sophisticated large language model system named Xiwu has been developed, allowing users to switch between the most advanced foundation models and quickly teach the model domain knowledge.

Code Generation Language Modelling +1

SocialGenPod: Privacy-Friendly Generative AI Social Web Applications with Decentralised Personal Data Stores

1 code implementation • 15 Mar 2024 • Vidminas Vizgirda, Rui Zhao, Naman Goel

Unlike centralised Web and data architectures that keep user data tied to application and service providers, we show how one can use Solid -- a decentralised Web specification -- to decouple user data from generative AI applications.

PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency

1 code implementation • 13 Mar 2024 • Zhishuai Li, Xiang Wang, Jingjing Zhao, Sun Yang, Guoqing Du, Xiaoru Hu, Bin Zhang, Yuxiao Ye, Ziyue Li, Rui Zhao, Hangyu Mao

Then, in the first stage, question-SQL pairs are retrieved as few-shot demonstrations, prompting the LLM to generate a preliminary SQL (PreSQL).

Ranked #1 on Text-To-SQL on spider

In-Context Learning Text-To-SQL

DragAnything: Motion Control for Anything using Entity Representation

2 code implementations • 12 Mar 2024 • Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.

Object Video Generation

314

Perennial Semantic Data Terms of Use for Decentralized Web

1 code implementation • 12 Mar 2024 • Rui Zhao, Jun Zhao

We believe this work demonstrates the practicality of a perennial DToU language and the potential of a paradigm shift in how users interact with data and applications in a decentralized Web, offering both improved privacy and usability.

Navigate

Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

no code implementations • 5 Mar 2024 • Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Sun Yang, Chi Harold Liu, Rui Zhao, Ziyue Li, Hangyu Mao

Then we formulate five evaluation tasks to comprehensively assess the performance of diverse methods across various LLMs throughout the Text-to-SQL process. Our study highlights the performance disparities among LLMs and proposes optimal in-context learning solutions tailored to each task.

Benchmarking In-Context Learning +1

Consistency Matters: Explore LLMs Consistency From a Black-Box Perspective

no code implementations • 27 Feb 2024 • Fufangchen Zhao, Guoqiang Jin, Jiaheng Huang, Rui Zhao, Fei Tan

The solution to this problem is often time-consuming and labor-intensive, and there is also an additional cost of secondary deployment, resulting in economic and time losses.

Non-Neighbors Also Matter to Kriging: A New Contrastive-Prototypical Learning

1 code implementation • 23 Jan 2024 • Zhishuai Li, Yunhao Nie, Ziyue Li, Lei Bai, Yisheng Lv, Rui Zhao

As a pre-trained paradigm, we conduct the Kriging task from a new perspective of representation: we aim to first learn robust and general representations and then recover attributes from representations.

Attribute Self-Supervised Learning

Spatial-Temporal Large Language Model for Traffic Prediction

no code implementations • 18 Jan 2024 • Chenxi Liu, Sun Yang, Qianxiong Xu, Zhishuai Li, Cheng Long, Ziyue Li, Rui Zhao

In this paper, we propose a Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction.

Language Modelling Large Language Model +2

Towards A Better Metric for Text-to-Video Generation

no code implementations • 15 Jan 2024 • Jay Zhangjie Wu, Guian Fang, HaoNing Wu, Xintao Wang, Yixiao Ge, Xiaodong Cun, David Junhao Zhang, Jia-Wei Liu, YuChao Gu, Rui Zhao, Weisi Lin, Wynne Hsu, Ying Shan, Mike Zheng Shou

Experiments on the TVGE dataset demonstrate the superiority of the proposed T2VScore on offering a better metric for text-to-video generation.

Text-to-Video Generation Video Alignment +1

PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

2 code implementations • 26 Dec 2023 • Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, Jiangjin Yin

Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL.

Decision Making Offline RL +2

Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment

1 code implementation • 25 Dec 2023 • Rui Zhao, Liang Zhang, Biao Fu, Cong Hu, Jinsong Su, Yidong Chen

The first KL divergence optimizes the conditional variational autoencoder and regularizes the encoder outputs, while the second KL divergence performs a self-distillation from the posterior path to the prior path, ensuring the consistency of decoder outputs.

Decoder Sign Language Translation +1

DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge

1 code implementation • 22 Dec 2023 • Jiaming Lu, Jingqing Ruan, Haoyuan Jiang, Ziyue Li, Hangyu Mao, Rui Zhao

Furthermore, we implement a scenario-shared Co-Train module to facilitate the learning of generalizable dynamics information across different scenarios.

Decision Making

VisionTraj: A Noise-Robust Trajectory Recovery Framework based on Large-scale Camera Network

1 code implementation • 11 Dec 2023 • Zhishuai Li, Ziyue Li, Xiaoru Hu, Guoqing Du, Yunhao Nie, Feng Zhu, Lei Bai, Rui Zhao

Trajectory recovery based on the snapshots from the city-wide multi-camera network facilitates urban mobility sensing and driveway optimization.

Clustering Denoising

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

no code implementations • 4 Dec 2023 • Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou

To enhance the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model.

Denoising

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.

Ranked #1 on Pedestrian Image Caption on CUHK-PEDES

3D Human Pose Estimation Action Recognition +8

211

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

no code implementations • 4 Dec 2023 • YuChao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang

In contrast to previous methods that rely on dense correspondences, we introduce the VideoSwap framework that exploits semantic point correspondences, inspired by our observation that only a small number of semantic points are necessary to align the subject's motion trajectory and modify its shape.

Video Editing

Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach

no code implementations • 23 Nov 2023 • Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, Guoliang Fan

The remarkable progress in Large Language Models (LLMs) opens up new avenues for addressing planning and decision-making problems in Multi-Agent Systems (MAS).

Decision Making Hallucination +3

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

no code implementations • 19 Nov 2023 • Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao

Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that require a combination of task planning and the usage of external tools, such as APIs.

In-Context Learning Language Modelling +1

To be or not to be? an exploration of continuously controllable prompt engineering

no code implementations • 16 Nov 2023 • Yuhan Sun, Mukai Li, Yixin Cao, Kun Wang, Wenxiao Wang, Xingyu Zeng, Rui Zhao

In response, we introduce ControlPE (Continuously Controllable Prompt Engineering).

Prompt Engineering

What Large Language Models Bring to Text-rich VQA?

no code implementations • 13 Nov 2023 • Xuejing Liu, Wei Tang, Xinzhe Ni, Jinghui Lu, Rui Zhao, Zechao Li, Fei Tan

This pipeline achieved superior performance compared to the majority of existing Multimodal Large Language Models (MLLM) on four text-rich VQA datasets.

Image Comprehension Optical Character Recognition (OCR) +2

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

no code implementations • 5 Nov 2023 • Qianxiong Xu, Cheng Long, Ziyue Li, Sijie Ruan, Rui Zhao, Zhishuai Li

To address this issue, we first present a novel Increment training strategy: instead of masking nodes (and reconstructing them), we add virtual nodes into the training graph so as to mitigate the graph gap issue naturally.

A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery

no code implementations • 5 Nov 2023 • Dedong Li, Ziyue Li, Zhishuai Li, Lei Bai, Qingyuan Gong, Lijun Sun, Wolfgang Ketter, Rui Zhao

Then, we propose a Multi-view Graph and Complexity Aware Transformer (MGCAT) model to encode these semantics in trajectory pre-training from two aspects: 1) adaptively aggregating the multi-view graph features according to the trajectory pattern, and 2) paying higher attention to critical nodes in a complex trajectory.

Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation

no code implementations • 30 Oct 2023 • Vishal Ramesh, Rui Zhao, Naman Goel

Synthetic data is emerging as a promising way to harness the value of data, while reducing privacy risks.

Privacy Preserving Synthetic Data Generation

Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function -- with Real Applications in Traffic Domain

no code implementations • 28 Oct 2023 • Guanghu Sui, Zhishuai Li, Ziyue Li, Sun Yang, Jingqing Ruan, Hangyu Mao, Rui Zhao

Our experiments with Large Language Models (LLMs) illustrate the significant performance improvement on the business dataset and prove the substantial potential of our method.

Language Modelling Large Language Model +1

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

no code implementations • 16 Oct 2023 • Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, YuChao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou

To overcome this, we propose to introduce the dynamic Neural Radiance Fields (NeRF) as the innovative video representation, where the editing can be performed in the 3D spaces and propagated to the entire video via the deformation field.

Style Transfer Super-Resolution +1

MeanAP-Guided Reinforced Active Learning for Object Detection

no code implementations • 12 Oct 2023 • Zhixuan Liang, Xingyu Zeng, Rui Zhao, Ping Luo

Active learning presents a promising avenue for training high-performance models with minimal labeled data, achieved by judiciously selecting the most informative instances to label and incorporating them into the task learner.

Active Object Detection Object +2

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

1 code implementation • 12 Oct 2023 • Rui Zhao, YuChao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou

Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion.

717

InstructDET: Diversifying Referring Object Detection with Generalized Instructions

1 code implementation • 8 Oct 2023 • Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song

In order to encompass common detection expressions, we involve emerging vision-language model (VLM) and large language model (LLM) to generate instructions guided by text prompts and object bbxs, as the generalizations of foundation models are effective to produce human-like expressions (e.g., describing object property, category, and relationship).

Language Modelling Large Language Model +4

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

1 code implementation • 27 Sep 2023 • David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, YuChao Gu, Difei Gao, Mike Zheng Shou

In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Ranked #2 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Alignment +1

1,069

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

no code implementations • 15 Sep 2023 • Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

no code implementations • 14 Sep 2023 • Shaoshi Ling, Guoli Ye, Rui Zhao, Yifan Gong

The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years.

Automatic Speech Recognition Decoder +3

Lifelike Agility and Play on Quadrupedal Robots using Reinforcement Learning and Generative Pre-trained Models

no code implementations • 29 Aug 2023 • Lei Han, Qingxu Zhu, Jiapeng Sheng, Chong Zhang, Tingguang Li, Yizheng Zhang, He Zhang, Yuzhen Liu, Cheng Zhou, Rui Zhao, Jie Li, Yufeng Zhang, Rui Wang, Wanchao Chi, Xiong Li, Yonghui Zhu, Lingzhu Xiang, Xiao Teng, Zhengyou Zhang

In this work, we propose a framework for driving legged robots to act like real animals with lifelike agility and strategy in complex environments.

TAG

Link-Context Learning for Multimodal LLMs

1 code implementation • 15 Aug 2023 • Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu

The abilities to learn from context with novel concepts and to deliver appropriate responses are essential in human conversations.

Few-Shot Learning In-Context Learning +1

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Decoder Depth Estimation +6

288

Zero-shot Text-driven Physically Interpretable Face Editing

no code implementations • 11 Aug 2023 • Yapeng Meng, Songru Yang, Xu Hu, Rui Zhao, Lincheng Li, Zhenwei Shi, Zhengxia Zou

Our method can also be flexibly extended to real-time video face editing.

Image Manipulation

TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage

no code implementations • 7 Aug 2023 • Jingqing Ruan, Yihong Chen, Bin Zhang, Zhiwei Xu, Tianpeng Bao, Guoqing Du, Shiwei Shi, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao

With recent advancements in natural language processing, Large Language Models (LLMs) have emerged as powerful tools for various real-world applications.

Language Modelling Large Language Model

Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

no code implementations • 1 Aug 2023 • Kaijian Liu, Shixiang Tang, Ziyue Li, Zhishuai Li, Lei Bai, Feng Zhu, Rui Zhao

The distribution representation of a clue is a vector consisting of the relation between this clue and all other clues from all modalities, thus being modality agnostic and good for person clustering.

Clustering Relation

Described Object Detection: Liberating Object Detection with Flexible Expressions

2 code implementations • NeurIPS 2023 • Chi Xie, Zhao Zhang, Yixuan Wu, Feng Zhu, Rui Zhao, Shuang Liang

In this paper, we advance them to a more practical setting called Described Object Detection (DOD) by expanding category names to flexible language expressions for OVD and overcoming the limitation of REC only grounding the pre-existing object.

Ranked #3 on Described Object Detection on Description Detection Dataset

Binary Classification Described Object Detection +5

130

Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic

1 code implementation • 27 Jun 2023 • Keqin Chen, Zhao Zhang, Weili Zeng, Richong Zhang, Feng Zhu, Rui Zhao

Referential dialogue is a superset of various vision-language (VL) tasks.

Ranked #10 on Visual Question Answering on ViP-Bench

Image Captioning Referring Expression Segmentation +1

695

Interaction-Aware Planning With Deep Inverse Reinforcement Learning for Human-Like Autonomous Driving in Merge Scenarios

1 code implementation • journal 2023 • Jiangfeng Nan, Weiwen Deng, Ruzheng Zhang, Ying Wang, Rui Zhao, Juan Ding

To consider the interaction factor, the reward function for planning is utilized to evaluate the joint trajectories of the autonomous driving vehicle (ADV) and traffic vehicles.

Autonomous Driving Decision Making

Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning

no code implementations • 23 Jun 2023 • Shaofeng Zhang, Feng Zhu, Rui Zhao, Junchi Yan

On classification tasks, for ViT-S, ADCLR achieves 77.5% top-1 accuracy on ImageNet with linear probing, outperforming our baseline (DINO) without our devised techniques as plug-in, by 0.5%.

Instance Segmentation object-detection +4

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

1 code implementation • 15 Jun 2023 • Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, Hongsheng Li

By fine-tuning CLIP on HPD v2, we obtain Human Preference Score v2 (HPS v2), a scoring model that can more accurately predict human preferences on generated images.

Image Generation

288

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

1 code implementation • 13 Jun 2023 • Weizhen He, Yiheng Deng, Shixiang Tang, Qihao Chen, Qingsong Xie, Yizhou Wang, Lei Bai, Feng Zhu, Rui Zhao, Wanli Ouyang, Donglian Qi, Yunfeng Yan

This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to the given image or language instructions.

Person Re-Identification

Dynamic Causal Graph Convolutional Network for Traffic Prediction

1 code implementation • 12 Jun 2023 • Junpeng Lin, Ziyue Li, Zhishuai Li, Lei Bai, Rui Zhao, Chen Zhang

In this work, we propose a novel approach for traffic prediction that embeds a time-varying dynamic Bayesian network to capture the fine spatiotemporal topology of traffic data.

Ranked #13 on Traffic Prediction on METR-LA

Traffic Prediction

Correlated Time Series Self-Supervised Representation Learning via Spatiotemporal Bootstrapping

1 code implementation • 12 Jun 2023 • Luxuan Wang, Lei Bai, Ziyue Li, Rui Zhao, Fugee Tsung

We evaluated the effectiveness and flexibility of our representation learning framework on correlated time series forecasting and cold-start transferring the forecasting model to new instances with limited data.

MM-DAG: Multi-task DAG Learning for Multi-modal Data -- with Application for Traffic Congestion Analysis

1 code implementation • 5 Jun 2023 • Tian Lan, Ziyue Li, Zhishuai Li, Lei Bai, Man Li, Fugee Tsung, Wolfgang Ketter, Rui Zhao, Chen Zhang

This encourages the multi-task design: with each DAG as a task, the MM-DAG tries to learn the multiple DAGs jointly so that their consensus and consistency are maximized.

Balancing Logit Variation for Long-tailed Semantic Segmentation

1 code implementation • CVPR 2023 • Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen

In this way, we manage to close the gap between the feature areas of different categories, resulting in a more balanced representation.

Semantic Segmentation

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

2 code implementations • NeurIPS 2023 • YuChao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou

Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community.

Attribute

366

Deeply Coupled Cross-Modal Prompt Learning

1 code implementation • 29 May 2023 • Xuejing Liu, Wei Tang, Jinghui Lu, Rui Zhao, Zhaojun Guo, Fei Tan

Recent advancements in multimodal foundation models (e.g., CLIP) have excelled in zero-shot generalization.

Domain Adaptation Few-Shot Learning +3

ZeroPose: CAD-Model-based Zero-Shot Pose Estimation

no code implementations • 29 May 2023 • Jianqiu Chen, Mingshan Sun, Tianpeng Bao, Rui Zhao, Liwei Wu, Zhenyu He

In this paper, we present a CAD model-based zero-shot pose estimation pipeline called ZeroPose.

Instance Segmentation Object +3

Advancing Referring Expression Segmentation Beyond Single Image

1 code implementation • ICCV 2023 • Yixuan Wu, Zhao Zhang, Xie Chi, Feng Zhu, Rui Zhao

To overcome this limitation, we propose a more realistic and general setting, named Group-wise Referring Expression Segmentation (GRES), which expands RES to a collection of related images, allowing the described objects to be present in a subset of input images.

Co-Salient Object Detection Object +4

Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems

no code implementations • 13 May 2023 • Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao, Guoliang Fan

Our research contributes to the development of an effective and adaptable asynchronous action coordination method that can be widely applied to various task types and environmental configurations in MAS.

Decision Making Multi-agent Reinforcement Learning

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

1 code implementation • ICCV 2023 • Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li

To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel.

248

WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal

no code implementations • 24 Mar 2023 • Yulin Luo, Rui Zhao, Xiaobao Wei, Jinwei Chen, Yijie Lu, Shenghao Xie, Tianyu Wang, Ruiqin Xiong, Ming Lu, Shanghang Zhang

To this end, we propose a method called Weather-aware Multi-scale MoE (WM-MoE) based on Transformer for blind weather removal.

Autonomous Driving Contrastive Learning +1

Explore the Power of Synthetic Data on Few-shot Object Detection

no code implementations • 23 Mar 2023 • Shaobo Lin, Kun Wang, Xingyu Zeng, Rui Zhao

To construct a representative synthetic training dataset, we maximize the diversity of the selected images via a sample-based and cluster-based method.

Few-Shot Object Detection Object +3

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

1 code implementation • CVPR 2023 • Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li

To overcome these obstacles, we propose CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching.

Ranked #6 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Described Object Detection object-detection +2

156

SpikeCV: Open a Continuous Computer Vision Era

1 code implementation • 21 Mar 2023 • Yajing Zheng, Jiyuan Zhang, Rui Zhao, Jianhao Ding, Shiyan Chen, Ruiqin Xiong, Zhaofei Yu, Tiejun Huang

SpikeCV focuses on encapsulation for spike data, standardization for dataset interfaces, modularization for vision tasks, and real-time applications for challenging scenes.

SeqCo-DETR: Sequence Consistency Training for Self-Supervised Object Detection with Transformers

no code implementations • 15 Mar 2023 • Guoqiang Jin, Fan Yang, Mingshan Sun, Ruyi Zhao, Yakun Liu, Wei Li, Tianpeng Bao, Liwei Wu, Xingyu Zeng, Rui Zhao

To this end, we propose SeqCo-DETR, a novel Sequence Consistency-based self-supervised method for object DEtection with TRansformers.

Object object-detection +2

HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

1 code implementation • CVPR 2023 • Shixiang Tang, Cheng Chen, Qingsong Xie, Meilin Chen, Yizhou Wang, Yuanzheng Ci, Lei Bai, Feng Zhu, Haiyang Yang, Li Yi, Rui Zhao, Wanli Ouyang

Specifically, we propose a HumanBench based on existing datasets to comprehensively evaluate on the common ground the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.

Ranked #1 on Pedestrian Attribute Recognition on PA-100K (using extra training data)

Attribute Autonomous Driving +5

211

UniHCP: A Unified Model for Human-Centric Perceptions

1 code implementation • CVPR 2023 • Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang

When adapted to a specific task, UniHCP achieves new SOTAs on a wide range of human-centric tasks, e.g., 69.8 mIoU on CIHP for human parsing, 86.18 mA on PA-100K for attribute prediction, 90.3 mAP on Market1501 for ReID, and 85.8 JI on CrowdHuman for pedestrian detection, performing better than specialized models tailored for each task.

Ranked #1 on Pose Estimation on MS-COCO

2D Pose Estimation Attribute +8

139

Paper
Code

Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation

no code implementations • CVPR 2023 • Rui Zhao, Wei Li, Zhipeng Hu, Lincheng Li, Zhengxia Zou, Zhenwei Shi, Changjie Fan

In our method, taking the power of large-scale pre-trained multi-modal CLIP and neural rendering, T2P searches both continuous facial parameters and discrete facial parameters in a unified framework.

3D Generation Face Model +3

Paper
Add Code

An Effective Crop-Paste Pipeline for Few-shot Object Detection

no code implementations • 28 Feb 2023 • Shaobo Lin, Kun Wang, Xingyu Zeng, Rui Zhao

Specifically, we first discover the base images which contain the FP of novel categories and select a certain amount of samples from them for the base and novel categories balance.

Data Augmentation Few-Shot Object Detection +1

Paper
Add Code

Efficient Masked Autoencoders with Self-Consistency

no code implementations • 28 Feb 2023 • Zhaowen Li, Yousong Zhu, Zhiyang Chen, Wei Li, Chaoyang Zhao, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

However, its high random mask ratio would result in two serious problems: 1) the data are not efficiently exploited, which brings inefficient pre-training (e.g., 1600 epochs for MAE vs. 300 epochs for the supervised), and 2) the high uncertainty and inconsistency of the pre-trained model, i.e., the prediction of the same patch may be inconsistent under different mask rounds.

Language Modelling Masked Language Modeling +3

Paper
Add Code

Saliency Guided Contrastive Learning on Scene Images

no code implementations • 22 Feb 2023 • Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Haiyang Yang, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

Despite being feasible, recent works largely overlooked discovering the most discriminative regions for contrastive learning to object representations in scene images.

Contrastive Learning Linear evaluation +2

Paper
Add Code

Explore the Power of Dropout on Few-shot Learning

no code implementations • 26 Jan 2023 • Shaobo Lin, Xingyu Zeng, Rui Zhao

The generalization power of the pre-trained model is the key for few-shot deep learning.

Few-Shot Image Classification Few-Shot Learning +2

Paper
Add Code

SparseMAE: Sparse Training Meets Masked Autoencoders

no code implementations • ICCV 2023 • Aojun Zhou, Yang Li, Zipeng Qin, Jianbo Liu, Junting Pan, Renrui Zhang, Rui Zhao, Peng Gao, Hongsheng Li

In this paper, we aim to reduce the model complexity of large vision transformers pretrained by MAE with the assistance of sparse training.

Paper
Add Code

Transformer in Transformer as Backbone for Deep Reinforcement Learning

1 code implementation • 30 Dec 2022 • Hangyu Mao, Rui Zhao, Hao Chen, Jianye Hao, Yiqun Chen, Dong Li, Junge Zhang, Zhen Xiao

Recent methods combine the Transformer with these modules for better performance.

Decision Making reinforcement-learning +1

Paper
Code

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

no code implementations • 5 Dec 2022 • Rui Zhao, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li

The neural transducer is now the most popular end-to-end model for speech recognition, owing to its natural streaming ability.

Language Modelling speech-recognition +1

Paper
Add Code

Exploring Stochastic Autoregressive Image Modeling for Visual Representation

1 code implementation • 3 Dec 2022 • Yu Qi, Fan Yang, Yousong Zhu, Yufei Liu, Liwei Wu, Rui Zhao, Wei Li

By introducing stochastic prediction and the parallel encoder-decoder, SAIM significantly improves the performance of autoregressive image modeling.

Decoder Self-Supervised Learning

Paper
Code

PUnifiedNER: A Prompting-based Unified NER System for Diverse Datasets

1 code implementation • 27 Nov 2022 • Jinghui Lu, Rui Zhao, Brian Mac Namee, Fei Tan

In this work, we present a "versatile" model, the Prompting-based Unified NER system (PUnifiedNER), that works with data from different domains and can recognise up to 37 entity types simultaneously, and theoretically it could be as many as possible.

named-entity-recognition Named Entity Recognition +1

Paper
Code

MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

no code implementations • 25 Nov 2022 • Tianpeng Bao, Jiadong Chen, Wei Li, Xiang Wang, Jingjing Fei, Liwei Wu, Rui Zhao, Ye Zheng

However, existing datasets for unsupervised anomaly detection are biased towards manufacturing inspection, not considering maintenance inspection which is usually conducted under outdoor uncontrolled environment such as varying camera viewpoints, messy background and degradation of object surface after long-term working.

Unsupervised Anomaly Detection

Paper
Add Code

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

no code implementations • 17 Nov 2022 • Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

This motivates us to leverage the factorized neural transducer structure, containing a real language model, the vocabulary predictor.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Geo6D: Geometric Constraints Learning for 6D Pose Estimation

no code implementations • 20 Oct 2022 • Jianqiu Chen, Mingshan Sun, Ye Zheng, Tianpeng Bao, Zhenyu He, Donghai Li, Guoqiang Jin, Rui Zhao, Liwei Wu, Xiaoke Jiang

Numerous 6D pose estimation methods have been proposed that employ end-to-end regression to directly estimate the target pose parameters.

6D Pose Estimation object-detection +3

Paper
Add Code

A Unified Framework with Meta-dropout for Few-shot Learning

no code implementations • 12 Oct 2022 • Shaobo Lin, Xingyu Zeng, Rui Zhao

Conventional training of deep neural networks usually requires a substantial amount of data with expensive human annotations.

Few-Shot Image Classification Few-Shot Learning +2

Paper
Add Code

SDA: Simple Discrete Augmentation for Contrastive Sentence Representation Learning

1 code implementation • 8 Oct 2022 • Dongsheng Zhu, Zhenyu Mao, Jinghui Lu, Rui Zhao, Fei Tan

Contrastive learning has recently achieved compelling performance in unsupervised sentence representation.

Contrastive Learning Data Augmentation +4

Paper
Code

What Makes Pre-trained Language Models Better Zero-shot Learners?

1 code implementation • 30 Sep 2022 • Jinghui Lu, Dongsheng Zhu, Weidong Han, Rui Zhao, Brian Mac Namee, Fei Tan

Current methods for prompt learning in zero-shot scenarios widely rely on a development set with sufficient human-annotated data to select the best-performing prompt template a posteriori.

Language Modelling text-classification +2

Paper
Code

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

2 code implementations • 28 Sep 2022 • Zhiyang Chen, Yousong Zhu, Zhaowen Li, Fan Yang, Wei Li, Haixin Wang, Chaoyang Zhao, Liwei Wu, Rui Zhao, Jinqiao Wang, Ming Tang

Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks.

Multi-Label Classification Object +2

Paper
Code

Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

1 code implementation • 15 Sep 2022 • Ye Du, Yujun Shen, Haochen Wang, Jingjing Fei, Wei Li, Liwei Wu, Rui Zhao, Zehua Fu, Qingjie Liu

Self-training has shown great potential in semi-supervised learning.

Pseudo Label Semi-Supervised Semantic Segmentation +1

Paper
Code

Jointly Contrastive Representation Learning on Road Network and Trajectory

1 code implementation • 14 Sep 2022 • Zhenyu Mao, Ziyue Li, Dedong Li, Lei Bai, Rui Zhao

Unlike the existing cross-scale contrastive learning methods on graphs that only contrast a graph and its belonging nodes, the contrast between road segment and trajectory is elaborately tailored via novel positive sampling and adaptive weighting strategies.

Contrastive Learning Representation Learning +1

Paper
Code

Uni6Dv2: Noise Elimination for 6D Pose Estimation

no code implementations • 15 Aug 2022 • Mingshan Sun, Ye Zheng, Tianpeng Bao, Jianqiu Chen, Guoqiang Jin, Liwei Wu, Rui Zhao, Xiaoke Jiang

Uni6D is the first 6D pose estimation approach to employ a unified backbone network to extract features from both RGB and depth images.

6D Pose Estimation Denoising +2

Paper
Add Code

Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

no code implementations • 1 Aug 2022 • Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu

But we find existing graph-based methods in the visible-infrared person re-identification task (VI-ReID) suffer from bad generalization because of two issues: 1) train-test modality balance gap, which is a property of VI-ReID task.

counterfactual Person Re-Identification

Paper
Add Code

Auto-Encoding Adversarial Imitation Learning

no code implementations • 22 Jun 2022 • Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao

In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework.

Imitation Learning Reinforcement Learning (RL)

Paper
Add Code

Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains

no code implementations • 10 May 2022 • Haiyang Yang, Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Wanli Ouyang

While recent self-supervised learning methods have achieved good performances with evaluation set on the same domain as the training set, they will have an undesirable performance decrease when tested on a different domain.

Self-Supervised Learning

Paper
Add Code

DOTIN: Dropping Task-Irrelevant Nodes for GNNs

no code implementations • 28 Apr 2022 • Shaofeng Zhang, Feng Zhu, Junchi Yan, Rui Zhao, Xiaokang Yang

Scalability is an important consideration for deep graph neural networks.

Graph Classification Graph Learning

Paper
Add Code

Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

no code implementations • CVPR 2022 • Xiaoke Jiang, Donghai Li, Hao Chen, Ye Zheng, Rui Zhao, Liwei Wu

They use a 2D CNN for RGB images and a per-pixel point cloud network for depth data, as well as a fusion network for feature fusion.

6D Pose Estimation

Paper
Add Code

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

no code implementations • CVPR 2022 • Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

Furthermore, our method can also exploit single-centric-object datasets such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing, and surpasses current self-supervised object detection methods on the COCO dataset, demonstrating its universality and potential.

Image Classification Object +4

Paper
Add Code

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

1 code implementation • CVPR 2022 • Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le

A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability.

Ranked #3 on Semi-Supervised Semantic Segmentation on PASCAL VOC 2012 50%

Semi-Supervised Semantic Segmentation

414

Paper
Code

Align Representations With Base: A New Approach to Self-Supervised Learning

no code implementations • CVPR 2022 • Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang

Existing symmetric contrastive learning methods suffer from collapses (complete and dimensional) or quadratic complexity of objectives.

Contrastive Learning Self-Supervised Learning

Paper
Add Code

Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

2 code implementations • 22 Dec 2021 • Rui Zhao, Jinming Song, Yufeng Yuan, Hu Haifeng, Yang Gao, Yi Wu, Zhongqian Sun, Yang Wei

We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using any human data.

Reinforcement Learning (RL)

Paper
Code

Feature Erasing and Diffusion Network for Occluded Person Re-Identification

1 code implementation • CVPR 2022 • Zhikang Wang, Feng Zhu, Shixiang Tang, Rui Zhao, Lihuo He, Jiangning Song

With the guidance of the occlusion scores from OEM, the feature diffusion process is mainly conducted on visible body parts, which guarantees the quality of the synthesized NTP characteristics.

Ranked #1 on Person Re-Identification on Occluded REID (Rank-1 metric)

Person Re-Identification

Paper
Code

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

no code implementations • CVPR 2022 • Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

The pretrain-finetune paradigm is a classical pipeline in visual learning.

domain classification Linear evaluation +3

Paper
Add Code

FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows

5 code implementations • 15 Nov 2021 • Jiawei Yu, Ye Zheng, Xiang Wang, Wei Li, Yushuang Wu, Rui Zhao, Liwei Wu

However, current methods cannot effectively map image features to a tractable base distribution, and they ignore the relationship between local and global features, which is important for identifying anomalies.

Ranked #20 on Anomaly Detection on MVTec AD

Unsupervised Anomaly Detection Weakly Supervised Defect Detection

2,789

Paper
Code

Boundary Distribution Estimation for Precise Object Detection

no code implementations • 2 Nov 2021 • Peng Zhi, Haoran Zhou, Hang Huang, Rui Zhao, Rui Zhou, Qingguo Zhou

In the field of state-of-the-art object detection, the task of object localization is typically accomplished through a dedicated subnet that emphasizes bounding box regression.

Object object-detection +4

Paper
Add Code

Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

no code implementations • 9 Oct 2021 • Ye Zheng, Xiang Wang, Rui Deng, Tianpeng Bao, Rui Zhao, Liwei Wu

To facilitate the learning with only normal images, we propose a new pretext task called non-contrastive learning for the fine alignment stage.

Ranked #49 on Anomaly Detection on MVTec AD

Contrastive Learning Unsupervised Anomaly Detection

Paper
Add Code

Optical Flow Estimation for Spiking Camera

1 code implementation • CVPR 2022 • Liwen Hu, Rui Zhao, Ziluo Ding, Lei Ma, Boxin Shi, Ruiqin Xiong, Tiejun Huang

Further, for training SCFlow, we synthesize two sets of optical flow data for the spiking camera, SPIkingly Flying Things and Photo-realistic High-speed Motion, denoted as SPIFT and PHM respectively, corresponding to random high-speed and well-designed scenes.

Event-based vision Motion Estimation +1

Paper
Code

Dr.Aid: Supporting Data-governance Rule Compliance for Decentralized Collaboration in an Automated Way

no code implementations • 3 Oct 2021 • Rui Zhao, Malcolm Atkinson, Petros Papapanagiotou, Federica Magnoni, Jacques Fleuriot

It depends on federations sharing data that often have governance rules or external regulations restricting their use.

Paper
Add Code

MDFL: A Unified Framework with Meta-dropout for Few-shot Learning

no code implementations • 29 Sep 2021 • Shaobo Lin, Xingyu Zeng, Rui Zhao

Conventional training of deep neural networks usually requires a substantial amount of data with expensive human annotations.

Few-Shot Image Classification Few-Shot Learning +2

Paper
Add Code

Improving the Transferability of Supervised Pretraining with an MLP Projector

no code implementations • 29 Sep 2021 • Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

The pretrain-finetune paradigm is a classical pipeline in visual learning.

domain classification

Paper
Add Code

Zero-CL: Instance and Feature decorrelation for negative-free symmetric contrastive learning

no code implementations • ICLR 2022 • Shaofeng Zhang, Feng Zhu, Junchi Yan, Rui Zhao, Xiaokang Yang

The proposed two methods (FCL, ICL) can be combined synthetically, called Zero-CL, where "Zero" means negative samples are zero relevant, which allows Zero-CL to completely discard negative pairs, i.e., with zero negative samples.

Contrastive Learning

Paper
Add Code

Auto-Encoding Inverse Reinforcement Learning

no code implementations • 29 Sep 2021 • Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao

Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function.

Imitation Learning reinforcement-learning +1

Paper
Add Code

Multi-Source Video Domain Adaptation with Temporal Attentive Moment Alignment

no code implementations • 21 Sep 2021 • Yuecong Xu, Jianfei Yang, Haozhi Cao, Keyu Wu, Min Wu, Rui Zhao, Zhenghua Chen

Multi-Source Domain Adaptation (MSDA) is a more practical domain adaptation scenario in real-world applications.

Unsupervised Domain Adaptation

Paper
Add Code

Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation

1 code implementation • 10 Sep 2021 • Ziluo Ding, Rui Zhao, Jiyuan Zhang, Tianxiao Gao, Ruiqin Xiong, Zhaofei Yu, Tiejun Huang

Recently, many deep learning methods have shown great success in providing promising solutions to many event-based problems, such as optical flow estimation.

Event-based Optical Flow Optical Flow Estimation +1

Paper
Code

An Automated Framework for Supporting Data-Governance Rule Compliance in Decentralized MIMO Contexts

no code implementations • 2 Sep 2021 • Rui Zhao

We propose Dr.Aid, a logic-based AI framework for automated compliance checking of data governance rules over data-flow graphs.

Paper
Add Code

MST: Masked Self-Supervised Transformer for Visual Representation

no code implementations • NeurIPS 2021 • Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks.

Language Modelling Linear evaluation +4

Paper
Add Code

Improving Facial Attribute Recognition by Group and Graph Learning

no code implementations • 28 May 2021 • Zhenghao Chen, Shuhang Gu, Feng Zhu, Jing Xu, Rui Zhao

For the spatial correlation, we aggregate attributes with spatial similarity into a part-based group and then introduce a Group Attention Learning to generate the group attention and the part-based group feature.

Attribute Graph Learning

Paper
Add Code

Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification

no code implementations • 26 May 2021 • Shijie Yu, Feng Zhu, Dapeng Chen, Rui Zhao, Haobin Chen, Shixiang Tang, Jinguo Zhu, Yu Qiao

In UDCL, a universal expert supervises the learning of domain experts and continuously gathers knowledge from all domain experts.

Domain Generalization Meta-Learning +1

Paper
Add Code

Neighbourhood-guided Feature Reconstruction for Occluded Person Re-Identification

no code implementations • 16 May 2021 • Shijie Yu, Dapeng Chen, Rui Zhao, Haobin Chen, Yu Qiao

Person images captured by surveillance cameras are often occluded by various obstacles, which lead to defective feature representation and harm person re-identification (Re-ID) performance.

Person Re-Identification

Paper
Add Code

Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification

no code implementations • 27 Apr 2021 • Yixiao Ge, Xiao Zhang, Ching Lam Choi, Ka Chun Cheung, Peipei Zhao, Feng Zhu, Xiaogang Wang, Rui Zhao, Hongsheng Li

In this way, our BAKE framework achieves online knowledge ensembling across multiple samples with only a single network.

Classification General Classification +1

Paper
Add Code

On Addressing Practical Challenges for RNN-Transducer

no code implementations • 27 Apr 2021 • Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.

speech-recognition Speech Recognition

Paper
Add Code

Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval

no code implementations • 29 Mar 2021 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo

The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.

Retrieval Text Retrieval +1

Paper
Add Code

Mutual Information State Intrinsic Control

2 code implementations • ICLR 2021 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

Reinforcement learning has been shown to be highly successful at many challenging tasks.

Paper
Code

Progressive Correspondence Pruning by Consensus Learning

1 code implementation • ICCV 2021 • Chen Zhao, Yixiao Ge, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann

Correspondence selection aims to correctly select the consistent matches (inliers) from an initial set of putative correspondences.

Denoising Pose Estimation +1

Paper
Code

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

The external language models (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR) which has no clear division between acoustic and language models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

The Vulnerability of the Neural Networks Against Adversarial Examples in Deep Learning Algorithms

no code implementations • 2 Nov 2020 • Rui Zhao

Based on current security threats faced by deep learning, this paper introduces the problem of adversarial examples in deep learning, sorts out the existing attack and defense methods of the black box and white box, and classifies them.

Paper
Add Code

Enhancing and Learning Denoiser without Clean Reference

no code implementations • 9 Sep 2020 • Rui Zhao, Daniel P. K. Lun, Kin-Man Lam

Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks.

Image Denoising

Paper
Add Code

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

no code implementations • 12 Aug 2020 • Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li

Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) system, to transfer the knowledge from source to target language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

no code implementations • 11 Aug 2020 • Yongchao Liu, Yue Jin, Yong Chen, Teng Teng, Hang Ou, Rui Zhao, Yao Zhang

Accelerating deep model training and inference is crucial in practice.

Paper
Add Code

Deep Reinforcement Learning Based Mobile Edge Computing for Intelligent Internet of Things

no code implementations • 1 Aug 2020 • Rui Zhao, Xinjie Wang, Junjuan Xia, Liseng Fan

In particular, the system cost of latency and energy consumption can be reduced significantly by the proposed deep reinforcement learning based algorithm.

Edge-computing reinforcement-learning +1

Paper
Add Code

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

no code implementations • 30 Jul 2020 • Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong

Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Deep Multi-task Learning for Facial Expression Recognition and Synthesis Based on Selective Feature Sharing

no code implementations • 9 Jul 2020 • Rui Zhao, Tianshan Liu, Jun Xiao, Daniel P. K. Lun, Kin-Man Lam

Multi-task learning is an effective learning strategy for deep-learning-based facial expression recognition tasks.

Facial Expression Recognition Facial Expression Recognition (FER) +2

Paper
Add Code

Learning Individualized Treatment Rules with Estimated Translated Inverse Propensity Score

1 code implementation • 2 Jul 2020 • Zhiliang Wu, Yinchong Yang, Yunpu Ma, Yushan Liu, Rui Zhao, Michael Moor, Volker Tresp

Randomized controlled trials typically analyze the effectiveness of treatments with the goal of making treatment recommendations for patient subgroups.

Paper
Code

Enhancement of a CNN-Based Denoiser Based on Spatial and Spectral Analysis

no code implementations • 28 Jun 2020 • Rui Zhao, Kin-Man Lam, Daniel P. K. Lun

Since most of the content or energy of natural images resides in the low-frequency spectrum, their transformed coefficients in the frequency domain are highly imbalanced.

Image Denoising

Paper
Add Code

Continual Representation Learning for Biometric Identification

1 code implementation • 8 Jun 2020 • Bo Zhao, Shixiang Tang, Dapeng Chen, Hakan Bilen, Rui Zhao

With the explosion of digital data in recent years, continuously learning new tasks from a stream of data without forgetting previously acquired knowledge has become increasingly important.

Continual Learning Knowledge Distillation +1

Paper
Code

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

3 code implementations • ECCV 2020 • Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li

The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset.

Image Retrieval Retrieval

267

Paper
Code

Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID

3 code implementations • NeurIPS 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li

To solve these problems, we propose a novel self-paced contrastive learning framework with hybrid memory.

Ranked #3 on Unsupervised Domain Adaptation on Market to MSMT

Clustering Contrastive Learning +4

391

Paper
Code

Bayesian Adversarial Human Motion Synthesis

1 code implementation • CVPR 2020 • Rui Zhao, Hui Su, Qiang Ji

By explicitly capturing the distribution of the data and parameters, our model has a more compact parameterization compared to GAN-based generative models.

Bayesian Inference Data Augmentation +1

Paper
Code

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

1 code implementation • 28 May 2020 • Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu

Among all three E2E models, transformer-AED achieved the best accuracy in both streaming and non-streaming mode.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

268

Paper
Code

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

no code implementations • CVPR 2020 • Shijie Yu, Shihua Li, Dapeng Chen, Rui Zhao, Junjie Yan, Yu Qiao

To address the clothes changing person re-id problem, we construct a novel large-scale re-id benchmark named ClOthes ChAnging Person Set (COCAS), which provides multiple images of the same identity with different clothes.

Person Re-Identification

Paper
Add Code

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

no code implementations • 1 May 2020 • Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research because it is capable of online streaming speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Stacked Convolutional Deep Encoding Network for Video-Text Retrieval

no code implementations • 10 Apr 2020 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha

Existing dominant approaches for cross-modal video-text retrieval task are to learn a joint embedding space to measure the cross-modal similarity.

Language Modelling Retrieval +2

Paper
Add Code

Learning to Cluster Faces via Confidence and Connectivity Estimation

3 code implementations • CVPR 2020 • Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin

With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters.

Clustering Connectivity Estimation +2

702

Paper
Code

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

no code implementations • 17 Mar 2020 • Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID

3 code implementations • 14 Mar 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li

To tackle the challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.

Ranked #4 on Unsupervised Domain Adaptation on Market to MSMT

Pseudo Label Relation +3

Paper
Code

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

no code implementations • 5 Feb 2020 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Towards a computer-interpretable actionable formal model to encode data governance rules

no code implementations • 19 Nov 2019 • Rui Zhao, Malcolm Atkinson

With the needs of science and business, data sharing and re-use has become an intensive activity for various areas.

Paper
Add Code

Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition

no code implementations • ICCV 2019 • Rui Zhao, Kang Wang, Hui Su, Qiang Ji

Finally, the whole model is extended under the Bayesian framework to a probabilistic model in order to better capture the stochasticity and variation in the data.

Ranked #91 on Skeleton Based Action Recognition on NTU RGB+D

Action Recognition Anatomy +2

Paper
Add Code

Improving RNN Transducer Modeling for End-to-End Speech Recognition

1 code implementation • 26 Sep 2019 • Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong

In this paper, we improve the RNN-T training in two aspects.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

Self-Supervised State-Control through Intrinsic Mutual Information Rewards

1 code implementation • 25 Sep 2019 • Rui Zhao, Volker Tresp, Wei Xu

Our results show that the mutual information between the context states and the states of interest can be an effective ingredient for overcoming challenges in robotic manipulation tasks with sparse rewards.

OpenAI Gym reinforcement-learning +1

Paper
Code

Memory-Based Neighbourhood Embedding for Visual Recognition

no code implementations • ICCV 2019 • Suichan Li, Dapeng Chen, Bin Liu, Nenghai Yu, Rui Zhao

Learning discriminative image feature embeddings is of great importance to visual recognition.

Few-Shot Learning Image Retrieval

Paper
Add Code

Bayesian Hierarchical Dynamic Model for Human Action Recognition

1 code implementation • CVPR 2019 • Rui Zhao, Wanru Xu, Hui Su, Qiang Ji

Human action recognition remains a challenging task, partly due to the large variations in how actions are executed.

Ranked #3 on Skeleton Based Action Recognition on MSR Action3D

Action Recognition Bayesian Inference +3

Paper
Code

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

3 code implementations • 21 May 2019 • Rui Zhao, Xudong Sun, Volker Tresp

This objective encourages the agent to maximize the expected return, as well as to achieve more diverse goals.

Multi-Goal Reinforcement Learning OpenAI Gym +2

Paper
Code

P2SGrad: Refined Gradients for Optimizing Deep Face Models

no code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Cosine-based softmax losses significantly improve the performance of deep face recognition networks.

Face Recognition

Paper
Add Code

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations

5 code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy.

Ranked #6 on Face Verification on MegaFace

Face Recognition Face Verification

Paper
Code

Neural Networks for Modeling Source Code Edits

no code implementations • 4 Apr 2019 • Rui Zhao, David Bieber, Kevin Swersky, Daniel Tarlow

In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files.

Paper
Add Code

Curiosity-Driven Experience Prioritization via Density Estimation

no code implementations • 20 Feb 2019 • Rui Zhao, Volker Tresp

In Reinforcement Learning (RL), an agent explores the environment and collects trajectories into the memory buffer for later learning.

Density Estimation OpenAI Gym +3

Paper
Add Code

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units

no code implementations • 31 Dec 2018 • Amit Das, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong

In particular, we introduce Attention CTC, Self-Attention CTC, Hybrid CTC, and Mixed-unit CTC.

Decoder Language Modelling

Paper
Add Code

Efficient Dialog Policy Learning via Positive Memory Retention

2 code implementations • 2 Oct 2018 • Rui Zhao, Volker Tresp

This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning.

Goal-Oriented Dialog Object Discovery +1

Paper
Code

Energy-Based Hindsight Experience Prioritization

2 code implementations • 2 Oct 2018 • Rui Zhao, Volker Tresp

We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation.

Reinforcement Learning (RL)

Paper
Code

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

1 code implementation • 2 Jul 2018 • Rui Zhao, Volker Tresp

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.

Policy Gradient Methods Reinforcement Learning (RL) +1

Paper
Code

A Hierarchical Generative Model for Eye Image Synthesis and Eye Gaze Estimation

no code implementations • CVPR 2018 • Kang Wang, Rui Zhao, Qiang Ji

Through a top-down inference, the HGM can synthesize eye images consistent with the given eye gaze.

Gaze Estimation Generative Adversarial Network +1

Paper
Add Code

Bilateral Ordinal Relevance Multi-Instance Regression for Facial Action Unit Intensity Estimation

no code implementations • CVPR 2018 • Yong Zhang, Rui Zhao, Wei-Ming Dong, Bao-Gang Hu, Qiang Ji

The majority of methods directly apply supervised learning techniques to AU intensity estimation while few methods exploit unlabeled samples to improve the performance.

regression

Paper
Add Code

Attention-Aware Compositional Network for Person Re-identification

no code implementations • CVPR 2018 • Jing Xu, Rui Zhao, Feng Zhu, Huaming Wang, Wanli Ouyang

AACN consists of two main components: Pose-guided Part Attention (PPA) and Attention-aware Feature Composition (AFC).

Person Re-Identification Pose Estimation

Paper
Add Code

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

15 code implementations • ICLR 2018 • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le

On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models.

Ranked #27 on Question Answering on SQuAD1.1 dev

Machine Translation Question Answering +2

Paper
Code

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations • 14 Apr 2018 • Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Paper
Add Code

Advancing Acoustic-to-Word CTC Model

no code implementations • 15 Mar 2018 • Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong

However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can model only a limited number of words in the output layer and maps all the remaining words to an OOV output node.

Decoder Language Modelling

Paper
Add Code

Advancing Connectionist Temporal Classification With Attention Modeling

no code implementations • 15 Mar 2018 • Amit Das, Jinyu Li, Rui Zhao, Yifan Gong

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework.

Classification General Classification +3

Paper
Add Code

Neural Program Synthesis with Priority Queue Training

4 code implementations • 10 Jan 2018 • Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le

Models and examples built with TensorFlow

Program Synthesis

Paper
Code

Acoustic-To-Word Model Without OOV

no code implementations • 28 Nov 2017 • Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong

However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can model only a limited number of words in the output layer and maps all the remaining words to an OOV output node.

Paper
Add Code

Improved training for online end-to-end speech recognition systems

1 code implementation • 6 Nov 2017 • Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao

Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training.

Speech Recognition

Paper
Code

Large-Scale Domain Adaptation via Teacher-Student Learning

no code implementations • 17 Aug 2017 • Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong

High accuracy speech recognition requires a large amount of transcribed data for supervised training.

Domain Adaptation speech-recognition +1

Paper
Add Code

A Nuclear-norm Model for Multi-Frame Super-Resolution Reconstruction from Video Clips

no code implementations • 17 Apr 2017 • Rui Zhao, Raymond H. Chan

Then a low-rank model is used to construct the reference frame in high-resolution by incorporating the information of the low-resolution frames.

Multi-Frame Super-Resolution Optical Flow Estimation

Paper
Add Code

Two-Stream RNN/CNN for Action Recognition in 3D Videos

no code implementations • 22 Mar 2017 • Rui Zhao, Haider Ali, Patrick van der Smagt

The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes.

Action Recognition Temporal Action Localization +1

Paper
Add Code

Automated Low-cost Terrestrial Laser Scanner for Measuring Diameters at Breast Height and Heights of Forest Trees

no code implementations • 8 Feb 2017 • Pei Wang, Guochao Bu, Ronghao Li, Rui Zhao

The new scanner, named BEE, can scan forest trees in three dimensions.

Position

Paper
Add Code

Deep Learning and Its Applications to Machine Health Monitoring: A Survey

1 code implementation • 16 Dec 2016 • Rui Zhao, Ruqiang Yan, Zhenghua Chen, Kezhi Mao, Peng Wang, Robert X. Gao

Since 2006, deep learning (DL) has become a rapidly growing research direction, redefining state-of-the-art performances in a wide range of areas such as object recognition, image segmentation, speech recognition and machine translation.

Image Segmentation Machine Translation +5

Paper
Code

Facial Expression Intensity Estimation Using Ordinal Information

no code implementations • CVPR 2016 • Rui Zhao, Quan Gan, Shangfei Wang, Qiang Ji

In the fully supervised case, all the frames are provided with intensity annotations.

Paper
Add Code

Saliency Detection by Multi-Context Deep Learning

no code implementations • CVPR 2015 • Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Low-level saliency cues or priors do not produce good enough saliency detection results, especially when the salient object appears against a low-contrast background with a confusing visual appearance.

Image Classification object-detection +3

Paper
Add Code

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

no code implementations • 15 Dec 2014 • Hongsheng Li, Rui Zhao, Xiaogang Wang

The proposed algorithms eliminate all the redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels.

Classification General Classification +5

Paper
Add Code

Person Re-identification by Saliency Learning

no code implementations • 5 Dec 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang

(3) saliency matching is proposed based on patch matching.

Patch Matching Person Re-Identification

Paper
Add Code

Nilpotent matrices having a given Jordan type as maximum commuting nilpotent orbit

1 code implementation • 8 Sep 2014 • Anthony Iarrobino, Leila Khatami, Bart Van Steirteghem, Rui Zhao

In 2012 P. Oblak formulated a conjecture concerning the cardinality of the set of partitions $P$ such that ${\mathcal Q}(P)$ is a given stable partition $Q$ with two parts, and proved some special cases.

Rings and Algebras Commutative Algebra Representation Theory 15A27 (Primary), 05E40 (Secondary), 13E10, 15A21

Paper
Code

Learning Mid-level Filters for Person Re-identification

no code implementations • CVPR 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel approach of learning mid-level filters from automatically discovered patch clusters for person re-identification.

Clustering Patch Matching +1

Paper
Add Code

DeepReID: Deep Filter Pairing Neural Network for Person Re-Identification

no code implementations • CVPR 2014 • Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang

In this paper, we propose a novel filter pairing neural network (FPNN) to jointly handle misalignment, photometric and geometric transforms, occlusions and background clutter.

Person Re-Identification

Paper
Add Code

Unsupervised Salience Learning for Person Re-identification

no code implementations • CVPR 2013 • Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning.

Patch Matching Person Re-Identification

Paper
Add Code
