no code implementations • 5 May 2024 • Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma
In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation.
1 code implementation • 12 Mar 2024 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
This adaptation enables convenient development of such LMMs with minimal modifications; however, it overlooks the intrinsic characteristics of diverse visual tasks and hinders the learning of perception capabilities.
no code implementations • 8 Feb 2024 • Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma
The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.
no code implementations • 29 Jan 2024 • Shaoxiang Chen, Zequn Jie, Lin Ma
To address this issue, we propose an efficient Mixture of Experts (MoE) design, namely a sparse Mixture of LoRA Experts (MoLE), for instruction finetuning of MLLMs.
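The snippet above names the mechanism only briefly; below is a minimal sketch of a sparse Mixture of LoRA Experts layer, assuming top-1 routing over low-rank (A, B) adapter pairs added to a frozen base weight. All names, dimensions, and the routing rule are illustrative assumptions, not the paper's code.

```python
import random

random.seed(0)

D, R, E = 4, 2, 3  # hidden dim, LoRA rank, number of experts

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W = rand_mat(D, D)                                              # frozen base weight
experts = [(rand_mat(D, R), rand_mat(R, D)) for _ in range(E)]  # LoRA (A, B) pairs
gate = rand_mat(D, E)                                           # router weights

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mole_forward(x):
    # Top-1 routing: each input activates only its highest-scoring expert,
    # which is what makes the mixture sparse.
    logits = [sum(gate[j][e] * x[j] for j in range(D)) for e in range(E)]
    e = max(range(E), key=lambda i: logits[i])
    A, B = experts[e]
    delta = matvec(A, matvec(B, x))   # low-rank LoRA update A @ (B @ x)
    base = matvec(W, x)               # frozen dense path
    return [b + d for b, d in zip(base, delta)], e

y, chosen = mole_forward([1.0, 0.5, -0.5, 0.2])
```

Only one expert's low-rank matrices are touched per input, so adding experts grows capacity without growing per-token compute.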
no code implementations • 15 Dec 2023 • Xiaoxu Xu, Yitian Yuan, Qiudan Zhang, Wenhui Wu, Zequn Jie, Lin Ma, Xu Wang
During the inference stage, the learned text-3D correspondence will help us ground the text queries to the 3D target objects even without 2D images.
no code implementations • 13 Dec 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Lechao Cheng, Jingjing Chen, Lin Ma, Yu-Gang Jiang
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
no code implementations • 1 Jun 2023 • Xiao Dong, Runhui Huang, XiaoYong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang
Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis (e.g., text-to-image generation).
1 code implementation • 5 Feb 2023 • Sifan Zhou, Zhi Tian, Xiangxiang Chu, Xinyu Zhang, Bo Zhang, Xiaobo Lu, Chengjian Feng, Zequn Jie, Patrick Yin Chiang, Lin Ma
The deployment of 3D detectors poses one of the major challenges in real-world self-driving scenarios.
1 code implementation • 7 Dec 2022 • Feng Yan, Zhiheng Li, Weixin Luo, Zequn Jie, Fan Liang, Xiaolin Wei, Lin Ma
This is a brief technical report of our proposed method for Multiple-Object Tracking (MOT) Challenge in Complex Environments.
Ranked #8 on Multi-Object Tracking on DanceTrack (using extra training data)
1 code implementation • CVPR 2023 • Chengjian Feng, Zequn Jie, Yujie Zhong, Xiangxiang Chu, Lin Ma
However, the typical convolution ignores the radial symmetry of the BEV features and increases the difficulty of the detector optimization.
1 code implementation • 16 Sep 2022 • Jinlong Li, Zequn Jie, Xu Wang, Yu Zhou, Xiaolin Wei, Lin Ma
"Progressive Patch Learning" further extends the feature destruction and patch learning to multi-level granularities in a progressive manner.
Weakly-Supervised Semantic Segmentation
1 code implementation • 16 Sep 2022 • Jinlong Li, Zequn Jie, Xu Wang, Xiaolin Wei, Lin Ma
To tackle this issue, this paper proposes an Expansion and Shrinkage scheme based on offset learning in deformable convolution, to sequentially improve the recall and precision of the located object in the two respective stages.
1 code implementation • CVPR 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Recent approaches aim to exploit the semantic density of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space and then incorporating 2D semantics via cross-modal interaction or fusion techniques.
no code implementations • 11 Aug 2022 • Xujie Zhang, Yu Sha, Michael C. Kampffmeyer, Zhenyu Xie, Zequn Jie, Chengwen Huang, Jianqing Peng, Xiaodan Liang
ARMANI discretizes an image into uniform tokens based on a learned cross-modal codebook in its first stage and uses a Transformer to model the distribution of image tokens for a real image given the tokens of the control signals in its second stage.
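The first-stage discretization can be illustrated with a generic vector-quantization step: each patch feature is replaced by the index of its nearest codebook vector, and the resulting token sequence is what the second-stage Transformer models. The toy codebook and patch features below are assumptions for illustration, not ARMANI's learned cross-modal codebook.

```python
def nearest_code(vec, codebook):
    # Return the index of the codebook vector closest in squared L2 distance.
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

# Toy 2-D codebook with three entries and three patch features.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
patches = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]]

# Each patch becomes a discrete token id, so the image is now a sequence
# of uniform tokens a Transformer can model autoregressively.
tokens = [nearest_code(p, codebook) for p in patches]
```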
no code implementations • 11 Jul 2022 • Shaoxiang Chen, Zequn Jie, Xiaolin Wei, Lin Ma
In this technical report, we introduce our submission to the Waymo 3D Detection leaderboard.
2 code implementations • 30 Mar 2022 • Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma
The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.
no code implementations • 10 Mar 2022 • Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Recently, one-stage visual grounders have attracted considerable attention due to accuracy comparable to, and efficiency significantly higher than, two-stage grounders.
1 code implementation • 10 Mar 2022 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
3D dense captioning is a recently proposed task in which point clouds provide more geometric information than their 2D counterparts.
1 code implementation • 9 Oct 2021 • Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma
Referring Image Segmentation (RIS) aims at segmenting the target object in an image referred to by a given natural language expression.
no code implementations • 23 May 2020 • Zheng Ge, Zequn Jie, Xin Huang, Chengzheng Li, Osamu Yoshie
The first imbalance lies in the large number of low-quality RPN proposals, which makes the R-CNN module (i.e., the post-classification layers) highly biased towards negative proposals in the early training stage.
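For context, the standard heuristic against this kind of imbalance is to subsample proposals at a fixed positive fraction before training the R-CNN head. The sketch below shows that generic remedy (common practice in two-stage detectors, not this paper's specific method; the parameter values are typical defaults used as assumptions):

```python
import random

random.seed(0)

def sample_proposals(labels, batch=16, pos_fraction=0.25):
    # Keep up to pos_fraction * batch positives, fill the rest with negatives,
    # so the classifier is not swamped by easy negative proposals.
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    n_pos = min(len(pos), int(batch * pos_fraction))
    n_neg = min(len(neg), batch - n_pos)
    return random.sample(pos, n_pos) + random.sample(neg, n_neg)

# 3 positives among 100 proposals: the sampled minibatch keeps all 3
# positives instead of training on an overwhelmingly negative batch.
labels = [1] * 3 + [0] * 97
picked = sample_proposals(labels)
```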
1 code implementation • CVPR 2020 • Yuan Gao, Haoping Bai, Zequn Jie, Jiayi Ma, Kui Jia, Wei Liu
We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL).
no code implementations • CVPR 2020 • Xin Huang, Zheng Ge, Zequn Jie, Osamu Yoshie
To acquire the visible parts, a novel Paired-Box Model (PBM) is proposed to simultaneously predict the full and visible boxes of a pedestrian.
no code implementations • 16 Mar 2020 • Zheng Ge, Zequn Jie, Xin Huang, Rong Xu, Osamu Yoshie
PS-RCNN first detects slightly or non-occluded objects with an R-CNN module (referred to as P-RCNN), and then suppresses the detected instances with human-shaped masks so that the features of heavily occluded instances can stand out.
Ranked #2 on Object Detection on WiderPerson
1 code implementation • CVPR 2020 • Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng
In this work, we propose a new global similarity metric, termed central similarity, which encourages the hash codes of similar data pairs to approach a common center and those of dissimilar pairs to converge to different centers, improving hash learning efficiency and retrieval accuracy.
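The center-based idea can be sketched with binary hash centers drawn from a Hadamard matrix, one common construction for mutually well-separated centers (the construction below is an assumption for illustration, not necessarily the paper's exact procedure):

```python
def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

K = 8                          # hash code length
centers = hadamard(K)[:2]      # one binary center per semantic class

# Distinct Hadamard rows disagree in exactly K/2 positions, so every pair
# of centers is equally far apart in Hamming distance.
sep = hamming(centers[0], centers[1])

# A perfectly learned code for class 0 sits on its own center and stays
# maximally far from the other class's center.
code = centers[0][:]
```

Training then only has to pull each item's code toward one fixed center, a global objective, instead of comparing every pair of items.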
no code implementations • 9 Dec 2018 • Xinpeng Chen, Lin Ma, Jingyuan Chen, Zequn Jie, Wei Liu, Jiebo Luo
Experiments on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate that our proposed SSG, without relying on any region proposals, achieves performance comparable to other advanced models.
no code implementations • CVPR 2019 • Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu
Adam and RMSProp are two of the most influential adaptive stochastic algorithms for training deep neural networks, yet both have been shown to diverge even in the convex setting via a few simple counterexamples.
no code implementations • ECCV 2018 • Yonggen Ling, Linchao Bao, Zequn Jie, Fengming Zhu, Ziyang Li, Shanmin Tang, Yongsheng Liu, Wei Liu, Tong Zhang
Our approach is able to handle the rolling-shutter effects and imperfect sensor synchronization in a unified way.
no code implementations • EMNLP 2018 • Jingyuan Chen, Xinpeng Chen, Lin Ma, Zequn Jie, Tat-Seng Chua
We introduce an effective and efficient method that grounds (i.e., localizes) natural sentences in long, untrimmed video sequences.
no code implementations • ECCV 2018 • Zhen-Yu Zhang, Zhen Cui, Chunyan Xu, Zequn Jie, Xiang Li, Jian Yang
In this paper, we propose a novel joint Task-Recursive Learning (TRL) framework for the closed-loop semantic segmentation and monocular depth estimation tasks.
Ranked #76 on Semantic Segmentation on NYU Depth v2
no code implementations • 10 Aug 2018 • Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu
Integrating adaptive learning rate and momentum techniques into SGD leads to a large class of efficiently accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, AccAdaGrad, etc.
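As a reference point for the algorithms listed, here is a compact sketch of the standard Adam update, which combines exactly the momentum and adaptive-learning-rate techniques mentioned above (hyperparameters are the usual defaults, chosen here for illustration):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # momentum: first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # adaptive lr: second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 from x = 3; the gradient is 2x.
x, m, v = 3.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

The per-coordinate division by the second-moment estimate is what makes the step size adaptive, and it is also the ingredient the divergence counterexamples for Adam exploit.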
no code implementations • ICML 2018 • Bingyi Kang, Zequn Jie, Jiashi Feng
Exploration remains a significant challenge to reinforcement learning methods, especially in environments where reward signals are sparse.
no code implementations • CVPR 2018 • Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, Thomas S. Huang
Despite remarkable progress, weakly supervised segmentation methods are still inferior to their fully supervised counterparts.
no code implementations • CVPR 2018 • Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, Thomas S. Huang
It can produce dense and reliable object localization maps and effectively benefit both weakly- and semi-supervised semantic segmentation.
2 code implementations • ECCV 2018 • Bo Zhao, Bo Chang, Zequn Jie, Leonid Sigal
Existing methods for multi-domain image-to-image translation (or generation) attempt to directly map an input image (or a random vector) to an image in one of the output domains.
no code implementations • CVPR 2018 • Zequn Jie, Pengfei Wang, Yonggen Ling, Bo Zhao, Yunchao Wei, Jiashi Feng, Wei Liu
Left-right consistency check is an effective way to enhance the disparity estimation by referring to the information from the opposite view.
no code implementations • NeurIPS 2017 • Xiaojie Jin, Huaxin Xiao, Xiaohui Shen, Jimei Yang, Zhe Lin, Yunpeng Chen, Zequn Jie, Jiashi Feng, Shuicheng Yan
The ability to predict the future is important for intelligent systems, e.g., autonomous vehicles and robots, to plan early and make decisions accordingly.
no code implementations • 15 Aug 2017 • Xin Li, Zequn Jie, Jiashi Feng, Changsong Liu, Shuicheng Yan
However, most existing CNN models only learn features through a feedforward structure, and no feedback information from top layers to bottom layers is exploited to enable the networks to refine themselves.
no code implementations • ICCV 2017 • Xin Li, Zequn Jie, Wei Wang, Changsong Liu, Jimei Yang, Xiaohui Shen, Zhe Lin, Qiang Chen, Shuicheng Yan, Jiashi Feng
Thus, they suffer from heterogeneous object scales caused by perspective projection of cameras on actual scenes and inevitably encounter parsing failures on distant objects as well as other boundary and recognition errors.
no code implementations • ICCV 2017 • Hao Liu, Jiashi Feng, Zequn Jie, Karlekar Jayashree, Bo Zhao, Meibin Qi, Jianguo Jiang, Shuicheng Yan
We investigate the problem of person search in the wild in this work.
Ranked #4 on Person Re-Identification on CUHK-SYSU
1 code implementation • 13 Jun 2017 • Hao Liu, Zequn Jie, Karlekar Jayashree, Meibin Qi, Jianguo Jiang, Shuicheng Yan, Jiashi Feng
Video-based person re-identification plays a central role in realistic security and video surveillance.
no code implementations • CVPR 2017 • Zequn Jie, Yunchao Wei, Xiaojie Jin, Jiashi Feng, Wei Liu
To overcome this issue, we propose a deep self-taught learning approach, which makes the detector learn the object-level features reliable for acquiring tight positive samples and afterwards re-train itself based on them.
no code implementations • 17 Apr 2017 • Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, Jiashi Feng
This paper addresses a challenging problem: how to generate multi-view clothing images from only a single-view input.
no code implementations • NeurIPS 2016 • Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan
Therefore, Tree-RL can better cover objects of various scales, which is quite appealing in the context of object proposal generation.
no code implementations • 1 Jan 2017 • Hao Liu, Zequn Jie, Karlekar Jayashree, Meibin Qi, Jianguo Jiang, Shuicheng Yan, Jiashi Feng
Video-based person re-identification plays a central role in realistic security and video surveillance.
no code implementations • ICCV 2017 • Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan
In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations.
no code implementations • 27 Aug 2016 • Xiaojie Jin, Yunpeng Chen, Jiashi Feng, Zequn Jie, Shuicheng Yan
In this paper, we consider the scene parsing problem and propose a novel Multi-Path Feedback recurrent neural network (MPF-RNN) for parsing scene images.
no code implementations • 19 Jan 2016 • Zequn Jie, Xiaodan Liang, Jiashi Feng, Wen Feng Lu, Eng Hock Francis Tay, Shuicheng Yan
In particular, in order to improve the localization accuracy, a fully convolutional network is employed which predicts locations of object proposals for each pixel.
no code implementations • CVPR 2016 • Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Zequn Jie, Jiashi Feng, Liang Lin, Shuicheng Yan
By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing.