1 code implementation • 25 Mar 2024 • Yuanze Lin, Ronald Clark, Philip Torr
We present DreamPolisher, a novel Gaussian-Splatting-based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions.
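The entry above describes DreamPolisher only at a high level. As an illustration, here is a minimal, hypothetical sketch of what a cross-view consistency objective could look like; the feature tensors are random placeholders standing in for embeddings of two rendered views, and nothing here is the paper's actual implementation.

```python
# Hypothetical sketch of a cross-view consistency loss for text-to-3D
# optimization. In a real pipeline the inputs would come from a Gaussian
# Splatting renderer plus a frozen image encoder; here they are random.
import torch
import torch.nn.functional as F

def cross_view_consistency_loss(features_a: torch.Tensor,
                                features_b: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between features of two rendered views.

    features_a, features_b: (B, D) embeddings of two camera views of the
    same scene.
    """
    a = F.normalize(features_a, dim=-1)
    b = F.normalize(features_b, dim=-1)
    # 1 - cosine similarity, averaged over the batch.
    return (1.0 - (a * b).sum(dim=-1)).mean()

# Usage with random placeholders:
fa, fb = torch.randn(4, 512), torch.randn(4, 512)
loss = cross_view_consistency_loss(fa, fb)
```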
1 code implementation • 28 Nov 2023 • Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang
Language has emerged as a natural interface for image editing.
no code implementations • ICCV 2023 • Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie
Coupling all these designs allows our method to achieve competitive performance on text-to-video retrieval and video question answering tasks while reducing pre-training costs by 1.9x or more.
1 code implementation • 2 Jun 2022 • Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan
Specifically, we observe that most state-of-the-art knowledge-based VQA methods share two weaknesses: 1) visual features are extracted either from the whole image or in a sliding-window manner when retrieving knowledge, neglecting important relationships within and among object regions; 2) visual features are under-utilized in the final answering model, which is somewhat counter-intuitive. A hedged sketch of region-level knowledge retrieval follows this entry.
Ranked #11 on Visual Question Answering (VQA) on OK-VQA
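As referenced above, here is a minimal sketch of retrieving knowledge with region-level rather than whole-image features. Everything in it is a placeholder: the region features, knowledge embeddings, and `retrieve_knowledge` helper are hypothetical and only illustrate the idea, not the paper's method.

```python
# Hedged sketch: score detected-region features against a bank of
# knowledge-entry embeddings and keep the top-k entries per region.
import torch
import torch.nn.functional as F

def retrieve_knowledge(region_feats: torch.Tensor,
                       knowledge_embs: torch.Tensor,
                       k: int = 5) -> torch.Tensor:
    """region_feats: (R, D) features for R detected regions.
    knowledge_embs: (N, D) embeddings of N knowledge entries.
    Returns indices of the top-k entries per region, shape (R, k)."""
    sims = (F.normalize(region_feats, dim=-1)
            @ F.normalize(knowledge_embs, dim=-1).T)
    return sims.topk(k, dim=-1).indices

regions = torch.randn(6, 256)       # e.g. pooled detector features
knowledge = torch.randn(1000, 256)  # e.g. encoded knowledge passages
topk = retrieve_knowledge(regions, knowledge)
```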
1 code implementation • CVPR 2022 • Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang
Our method leverages an off-the-shelf object detector to identify visual objects in unlabeled images; a pseudo-query generation module then produces language queries for these objects in an unsupervised fashion.
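For illustration, here is a toy sketch of template-based pseudo-query generation from detector outputs. The templates, fields, and `make_pseudo_query` helper are hypothetical stand-ins; the paper's actual pseudo-query module is not reproduced.

```python
# Hedged sketch: compose a language query from a detection's category,
# attribute, and rough horizontal location. Templates are illustrative.
import random

TEMPLATES = [
    "{attr} {cls}",
    "{cls} on the {loc}",
    "{attr} {cls} on the {loc}",
]

def make_pseudo_query(cls_name: str, attr: str, box, image_w: int) -> str:
    cx = (box[0] + box[2]) / 2  # horizontal center of the box
    loc = ("left" if cx < image_w / 3
           else "right" if cx > 2 * image_w / 3
           else "middle")
    return random.choice(TEMPLATES).format(attr=attr, cls=cls_name, loc=loc)

# e.g. "brown dog on the left"
print(make_pseudo_query("dog", "brown", box=(10, 40, 120, 200), image_w=640))
```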
1 code implementation • CVPR 2022 • Yulin Wang, Yang Yue, Yuanze Lin, Haojun Jiang, Zihang Lai, Victor Kulikov, Nikita Orlov, Humphrey Shi, Gao Huang
Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing spatial redundancy.
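As a loose illustration of exploiting spatial redundancy, the sketch below uses a cheap policy to pick one small crop per frame, so that only that crop would need to be processed by an expensive backbone. The `PatchSelector` module is a hypothetical toy, not the paper's architecture; a real design would also need a differentiable or policy-gradient crop selection.

```python
# Hedged sketch: locate a salient crop per frame with a tiny policy net,
# then return only the crops for downstream (expensive) processing.
import torch
import torch.nn as nn

class PatchSelector(nn.Module):
    def __init__(self, patch: int = 96):
        super().__init__()
        self.patch = patch
        # Toy policy: global-average-pool the frame, predict a crop
        # center in normalized [0, 1] coordinates.
        self.policy = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(3, 2), nn.Sigmoid())

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        B, _, H, W = frames.shape          # frames: (B, 3, H, W)
        center = self.policy(frames)        # (B, 2) in [0, 1]
        crops = []
        for i in range(B):
            cy = int(center[i, 0] * (H - self.patch))
            cx = int(center[i, 1] * (W - self.patch))
            crops.append(frames[i:i + 1, :, cy:cy + self.patch,
                                cx:cx + self.patch])
        return torch.cat(crops)             # (B, 3, patch, patch)

frames = torch.randn(2, 3, 224, 224)
crops = PatchSelector()(frames)             # cheap-to-process 96x96 crops
```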
no code implementations • 29 Sep 2021 • Yuanze Lin, Xun Guo, Yan Lu
By inserting the proposed cross-stage mechanism into existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on the ViT structure, in which self-attention and features are progressively aggregated from one block to the next.
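Here is a minimal, hypothetical sketch of carrying features across blocks: each block optionally mixes in the previous block's output through a learned gate before self-attention. The `CrossStageBlock` layout and gating are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch: a transformer block that aggregates features carried
# over from the previous block ("cross-stage") via a learned gate.
import torch
import torch.nn as nn

class CrossStageBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # learned mixing weight

    def forward(self, x, prev=None):  # x, prev: (B, N, dim)
        if prev is not None:
            # Aggregate the previous stage's features before attention.
            x = x + torch.sigmoid(self.gate) * prev
        h = self.norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x

tokens = torch.randn(2, 49, 256)
b1, b2 = CrossStageBlock(), CrossStageBlock()
out1 = b1(tokens)
out2 = b2(tokens, prev=out1)  # features progressively carried forward
```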
no code implementations • ICCV 2021 • Yuanze Lin, Xun Guo, Yan Lu
Our method contains two training stages based on model-agnostic meta-learning (MAML), each of which consists of a contrastive branch and a meta branch; a toy sketch of the inner/outer update follows this entry.
Ranked #28 on Self-Supervised Action Recognition on HMDB51
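As referenced in the entry above, here is a toy sketch of a MAML-style update with a contrastive objective in the inner loop. The linear encoder, InfoNCE loss, and learning rates are stand-ins; the paper's two-stage schedule and meta branch are not reproduced.

```python
# Hedged sketch: one MAML-style inner step on a contrastive loss,
# followed by an outer (meta) step through the adapted weights.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temp=0.1):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temp              # (B, B) pairwise similarities
    labels = torch.arange(z1.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, labels)

w = torch.randn(128, 64, requires_grad=True)  # toy linear encoder

def encode(x, w):
    return x @ w

x1, x2 = torch.randn(8, 128), torch.randn(8, 128)  # two augmented views
inner = info_nce(encode(x1, w), encode(x2, w))
grad, = torch.autograd.grad(inner, w, create_graph=True)
w_fast = w - 0.01 * grad                   # inner (task-specific) step
outer = info_nce(encode(x1, w_fast), encode(x2, w_fast))
outer.backward()                           # meta-gradient flows into w
```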