Search Results for author: Yiyuan Zhang

Found 9 papers, 7 papers with code

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

1 code implementation • 5 Feb 2024 • Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue

We introduce InteractiveVideo, a user-centric framework for video generation.

Video Generation


Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

1 code implementation • 25 Jan 2024 • Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue

We propose to improve transformers of a specific modality with irrelevant data from other modalities, e.g., improving an ImageNet model with audio or point cloud datasets.


Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

no code implementations • 7 Dec 2023 • Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue

Recently, researchers have attempted to improve the genuineness of 3D objects by directly training on 3D datasets, albeit at the cost of low-quality texture generation due to the limited texture diversity in 3D datasets.

3D Generation, Text to 3D


Online Vectorized HD Map Construction using Geometry

1 code implementation • 6 Dec 2023 • Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, Fusheng Jin, Xiangyu Yue

In our work, we propose GeMap (Geometry Map), which learns the Euclidean shapes and relations of map instances end to end, going beyond basic perception.


OneLLM: One Framework to Align All Modalities with Language

1 code implementation • 6 Dec 2023 • Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

In detail, we first train an image projection module to connect a vision encoder with LLM.

Ranked #80 on Visual Question Answering on MM-Vet

Question Answering, Visual Question Answering


UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

2 code implementations • 27 Nov 2023 • Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan

We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristic that distinguishes large kernels from small ones: they can see wide without going deep.

Ranked #1 on Object Detection on COCO 2017 (mAP metric)

Image Classification, Object Detection

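The UniRepLKNet snippet's phrase "see wide without going deep" can be made concrete with standard receptive-field arithmetic. This is a minimal sketch of our own, not code from the paper; it assumes stride-1, undilated convolutions, and the helper name `receptive_field` and the kernel sizes are hypothetical examples chosen for illustration.

```python
def receptive_field(kernel_sizes):
    """Receptive field (in pixels, along one axis) of stacked convolutions.

    Assumes every layer uses stride 1 and no dilation, so each k x k layer
    widens the field by (k - 1) pixels.
    """
    rf = 1  # a single input pixel before any convolution
    for k in kernel_sizes:
        rf += k - 1
    return rf


# Hypothetical comparison: many small kernels vs. one large kernel.
small = receptive_field([3] * 13)  # 13 stacked 3x3 layers
large = receptive_field([27])      # a single 27x27 layer
print(small, large)
```

Under these assumptions, thirteen stacked 3×3 layers and a single 27×27 layer both cover a 27-pixel span, so the large kernel reaches that width with one layer instead of thirteen.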

Towards Unified and Effective Domain Generalization

1 code implementation • 16 Oct 2023 • Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue

We propose UniDG, a novel Unified framework for Domain Generalization that significantly enhances the out-of-distribution generalization performance of foundation models regardless of their architectures.

Ranked #1 on Domain Generalization on TerraIncognita

Domain Generalization, Out-of-Distribution Generalization


Deep Reinforcement Learning for Artificial Upwelling Energy Management

no code implementations • 20 Aug 2023 • Yiyuan Zhang, Wei Fan

The potential of artificial upwelling (AU) as a means of lifting nutrient-rich bottom water to the surface, stimulating seaweed growth, and consequently enhancing ocean carbon sequestration has been gaining increasing attention in recent years.

Distributional Reinforcement Learning, energy management


Meta-Transformer: A Unified Framework for Multimodal Learning

1 code implementation • 20 Jul 2023 • Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

Multimodal learning aims to build models that can process and relate information from multiple modalities.

Time Series

