1 code implementation • 4 Apr 2024 • Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Yangchao Wu, Stefano Soatto, Byung-Woo Hong, Dong Lao, Alex Wong
To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene.
no code implementations • 31 Jan 2024 • Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong
We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound.
1 code implementation • CVPR 2023 • Jiaben Chen, Renrui Zhang, Dongze Lian, Jiaqi Yang, Ziyao Zeng, Jianbo Shi
To generalize to a new instrument or event class, drawing inspiration from the text-prompt design, we insert an additional query as an audio prompt while freezing the attention mechanism.
2 code implementations • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao
In this paper, we first collaborate CLIP and GPT to be a unified 3D open-world learner, named as PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.
Ranked #2 on 3D Open-Vocabulary Instance Segmentation on STPLS3D
1 code implementation • 3 Jul 2022 • Renrui Zhang, Ziyao Zeng, Ziyu Guo, Yafeng Li
To our best knowledge, we are the first to conduct zero-shot adaptation from the semantic language knowledge to quantified downstream tasks and perform zero-shot monocular depth estimation.
no code implementations • 4 Dec 2021 • Longtian Qiu, Renrui Zhang, Ziyu Guo, Ziyao Zeng, Zilu Guo, Yafeng Li, Guangnan Zhang
Contrastive Language-Image Pre-training (CLIP) has drawn increasing attention recently for its transferable visual representation learning.
1 code implementation • 19 Nov 2021 • Renrui Zhang, Ziyao Zeng, Ziyu Guo, Xinben Gao, Kexue Fu, Jianbo Shi
We reverse the conventional design of applying convolution on voxels and attention to points.
Ranked #36 on 3D Part Segmentation on ShapeNet-Part