1 code implementation • Findings (ACL) 2021 • Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti
Compared with existing multimodal datasets such as MSCOCO and Flickr30K for image-language tasks, and YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering both image-language and video-language tasks, but is also labeled in multiple languages.
1 code implementation • CVPR 2021 • Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Jianfeng Gao, Dongdong Zhang, Nan Duan
We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.
no code implementations • 22 Jan 2020 • Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti
In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding.
Ranked #15 on Zero-Shot Cross-Modal Retrieval on COCO 2014
no code implementations • 7 Dec 2017 • Lucas Roberts, Leo Razoumov, Lin Su, Yuyang Wang
Moreover, we show that the Gini-regularized OT problem converges to the classical OT problem when the Gini-regularized problem is considered as a function of λ, the regularization parameter.
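To make the convergence claim concrete, here is a toy numerical sketch. This is not the paper's algorithm; it assumes a quadratic (Tsallis-2 / Gini-style) penalty λ·Σ P², and uses a 2x2 transport problem small enough that the feasible couplings form a one-parameter family we can search by brute force.

```python
import numpy as np

# Toy 2x2 OT problem with uniform marginals [0.5, 0.5]. Every feasible
# coupling can be written as P(t) = [[t, 0.5-t], [0.5-t, t]], t in [0, 0.5].
# Ground cost penalizes off-diagonal mass, so the classical (unregularized)
# OT solution is t = 0.5, i.e. P = diag(0.5, 0.5).
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def coupling(t):
    return np.array([[t, 0.5 - t],
                     [0.5 - t, t]])

def objective(t, lam):
    # Transport cost plus an assumed Gini-style (quadratic) regularizer.
    P = coupling(t)
    return float(np.sum(C * P) + lam * np.sum(P ** 2))

# Brute-force minimize over a fine grid of t for several lambda values.
ts = np.linspace(0.0, 0.5, 5001)
results = {}
for lam in [100.0, 1.0, 0.1]:
    vals = [objective(t, lam) for t in ts]
    results[lam] = float(ts[int(np.argmin(vals))])
    print(f"lambda = {lam:6.1f}  ->  optimal t = {results[lam]:.4f}")
# For large lambda the regularizer pulls the coupling toward uniform
# (t near 0.25); as lambda -> 0 the minimizer returns to t = 0.5,
# the classical OT solution -- the convergence the abstract describes.
```

For this quadratic penalty the minimizer can also be found in closed form, t = min(0.5, (1+λ)/(4λ)), which matches the grid search and makes the λ → 0 limit explicit.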