1 code implementation • 16 Mar 2023 • Zipeng Xu, Songlong Xing, Enver Sangineto, Nicu Sebe
However, directly using CLIP to guide style transfer leads to undesirable artifacts (mainly written words and unrelated visual entities) spread over the image.
no code implementations • 27 Nov 2020 • Sijie Mai, Songlong Xing, Jiaxuan He, Ying Zeng, Haifeng Hu
Most existing works focus on aligned fusion of the three modalities, mostly at the word level, to accomplish this task, which is impractical in real-world scenarios.
1 code implementation • 18 Nov 2019 • Sijie Mai, Haifeng Hu, Songlong Xing
Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative.
no code implementations • ACL 2019 • Sijie Mai, Haifeng Hu, Songlong Xing
We propose a general strategy named 'divide, conquer and combine' for multimodal fusion.