no code implementations • 23 Jan 2024 • Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri
We introduce Lumiere, a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis.
Ranked #6 on Text-to-Video Generation on UCF-101
1 code implementation • 1 Jun 2023 • Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf
In this work, we present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
1 code implementation • ICCV 2023 • Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim
This approach has two disadvantages: (i) supervised datasets are generally small compared to the large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, and (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.
2 code implementations • 31 Jan 2023 • Hila Chefer, Yuval Alaluf, Yael Vinker, Lior Wolf, Daniel Cohen-Or
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
1 code implementation • 2 Jun 2022 • Hila Chefer, Idan Schwartz, Lior Wolf
It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
1 code implementation • 11 Apr 2022 • Roni Paiss, Hila Chefer, Lior Wolf
To mitigate it, we present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input, in addition to employing the CLIP similarity loss used in previous works.
1 code implementation • 24 Oct 2021 • Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf
We make the distinction between (i) style transfer, in which a source image is manipulated to match the textures and colors of a target image, and (ii) essence transfer, in which one edits the source image to include high-level semantic attributes from the target.
1 code implementation • ICCV 2021 • Hila Chefer, Shir Gur, Lior Wolf
Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention mechanisms.
3 code implementations • CVPR 2021 • Hila Chefer, Shir Gur, Lior Wolf
Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks.