1 code implementation • 30 May 2024 • Yingshan Chang, Yonatan Bisk
We find that while traditional RNNs trivially achieve inductive counting, Transformers have to rely on positional embeddings to count out-of-domain.
no code implementations • 29 May 2024 • Yasi Zhang, Peiyu Yu, Yaxuan Zhu, Yingshan Chang, Feng Gao, Ying Nian Wu, Oscar Leong
Generative models based on flow matching have attracted significant attention for their simplicity and superior performance in high-resolution image synthesis.
no code implementations • 25 Mar 2024 • Yingshan Chang, Yasi Zhang, Zhiyuan Fang, YingNian Wu, Yonatan Bisk, Feng Gao
We hypothesize that the underlying phenomenological coverage has not been proportionally scaled up, leading to a skew in the presented phenomena, which harms generalization.
no code implementations • 23 Feb 2024 • Syeda Nahida Akter, Sangwu Lee, Yingshan Chang, Yonatan Bisk, Eric Nyberg
The unique feature of this task, validating question answerability with respect to an image before answering, and the poor performance of state-of-the-art models inspired the design of a new modular baseline, LOGIC2VISION, which reasons by producing and executing pseudocode, without any external modules, to generate the answer.
no code implementations • 24 May 2022 • Yau-Shian Wang, Yingshan Chang
It is a long-known risk that language models (LMs), once trained on a corpus containing undesirable content, have the power to manifest biases and toxicity.
1 code implementation • 19 May 2022 • Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alex Hauptmann, Jianfeng Gao, Yonatan Bisk
Vision-Language Transformers can be learned without low-level human labels (e.g., class labels, bounding boxes, etc.).
2 code implementations • CVPR 2022 • Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk
Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation.