no code implementations • 19 Mar 2024 • Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma
We propose a technique to transfer capabilities from LLMs to VLMs.
Ranked #1 on Chart Question Answering on ChartQA (using extra training data)
Chart Question Answering • Optical Character Recognition (OCR)
1 code implementation • 7 Feb 2024 • Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, Abhanshu Sharma
At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements.
Ranked #3 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)
no code implementations • 15 Nov 2023 • Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng
Evaluating natural language systems poses significant challenges, particularly in the realms of natural language understanding and high-level reasoning.
no code implementations • 15 Nov 2023 • Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng
Parameter Efficient Tuning has been a prominent approach to adapting Large Language Models to downstream tasks.
1 code implementation • 25 Oct 2023 • Bowen Tan, Yun Zhu, Lijuan Liu, Hongyi Wang, Yonghao Zhuang, Jindong Chen, Eric Xing, Zhiting Hu
In this work, we present RedCoast (Redco), a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development.
no code implementations • 22 Aug 2023 • Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-Hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng
Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting.
1 code implementation • 25 May 2023 • Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Yinxiao Liu, Simon Tong, Jindong Chen, Lei Meng
In this work, we develop new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks using diverse wording and structures expressed through natural language, including: 1) generating rewriting instruction data from Wiki edits and public corpora through instruction generation and chain-of-thought prompting; 2) collecting comparison data for reward model training through a new ranking function.
1 code implementation • 16 Sep 2022 • Yu-Chung Hsiao, Fedir Zubach, Maria Wang, Jindong Chen
We present a new task and dataset, ScreenQA, for screen content understanding via question answering.
no code implementations • 7 Jun 2022 • Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen
Convolutional Neural Networks (CNNs) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12].
1 code implementation • 29 Jul 2021 • Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Aguera y Arcas
Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other.
no code implementations • 9 Jul 2021 • Xiaoxue Zang, Ying Xu, Jindong Chen
Annotating user interfaces (UIs), which involves localization and classification of meaningful UI elements on a screen, is a critical step for many mobile applications such as screen readers and voice control of devices.
no code implementations • ACL 2021 • Xiaoxue Zang, Lijuan Liu, Maria Wang, Yang Song, Hao Zhang, Jindong Chen
Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task that predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task that retrieves the most relevant photo according to the dialogue context.
Ranked #5 on Image Retrieval on PhotoChat
no code implementations • 22 Dec 2020 • Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby Lee, Jindong Chen, Blaise Agüera y Arcas
Our methodology is designed to leverage visual, linguistic and domain-specific features in user interaction traces to pre-train generic feature representations of UIs and their components.
1 code implementation • WS 2020 • Xiaoxue Zang, Abhinav Rastogi, Srinivas Sunkara, Raghav Gupta, Jian-Guo Zhang, Jindong Chen
We also benchmark a few state-of-the-art dialogue state tracking models on the corrected dataset to facilitate comparison for future work.
no code implementations • WS 2019 • Guan-Lin Chao, Abhinav Rastogi, Semih Yavuz, Dilek Hakkani-Tür, Jindong Chen, Ian Lane
Understanding and conversing about dynamic scenes is one of the key capabilities of AI agents that navigate the environment and convey useful information to humans.
no code implementations • 27 Feb 2019 • Jindong Chen, Ao Wang, Jiangjie Chen, Yanghua Xiao, Zhendong Chu, Jingping Liu, Jiaqing Liang, Wei Wang
Taxonomies play an important role in machine intelligence.
1 code implementation • 21 Feb 2019 • Jindong Chen, Yizhou Hu, Jingping Liu, Yanghua Xiao, Haiyun Jiang
For the purpose of measuring the importance of knowledge, we introduce attention mechanisms and propose deep Short Text Classification with Knowledge powered Attention (STCKA).
no code implementations • 24 Oct 2018 • Nevan Wichers, Dilek Hakkani-Tur, Jindong Chen
Images may have elements containing text and a bounding box associated with them, for example, text identified via optical character recognition on a computer screen image, or a natural image with labeled objects.
Optical Character Recognition (OCR)