no code implementations • SIGDIAL (ACL) 2022 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent’s mental state transition and teach the agent to make decisions guided by its value like a human.
no code implementations • 21 Mar 2024 • Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li
To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs.
1 code implementation • 27 Feb 2024 • Xiao Liu, Zirui Wu, Xueqing Wu, Pan Lu, Kai-Wei Chang, Yansong Feng
To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, aiming to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
1 code implementation • 8 Feb 2024 • Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.
Ranked #5 on Video Question Answering on MVBench
1 code implementation • 9 Jan 2024 • Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng
One critical challenge that has emerged is the presence of hallucinations in the output of large language models (LLMs) due to false or outdated knowledge.
1 code implementation • 3 Oct 2023 • Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks.
1 code implementation • 20 Jul 2023 • Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang
Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations.
1 code implementation • 21 May 2023 • Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia
We evaluate a wide spectrum of 16 large language and code models with different prompting strategies like Chain-of-Thoughts and Program-of-Thoughts.
Ranked #1 on Natural Questions on TheoremQA
1 code implementation • 2 May 2023 • Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang
The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.
3 code implementations • 28 Apr 2023 • Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao
This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.
Ranked #6 on Visual Question Answering (VQA) on InfiMM-Eval
Instruction Following Optical Character Recognition (OCR) +7
1 code implementation • NeurIPS 2023 • Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao
At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools to execute to generate the final response.
7 code implementations • 28 Mar 2023 • Renrui Zhang, Jiaming Han, Chris Liu, Peng Gao, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Yu Qiao
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model.
Ranked #2 on Music Question Answering on MusicQA
1 code implementation • 20 Dec 2022 • Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, Kai-Wei Chang
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life.
1 code implementation • 6 Dec 2022 • Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang
Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which finally shows the reasoning ability can be improved on both two tasks by unifying formulation.
Ranked #3 on Mathematical Reasoning on PGPS9K
1 code implementation • 31 Oct 2022 • Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling.
Ranked #1 on Mathematical Reasoning on Lila (OOD)
2 code implementations • 29 Sep 2022 • Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan
However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.
1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.
Ranked #5 on Science Question Answering on ScienceQA
no code implementations • 9 Mar 2022 • Yizhou Zhao, Liang Qiu, Wensi Ai, Pan Lu, Song-Chun Zhu
We propose a Spatial-Temporal And-Or graph (ST-AOG), a stochastic grammar model, to encode the contextual relationship between motion, emotion, and relation, forming a triangle in a conditional random field.
1 code implementation • 12 Dec 2021 • Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, Song-Chun Zhu
Current pre-training methods in computer vision focus on natural images in the daily-life context.
no code implementations • 12 Dec 2021 • Liang Qiu, Yizhou Zhao, Jinchao Li, Pan Lu, Baolin Peng, Jianfeng Gao, Song-Chun Zhu
To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first one trying to incorporate a value model into emotionally intelligent dialogue systems.
1 code implementation • 25 Oct 2021 • Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.
Ranked #1 on Visual Question Answering (VQA) on IconQA
no code implementations • ACL 2021 • Liang Qiu, Yuan Liang, Yizhou Zhao, Pan Lu, Baolin Peng, Zhou Yu, Ying Nian Wu, Song-Chun Zhu
Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly.
Ranked #5 on Dialog Relation Extraction on DialogRE
1 code implementation • ACL 2021 • Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu
We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).
Ranked #1 on Mathematical Question Answering on GeoS
no code implementations • 12 Mar 2021 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent's mental state transition and teach the agent to make decisions guided by its value like a human.
no code implementations • AAAI Conference on Artificial Intelligence (AAAI 2020) 2020 • Wei Zhang, Yue Ying, Pan Lu, Hongyuan Zha
Personalized image caption, a natural extension of the standard image caption task, requires to generate brief image descriptions tailored for users’ writing style and traits, and is more practical to meet users’ real demands.
no code implementations • International Joint Conferences on Artifical Intelligence (IJCAI) 2019 • Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan
In this paper, we develop a Scene Concept Graph (SCG) by aggregating image scene graphs and extracting frequently co-occurred concept pairs as scene common-sense knowledge.
no code implementations • 2019 IEEE 35th International Conference on Data Engineering (ICDE) 2019 • Ning Liu, Pan Lu, Wei zhang, Jianyong Wang
To address the above issues, we propose novel Knowledge-aware Deep Dual Networks (K-DDN) for the text-based mortality prediction task.
no code implementations • Advances in Knowledge Discovery and Data Mining (PAKDD 2019) 2019 • Yuanquan Lu, Wei zhang, Pan Lu, Jianyong Wang
Nowadays, the online interactions between users and items become diverse, and may include textual reviews as well as numerical ratings.
no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
It can robustly capture the high-level interactions between language and vision domains, thus significantly improves the performance of visual question answering.
no code implementations • ECCV 2018 • Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang
Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolute with visual features for capturing the textual and visual relationship in the early stage.
Ranked #14 on Visual Question Answering (VQA) on CLEVR
1 code implementation • 24 May 2018 • Pan Lu, Lei Ji, Wei zhang, Nan Duan, Ming Zhou, Jianyong Wang
To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA.
1 code implementation • 18 Nov 2017 • Pan Lu, Hongsheng Li, Wei zhang, Jianyong Wang, Xiaogang Wang
Existing VQA methods mainly adopt the visual attention mechanism to associate the input question with corresponding image regions for effective question answering.