Search Results for author: Jiwen Zhang

Found 5 papers, 3 papers with code

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

1 code implementation • 2 Apr 2024 • Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei

For task completion, the agent needs to align and integrate various navigation modalities, including instruction, observation and navigation history.

Contrastive Learning Decision Making +2

Paper
Code

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes the description of the previous actions, the current screen, and more importantly the action thinking of what actions should be performed and the outcomes led by the chosen action.

Language Modelling Large Language Model