no code implementations • 30 May 2024 • Kaixuan Huang, Xudong Guo, Mengdi Wang
Its performance depends on a hyperparameter K, the candidate length, i.e., the number of candidate tokens for the target model to verify in each round.
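The role of K can be illustrated with a minimal sketch of one draft-and-verify round, assuming the snippet above describes speculative decoding with a small draft model and a larger target model. The names `speculative_round`, `toy_target`, and `toy_draft` are hypothetical. The greedy exact-match acceptance rule is a simplification chosen for clarity, not the paper's method, and a real system would verify all K candidates in a single parallel pass of the target model rather than one call per token.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a model is a function from a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_round(prefix: List[Token], draft: NextToken,
                      target: NextToken, k: int) -> List[Token]:
    """Run one round: the draft proposes k tokens, the target verifies them."""
    # The draft model proposes k candidate tokens autoregressively.
    candidates: List[Token] = []
    for _ in range(k):
        candidates.append(draft(prefix + candidates))

    # The target checks candidates in order and keeps the longest agreeing prefix.
    accepted: List[Token] = []
    for tok in candidates:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: use the target's token instead and end the round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # All k candidates were accepted; the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def toy_target(seq: List[Token]) -> Token:
    # Toy "target model": counts upward modulo 10.
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    # Toy "draft model": agrees with the target except right after token 3.
    return 0 if seq[-1] == 3 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_round([0], toy_draft, toy_target, k=2))
    print(speculative_round([0], toy_draft, toy_target, k=5))
```

In this toy setup a larger K only helps while the draft keeps agreeing with the target: candidates proposed after the first mismatch are discarded, which is the trade-off that makes the choice of K matter.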
no code implementations • 5 Apr 2024 • Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan
Second, we introduce a heterogeneous layer for decision-making, whose parameters are specifically generated by the learned latent variables.
1 code implementation • 19 Mar 2024 • Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang
Large Language Models (LLMs) have emerged as integral tools for reasoning, planning, and decision-making, drawing upon their extensive world knowledge and proficiency in language-related tasks.
no code implementations • 5 Jan 2023 • Xudong Guo, Daming Shi, Wenhui Fan
However, existing works either broadcast messages, leading to information redundancy, or learn targeted communication by modeling all the other agents as targets, which is not scalable when the number of agents varies.
Multi-agent Reinforcement Learning
no code implementations • CVPR 2021 • Xudong Guo, Xun Guo, Yan Lu
However, spatial correlations and temporal correlations represent different contextual information of scenes and temporal reasoning.