Search Results for author: Ishaan Bhola

Found 4 papers, 1 paper with code

V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

No code implementations · 24 May 2024 · Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, Adarsh Jha, Mukunda NS, Ishaan Bhola

In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs).

GUIDE: Graphical User Interface Data for Execution

No code implementations · 9 Apr 2024 · Rajat Chawla, Adarsh Jha, Muskaan Kumar, Mukunda NS, Ishaan Bhola

The dataset's multi-platform nature and coverage of diverse websites enable the exploration of cross-interface capabilities in automation tasks.

Tasks: Language Modelling, Large Language Model, +1 more

AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation

No code implementations · 15 Mar 2024 · Arkajit Datta, Tushar Verma, Rajat Chawla, Mukunda N. S, Ishaan Bhola

In recent advancements within the domain of Large Language Models (LLMs), there has been a notable emergence of agents capable of addressing Robotic Process Automation (RPA) challenges through enhanced cognitive capabilities and sophisticated reasoning.

Tasks: Autonomous Navigation

Veagle: Advancements in Multimodal Representation Learning

1 code implementation · 18 Jan 2024 · Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola

In response to the limitations observed in current Vision Language Models (VLMs) and Multimodal Large Language Models (MLLMs), our proposed model, Veagle, incorporates a unique mechanism inspired by the successes and insights of previous works.

Tasks: Image Captioning, Language Modelling, +4 more
