Human 0, MLLM 1: Unlocking New Layers of Automation in Language-Conditioned Robotics with Multimodal LLMs

Engineering and Technology

R. ElMallah, N. Zamani, et al.

Discover the research by Ramy ElMallah, Nima Zamani, and Chi-Guhn Lee, which explores automating human functions in language-conditioned robotics using Multimodal Large Language Models in the loop (MLLM-IL). Their experiments with OpenAI's GPT-4 and Google Gemini achieved over 90% accuracy in task-feasibility analysis, suggesting MLLM-IL could substantially reduce the need for human supervision in the field.
Abstract
Language-conditioned robotics has seen tremendous growth in frameworks that aim to improve the success rates of robots acting upon the environment according to free-form language instructions. However, most existing frameworks leverage a human in the loop to assist with critical functions. Humans are mainly involved in ensuring that a human-requested task is feasible, resetting the robot when it diverges from achieving the requested goal, and deciding if it has completed the task. As human involvement limits the scalability of language-conditioned robotics, we propose automating these human functions through Multimodal Large Language Models in the Loop (MLLM-IL). We conduct experiments leveraging multimodal large language models, specifically OpenAI's GPT-4 and Google Gemini, to evaluate their potential in automating crucial functions. The newly introduced layers of automation include analyzing task feasibility, assessing task progress, and detecting task success. We investigate how different factors, including the choice of LLM, the resolution of the input images, and the structure of the prompt, affect the performance of the LLMs in achieving the target functions. Results show significant zero-shot success, with feasibility analysis accuracies exceeding 90%. Our work demonstrates the immense potential of utilizing MLLM-IL to complement existing frameworks in language-conditioned robotics, opening the space for a wealth of new applications.
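The MLLM-IL functions described above amount to sending the robot's camera view plus a natural-language question to a multimodal LLM and mapping its free-text answer to a decision. A minimal sketch of the feasibility-analysis step is shown below, using an OpenAI-style multimodal chat payload; the prompt wording and helper names are illustrative assumptions, not the prompts used in the paper.

```python
import base64


def build_feasibility_messages(instruction: str, image_b64: str) -> list:
    """Build an OpenAI-style multimodal chat payload asking whether a
    free-form instruction is feasible in the pictured scene.
    (Hypothetical prompt wording; the paper's exact prompts differ.)"""
    question = (
        "You are assisting a robot. Given the camera image of the scene, "
        f"is the task '{instruction}' feasible? Answer 'yes' or 'no' and "
        "briefly explain."
    )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # Image is passed inline as a base64 data URL.
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]


def parse_feasibility(reply: str) -> bool:
    """Map the model's free-text answer to a boolean feasibility verdict."""
    return reply.strip().lower().startswith("yes")


# Example: encode a scene image and build the request payload.
image_b64 = base64.b64encode(b"<png bytes here>").decode()
messages = build_feasibility_messages("pick up the red block", image_b64)
```

With a real client, `messages` would be passed to a chat-completions call (e.g. `client.chat.completions.create(model=..., messages=messages)`), and the reply routed through `parse_feasibility`; the progress-assessment and success-detection functions follow the same pattern with different questions.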
Publisher
IEEE Robotics and Automation Letters
Authors
Ramy ElMallah, Nima Zamani, Chi-Guhn Lee
Tags
language-conditioned robotics
automation
Multimodal Large Language Models
GPT-4
Google Gemini
feasibility analysis
scalability