To allow non-experts to easily specify long-horizon, multi-robot collaborative tasks, researchers are increasingly using language models to translate human natural language commands into formal specifications. However, because a given command can be translated in multiple ways, the resulting specifications may be inaccurate or lead to inefficient downstream multi-robot planning. Our core insight is that specifications should be concise representations, making it easier for downstream planners to find optimal solutions while remaining straightforward to derive from human instructions. Given the superior performance of multi-robot planners with hierarchical specifications, we represent tasks using hierarchical structures and introduce a full pipeline that translates natural language commands into hierarchical Linear Temporal Logic (LTL) and then solves the corresponding planning problem. The translation happens in two steps with the help of Large Language Models (LLMs). Initially, an LLM transforms the instructions into a hierarchical representation defined as a Hierarchical Task Tree (HTT), capturing the logical and temporal relations among tasks. Following this, a fine-tuned LLM translates the sub-tasks of each task into flat LTL formulas, which are aggregated to form hierarchical LTL specifications. These specifications are then leveraged for planning using off-the-shelf planners. Our framework showcases the potential of LLMs to harness hierarchical reasoning for automated multi-robot task planning. Through evaluations in both simulation and real-world experiments involving human participants, we demonstrate that our method can handle more complex instructions than existing methods. Moreover, the results indicate that our approach achieves higher success rates and lower costs in multi-robot task allocation and plan generation.
Overview of the framework, using the dishwasher loading problem as a case study. Note that the non-leaf nodes in the Hierarchical Task Tree (HTT), the language descriptions of sibling tasks, and the flat specifications are color-coded to indicate one-to-one correspondence.
The HTT is structured to unfold level by level, with each child task being a decomposition of its parent task. Notably, the tasks at the bottom level are not necessarily indecomposable. This flexibility allows for varying numbers of levels and tasks per level, accommodating differences in task understanding and the range of primitive actions available to robot agents.
Additionally, the relation $R$ captures only the temporal relationships between sibling tasks that share the same parent. The temporal relationship between any two tasks in the tree can be inferred by tracing their lineage back to their common ancestor, which keeps the overall structure simple.
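To make this structure concrete, here is a minimal sketch of how an HTT node could be represented; the class and field names (HTTNode, sibling_relations) are our own illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative sketch of an HTT node; the names are assumptions, not the paper's code.
@dataclass
class HTTNode:
    description: str                                          # natural-language description of the task
    children: List["HTTNode"] = field(default_factory=list)   # decomposition into sub-tasks
    # Relation R: temporal constraints between sibling (child) tasks,
    # e.g. ("load the cups", "before", "start the dishwasher").
    sibling_relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def is_leaf(self) -> bool:
        """Leaf tasks are the ones translated into flat LTL formulas."""
        return not self.children
```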
When a task instruction is received, we use LLMs to construct the HTT through a two-step process, as outlined in step 1.
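As a rough illustration of this two-step construction, the sketch below issues one LLM call for task decomposition and a second for extracting relations between sibling tasks. It assumes an OpenAI-style chat client, and the prompt wording and helper names are placeholders, not the actual prompts (which are listed at the end of this page).

```python
# Rough sketch of the two-step HTT construction, assuming an OpenAI-style
# chat client; prompt wording and helper names are placeholders.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def build_htt(instruction: str) -> tuple[str, str]:
    # Step 1a: decompose the instruction into a hierarchy of sub-tasks.
    decomposition = llm(
        "Decompose the following task into a hierarchy of sub-tasks, "
        f"listing each level explicitly:\n{instruction}"
    )
    # Step 1b: extract the temporal relations between sibling tasks at each level.
    relations = llm(
        "For each group of sibling sub-tasks below, state their temporal "
        f"relations (e.g., 'before', 'in any order'):\n{decomposition}"
    )
    return decomposition, relations
```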
To evaluate our method on tasks with more complex temporal requirements, we combine several base tasks from the ALFRED dataset to generate derivative tasks (each derivative task is composed of up to 4 base tasks).
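The following is a small sketch of how such derivative instructions could be assembled from base-task instructions; the connective phrases and the sampling scheme are our own assumptions, not the paper's generation procedure.

```python
import random

# Hypothetical sketch of composing a derivative task from ALFRED base-task
# instructions; connectives and sampling scheme are illustrative assumptions.
CONNECTIVES = ["then", "and after that", "and, independently,"]

def make_derivative_task(base_instructions, max_base_tasks=4, seed=None):
    rng = random.Random(seed)
    k = rng.randint(2, max_base_tasks)
    chosen = rng.sample(base_instructions, k)
    parts = [chosen[0]]
    for instr in chosen[1:]:
        parts.append(f"{rng.choice(CONNECTIVES)} {instr[0].lower() + instr[1:]}")
    return ", ".join(parts)

# Example with two ALFRED-style instructions from the demos above:
print(make_derivative_task([
    "Place a computer on the ottoman",
    "Pick up a green candle and place it on the countertop",
], max_base_tasks=2))
```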
We compare our method with SMART-LLM, which uses LLMs to generate Python scripts that invoke predefined APIs for task decomposition and task allocation.
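For context, such generated scripts typically sequence calls to primitive robot skills. The toy example below only illustrates this general style; the primitive-action names (go_to, pick_up, put_on) are placeholders of our own, not SMART-LLM's actual API.

```python
# Toy illustration of the script style produced by code-generation baselines;
# the primitive-action names are placeholders, not SMART-LLM's actual API.
def go_to(robot, target):       print(f"{robot}: go to {target}")
def pick_up(robot, obj):        print(f"{robot}: pick up {obj}")
def put_on(robot, obj, place):  print(f"{robot}: put {obj} on {place}")

# Hypothetical LLM-generated allocation: robot1 handles the candle sub-task.
def robot1_task():
    go_to("robot1", "green candle")
    pick_up("robot1", "green candle")
    go_to("robot1", "countertop")
    put_on("robot1", "green candle", "countertop")

robot1_task()
```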
The metrics are as follows:
1 base task with 4 robots: Place a computer on the ottoman
1 base task with 4 robots: Pick up a green candle and place it on the countertop
4 base tasks with 1 robot in dining room
4 base tasks with 4 robots in dining room
4 base tasks with 4 robots in bedroom
Our real-world experiments are conducted in a tabletop setting, where the task involves a robotic arm placing fruits and vegetables onto colored plates.
Given the primarily 2D nature of the task, we convert the tabletop environment into a discrete grid world. The use of only one robotic arm simplifies the task compared to the multi-robot scenarios in the simulator, as it eliminates the need for task allocation.
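A minimal sketch of this discretization is shown below, assuming a fixed rectangular workspace and grid resolution; the dimensions are illustrative, not measured from our setup.

```python
# Minimal sketch of mapping continuous tabletop positions to grid cells.
# Workspace extent and grid resolution are illustrative assumptions.
TABLE_X_MIN, TABLE_X_MAX = 0.0, 0.60   # meters
TABLE_Y_MIN, TABLE_Y_MAX = 0.0, 0.40   # meters
GRID_COLS, GRID_ROWS = 6, 4

def to_cell(x: float, y: float) -> tuple[int, int]:
    """Map a continuous (x, y) tabletop position to a discrete (row, col) cell."""
    col = min(GRID_COLS - 1, int((x - TABLE_X_MIN) / (TABLE_X_MAX - TABLE_X_MIN) * GRID_COLS))
    row = min(GRID_ROWS - 1, int((y - TABLE_Y_MIN) / (TABLE_Y_MAX - TABLE_Y_MIN) * GRID_ROWS))
    return row, col

# Example: a fruit at (0.31, 0.12) m lands in cell (1, 3).
print(to_cell(0.31, 0.12))
```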
Our evaluation focuses on two main aspects:
Conversion from human instructions to Hierarchical Task Tree:
Prompt for generating HTT task decomposition
Prompt for extracting relationships between HTT sibling tasks
Generation of task-wise flat LTL specifications:
Natural language to LTL formulas via a fine-tuned LLM
Prompt for action completion
An example of generated hierarchical LTL specifications (an illustrative sketch is also given after this list)
Prompts for real-world experiments:
Prompt for LLMs to generate task plan
Prompt for LLMs to generate task plan for multi-robot handover task
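To give a sense of the output format, below is an informal, hypothetical hierarchical LTL specification for the dishwasher-loading case study (not the example referenced above): each non-leaf formula treats its child tasks as composite propositions that are themselves defined by flat LTL formulas.

```latex
% Hypothetical hierarchical LTL specification (illustrative only).
% Level 1: load the dishwasher, then start it.
\phi_{\mathrm{root}} = \Diamond\big(\phi_{\mathrm{load}} \wedge \Diamond\,\phi_{\mathrm{start}}\big)

% Level 2: loading decomposes into cups and plates, in any order.
\phi_{\mathrm{load}} = \Diamond\,\phi_{\mathrm{cups}} \wedge \Diamond\,\phi_{\mathrm{plates}}

% Level 3: leaf tasks are flat LTL formulas over primitive robot actions.
\phi_{\mathrm{cups}}   = \Diamond\big(\mathrm{pick\_cup} \wedge \Diamond\,\mathrm{place\_cup\_in\_rack}\big)
\phi_{\mathrm{plates}} = \Diamond\big(\mathrm{pick\_plate} \wedge \Diamond\,\mathrm{place\_plate\_in\_rack}\big)
\phi_{\mathrm{start}}  = \Diamond\,\mathrm{press\_start\_button}
```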