TeleMoMa A Modular and Versatile Teleoperation System for Mobile Manipulation
- TeleMoMa A Modular and Versatile Teleoperation System for Mobile Manipulation
- Venue: ICRA Workshop MoMa 2024
- Title: TeleMoMa A Modular and Versatile Teleoperation System for Mobile Manipulation
- Authors: Shivin Dass, Wensi Ai, Yuqian Jiang, Samik Singh, Jiaheng Hu, Ruohan Zhang, Peter Stone, Ben Abbatematteo, Roberto Martín-Martín
- Leader: Yohei
- Paper Link
Nutshell
TeleMoMa is a general and modular interface for whole-body teleoperation of mobile manipulators, enabling the collection of high-quality demonstration data for imitation learning in mobile manipulation tasks.
Key Ideas
- TeleMoMa enables the collection of high-quality demonstration data for imitation learning in mobile manipulation tasks, and a hybrid vision-virtual reality (VR) interface is an efficient and natural mode of teleoperation.
- The results indicate that on their own, both VR and Vision present drawbacks pertaining to their individual modalities, but when combined in the form of VR + Vision (Real world), TeleMoMa can overcome their individual drawbacks to enable an improved teleoperation experience.
- This paper describes how TeleMoMa facilitates the use of mobile phones, spacemouse, and keyboards as part of its Human Interface.
- The findings in this paper suggest that depth information is a crucial component for the development of effective mobile manipulation policies, and that the strong dependency between base and arm actions is one of the main challenges in IL for mobile manipulation
Details
Objectives
The objective of TeleMoMa is to provide a general and modular interface for whole-body teleoperation of mobile manipulators, enabling the collection of high-quality demonstration data for imitation learning in mobile manipulation tasks.
Methods
TeleMoMa uses a modular and versatile teleoperation framework that facilitates the integration of different human interfaces, robot platforms, and simulators. The system consists of a Human Interface, a Teleoperation Channel, and a Robot Interface. TeleMoMa uses the position and rotation of the hips to control the movement of the base of the mobile manipulator. It also sends compressed sensor images from the robot and decompresses them on the client, and processes the RGB-D images from the vision interface on the client side and only sends the action commands over the teleoperation channel.
Experiments and Results
User Study
Subjects
Novice users were involved in user studies to evaluate the ease of learning to collect demonstrations with different combinations of human interfaces enabled by TeleMoMa.
participants with varying levels of teleoperation experience: 12
Both tasks, but especially the dusting task, benefit from the simultaneous motion of base and arm(s), i.e., whole-body motion, as enabled by TeleMoMa since the robot is required to navigate around the desk while periodically moving the hands to clear out any dust. We recruited 12 participants with varying levels of teleoperation experience. Each user was given the same instructions and a brief practice period with each modality
participants with varying levels of teleoperation experience and compared: 6
In the second study, we sought to assess whether TeleMoMa users improve over time by measuring their learning curve. We recruited 6 participants with varying levels of teleoperation experience and compared two modalities (VR and VR + Vision, order randomized) on the pick up task (Fig. 3(i)). In this task, the robot must lower its torso in order to grasp a towel from the floor, hand the towel from one hand to the other, navigate to a table, and place the towel on the table
Results
TeleMoMa was able to teleoperate several existing mobile manipulators, including PAL Tiago++, Toyota HSR, and Fetch, in simulation and the real world. The system was also able to collect high-quality demonstration data for imitation learning in mobile manipulation tasks. The results of the user study show that remote human demonstrators have slower reaction times due to delays and limited resolutions of the camera streams, but TeleMoMa provides the capability to successfully complete the tasks under regular network conditions.
Conclusion
TeleMoMa is a helpful tool for the community, enabling researchers to collect whole-body mobile manipulation demonstrations. The system’s modularity and versatility make it a promising solution for imitation learning in mobile manipulation tasks. TeleMoMa is a novel teleoperation system for mobile manipulators that enables versatility through modularity, and it provides the capability to successfully complete tasks under regular network conditions.
Limitations
The system’s remote teleoperation capability may be limited by delays and limited resolutions of the camera streams, resulting in slower reaction times for human demonstrators. Occlusion presents a challenge for the vision-based modalities, as the camera placement has an impact on the operator’s visibility of the robot’s workspace.
Future Work
Future work may involve further developing and refining TeleMoMa to improve its performance and versatility in mobile manipulation tasks. Future work includes extending TeleMoMa to joint control when tracking human pose.
Questions
- How could it possible to train a good policy, which success rate is over 70%, with only 50-100 demonstrations? It soulds very data efficient.