End-to-end robot manipulation using demonstration-guided goal strategies

Cheol Hui Min, Jae-Bok Song

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In deep reinforcement learning, finding the optimal manipulation policy for a multi-DoF manipulator in 3D space requires intricate reward shaping. However, reward shaping demands cumbersome tuning of the reward function based on prior knowledge of the robotic task. It is therefore desirable to learn various manipulation policies with a simple reward function. In this study, we propose a method that learns the manipulation policy of a manipulator in a sparse-reward setting. To this end, Hindsight Experience Replay (HER) is combined with Twin Delayed DDPG (TD3) by applying a goal strategy that incorporates demonstrations for the policy. It is shown that the policy can estimate the joint control commands of a 7-DoF manipulator from raw RGB video inputs in a sparse-reward setting in an end-to-end manner.
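As a rough illustration of why HER helps in a sparse-reward setting, the sketch below shows the standard "future" relabeling strategy from the HER literature. It is not the authors' method: the abstract does not describe the demonstration-guided goal strategy in detail, so none of it is reproduced here. The names `Transition`, `sparse_reward`, `her_relabel`, the tolerance `tol`, and the low-dimensional goal vectors are all assumptions made for this example (the paper's policy takes raw RGB frames as input).

```python
import random
from typing import List, NamedTuple, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class Transition(NamedTuple):
    """One step of a goal-conditioned episode (hypothetical layout)."""
    achieved_goal: Vector  # goal actually reached after the action
    goal: Vector           # goal the policy was conditioned on
    reward: float


def sparse_reward(achieved: Vector, goal: Vector, tol: float = 0.05) -> float:
    """Return 0.0 if the achieved goal lies within `tol` of the goal, else -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode: Sequence[Transition], k: int = 4,
                rng: Optional[random.Random] = None) -> List[Transition]:
    """Return the episode plus k relabeled copies of every transition.

    Each copy replaces the original goal with a goal achieved at the same
    or a later step of the episode ("future" strategy) and recomputes the
    sparse reward against that substituted goal.
    """
    rng = rng or random.Random()
    replay = list(episode)
    for t, tr in enumerate(episode):
        for _ in range(k):
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future.achieved_goal
            replay.append(tr._replace(
                goal=new_goal,
                reward=sparse_reward(tr.achieved_goal, new_goal),
            ))
    return replay


# A failed episode: the arm never gets near the target at (1.0, 1.0).
target = (1.0, 1.0)
episode = [
    Transition(achieved_goal=pos, goal=target,
               reward=sparse_reward(pos, target))
    for pos in [(0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
]
replay = her_relabel(episode, k=4, rng=random.Random(0))
```

Every original transition carries the failure reward, yet the relabeled copies include successes, because a transition is always successful with respect to the goal it actually achieved. An off-policy learner such as TD3 can then train on `replay` even though the task reward alone gave no positive signal.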

Original language: English
Title of host publication: 2019 16th International Conference on Ubiquitous Robots, UR 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 159-164
Number of pages: 6
ISBN (Electronic): 9781728132327
DOI: https://doi.org/10.1109/URAI.2019.8768699
Publication status: Published - 2019 Jun 1
Event: 16th International Conference on Ubiquitous Robots, UR 2019 - Jeju, Korea, Republic of
Duration: 2019 Jun 24 to 2019 Jun 27

Publication series

Name: 2019 16th International Conference on Ubiquitous Robots, UR 2019

Conference

Conference: 16th International Conference on Ubiquitous Robots, UR 2019
Country: Korea, Republic of
City: Jeju
Period: 19/6/24 to 19/6/27

Fingerprint

Reward
Manipulators
Manipulation
Demonstrations
Robot
Robots
Manipulator
Reinforcement learning
Robotics
Optimal Policy
Reinforcement Learning
Prior Knowledge
Strategy
Policy
Optimization
Estimate

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Human-Computer Interaction
  • Mechanical Engineering
  • Control and Optimization

Cite this

Min, C. H., & Song, J-B. (2019). End-to-end robot manipulation using demonstration-guided goal strategies. In 2019 16th International Conference on Ubiquitous Robots, UR 2019 (pp. 159-164). [8768699] (2019 16th International Conference on Ubiquitous Robots, UR 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/URAI.2019.8768699

@inproceedings{9de121a32f474bddbcb9fc1b24a05aa5,
title = "End-to-end robot manipulation using demonstration-guided goal strategies",
abstract = "In deep reinforcement learning, finding the optimal manipulation policy for a multi-DoF manipulator in 3D space requires intricate reward shaping. However, reward shaping demands cumbersome tuning of the reward function based on prior knowledge of the robotic task. It is therefore desirable to learn various manipulation policies with a simple reward function. In this study, we propose a method that learns the manipulation policy of a manipulator in a sparse-reward setting. To this end, Hindsight Experience Replay (HER) is combined with Twin Delayed DDPG (TD3) by applying a goal strategy that incorporates demonstrations for the policy. It is shown that the policy can estimate the joint control commands of a 7-DoF manipulator from raw RGB video inputs in a sparse-reward setting in an end-to-end manner.",
author = "Min, {Cheol Hui} and Song, Jae-Bok",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/URAI.2019.8768699",
language = "English",
series = "2019 16th International Conference on Ubiquitous Robots, UR 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "159--164",
booktitle = "2019 16th International Conference on Ubiquitous Robots, UR 2019",

}

TY  - GEN
T1  - End-to-end robot manipulation using demonstration-guided goal strategies
AU  - Min, Cheol Hui
AU  - Song, Jae-Bok
PY  - 2019/6/1
Y1  - 2019/6/1
N2  - In deep reinforcement learning, finding the optimal manipulation policy for a multi-DoF manipulator in 3D space requires intricate reward shaping. However, reward shaping demands cumbersome tuning of the reward function based on prior knowledge of the robotic task. It is therefore desirable to learn various manipulation policies with a simple reward function. In this study, we propose a method that learns the manipulation policy of a manipulator in a sparse-reward setting. To this end, Hindsight Experience Replay (HER) is combined with Twin Delayed DDPG (TD3) by applying a goal strategy that incorporates demonstrations for the policy. It is shown that the policy can estimate the joint control commands of a 7-DoF manipulator from raw RGB video inputs in a sparse-reward setting in an end-to-end manner.
AB  - In deep reinforcement learning, finding the optimal manipulation policy for a multi-DoF manipulator in 3D space requires intricate reward shaping. However, reward shaping demands cumbersome tuning of the reward function based on prior knowledge of the robotic task. It is therefore desirable to learn various manipulation policies with a simple reward function. In this study, we propose a method that learns the manipulation policy of a manipulator in a sparse-reward setting. To this end, Hindsight Experience Replay (HER) is combined with Twin Delayed DDPG (TD3) by applying a goal strategy that incorporates demonstrations for the policy. It is shown that the policy can estimate the joint control commands of a 7-DoF manipulator from raw RGB video inputs in a sparse-reward setting in an end-to-end manner.
UR  - http://www.scopus.com/inward/record.url?scp=85070554812&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85070554812&partnerID=8YFLogxK
U2  - 10.1109/URAI.2019.8768699
DO  - 10.1109/URAI.2019.8768699
M3  - Conference contribution
AN  - SCOPUS:85070554812
T3  - 2019 16th International Conference on Ubiquitous Robots, UR 2019
SP  - 159
EP  - 164
BT  - 2019 16th International Conference on Ubiquitous Robots, UR 2019
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -