TY - JOUR
T1 - Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes
AU - Choi, Sungjoon
AU - Lee, Kyungjae
AU - Oh, Songhwai
N1 - Funding Information:
Manuscript received February 17, 2018; accepted November 28, 2018. Date of publication January 25, 2019; date of current version May 31, 2019. This paper was recommended for publication by Associate Editor D. Kulic and Editor A. Billard upon evaluation of the reviewers’ comments. This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT under Grant NRF-2017R1A2B2006136. (Corresponding author: Songhwai Oh.) The authors are with the Department of Electrical and Computer Engineering and the Automation and Systems Research Institute, Seoul National University, Seoul 08826, South Korea (e-mail: sungjoon.choi@rllab.snu.ac.kr; kyungjae.lee@rllab.snu.ac.kr; songhwai@snu.ac.kr).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - In this paper, we focus on the problem of learning from demonstration (LfD) where demonstrations with different proficiencies are provided without labeling. To this end, we model multiple policies with different qualities as correlated Gaussian processes and present a leverage optimization method that estimates the leverage of each policy where the difference between two leverages defines the correlation between the corresponding policies. To recover a single policy function of an expert, we present a sparsity constraint on the leverage parameters. We first show that the proposed leverage optimization method can recover the correlations between sensory fields where the fields are realized from correlated Gaussian processes and sensor measurements are collected from the fields. Furthermore, we applied the proposed method to autonomous driving experiments, where demonstrations are collected from three different driving modes. While the driving policies are not realized from correlated processes, the proposed method assigns reasonable leverages to the driving demonstrations. The estimated driving policy of an expert, which incorporates the optimized leverages, outperforms previous LfD methods in terms of both safety and driving quality.
AB - In this paper, we focus on the problem of learning from demonstration (LfD) where demonstrations with different proficiencies are provided without labeling. To this end, we model multiple policies with different qualities as correlated Gaussian processes and present a leverage optimization method that estimates the leverage of each policy where the difference between two leverages defines the correlation between the corresponding policies. To recover a single policy function of an expert, we present a sparsity constraint on the leverage parameters. We first show that the proposed leverage optimization method can recover the correlations between sensory fields where the fields are realized from correlated Gaussian processes and sensor measurements are collected from the fields. Furthermore, we applied the proposed method to autonomous driving experiments, where demonstrations are collected from three different driving modes. While the driving policies are not realized from correlated processes, the proposed method assigns reasonable leverages to the driving demonstrations. The estimated driving policy of an expert, which incorporates the optimized leverages, outperforms previous LfD methods in terms of both safety and driving quality.
KW - Autonomous navigation
KW - learning from demonstration (LfD)
KW - leveraged Gaussian processes (LGPs)
KW - robust estimation
UR - http://www.scopus.com/inward/record.url?scp=85067108220&partnerID=8YFLogxK
U2 - 10.1109/TRO.2019.2891173
DO - 10.1109/TRO.2019.2891173
M3 - Article
AN - SCOPUS:85067108220
VL - 35
SP - 564
EP - 576
JO - IEEE Transactions on Robotics
JF - IEEE Transactions on Robotics
SN - 1552-3098
IS - 3
M1 - 8626460
ER -