TY - JOUR
T1 - Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
AU - Lee, Kyungjae
AU - Choi, Sungjoon
AU - Oh, Songhwai
N1 - Funding Information:
Manuscript received September 10, 2017; accepted January 4, 2018. Date of publication January 31, 2018; date of current version February 22, 2018. This letter was recommended for publication by Associate Editor A. Agostini and Editor T. Asfour upon evaluation of the reviewers’ comments. This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017R1A2B2006136) and by the Next-Generation Information Computing Development Program through the NRF funded by the Ministry of Science and ICT (2017M3C4A7065926). (Corresponding author: Songhwai Oh.) The authors are with the Department of Electrical and Computer Engineering and Automation and Systems Research Institute (ASRI), Seoul National University, Seoul 08826, South Korea. (e-mail: kyungjae.lee@cpslab.snu.ac.kr; sungjoon.choi@cpslab.snu.ac.kr; songhwai@snu.ac.kr).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7
Y1 - 2018/7
AB - In this letter, a sparse Markov decision process (MDP) with a novel causal sparse Tsallis entropy regularization is proposed. The proposed policy regularization induces a sparse and multimodal optimal policy distribution of a sparse MDP. A full mathematical analysis of the proposed sparse MDP is provided. We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method for solving a sparse MDP and prove its convergence and optimality using the Banach fixed-point theorem. The proposed sparse MDP is compared to soft MDPs that utilize causal entropy regularization. We show that the performance error of a sparse MDP is bounded by a constant, while that of a soft MDP grows logarithmically with the number of actions; this performance error is caused by the introduced regularization term. In experiments, we apply sparse MDPs to reinforcement learning problems, where the proposed method outperforms existing methods in terms of convergence speed and performance.
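N1 - Implementation sketch: a minimal, illustrative Python rendering of the sparse value iteration described in the abstract, assuming a tabular MDP with a known transition tensor and reward table and using the sparsemax projection to obtain the sparse policy; the names (sparsemax, sparse_value_iteration, P, R) are illustrative and not the authors' code.
  import numpy as np

  def sparsemax(z):
      # Euclidean projection of a score vector z onto the probability simplex;
      # yields a sparse distribution with exact zeros on low-scoring actions.
      z = np.asarray(z, dtype=float)
      z_sorted = np.sort(z)[::-1]
      k = np.arange(1, z.size + 1)
      support = 1.0 + k * z_sorted > np.cumsum(z_sorted)
      k_max = k[support][-1]
      tau = (np.cumsum(z_sorted)[k_max - 1] - 1.0) / k_max
      return np.maximum(z - tau, 0.0)

  def sparse_value_iteration(P, R, gamma=0.95, n_iters=500):
      # P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
      S, A, _ = P.shape
      V = np.zeros(S)
      pi = np.full((S, A), 1.0 / A)
      for _ in range(n_iters):
          Q = R + gamma * (P @ V)                        # (S, A) state-action values
          V_new = np.empty(S)
          for s in range(S):
              pi[s] = sparsemax(Q[s])
              tsallis = 0.5 * (1.0 - np.sum(pi[s] ** 2)) # sparse Tsallis entropy bonus
              V_new[s] = pi[s] @ Q[s] + tsallis          # regularized Bellman backup
          V = V_new
      return V, pi
  The per-state backup (expected Q under the sparsemax policy plus the sparse Tsallis entropy) corresponds to the closed-form sparsemax value analyzed in the letter; the convergence of this iteration is what the letter establishes via the Banach fixed-point theorem.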
KW - Autonomous agents
KW - deep learning in robotics and automation
KW - learning and adaptive systems
UR - http://www.scopus.com/inward/record.url?scp=85057277043&partnerID=8YFLogxK
U2 - 10.1109/LRA.2018.2800085
DO - 10.1109/LRA.2018.2800085
M3 - Article
AN - SCOPUS:85057277043
VL - 3
SP - 1466
EP - 1473
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
SN - 2377-3766
IS - 3
ER -