Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots

Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong Lae Park, Songhwai Oh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we present a new class of entropy-regularized Markov decision processes (MDPs), which will be referred to as Tsallis MDPs, that inherently generalize well-known maximum entropy reinforcement learning (RL) by introducing an additional real-valued parameter called an entropic index. Our theoretical result enables us to derive and analyze different types of optimal policies, whose properties relate to the stochasticity of the optimal policy, by controlling the entropic index. To handle complex and model-free problems, such as learning a controller for a soft mobile robot, we propose a Tsallis actor-critic (TAC) method. We first observe that different RL problems have different desirable entropic indices, and that using a proper entropic index results in superior performance compared to state-of-the-art actor-critic methods. To avoid an exhaustive search over the entropic index, we propose a quick-and-dirty curriculum method that gradually increases the entropic index, which will be referred to as TAC with Curricula (TAC2). TAC2 shows performance comparable to TAC with the optimal entropic index. Finally, we apply TAC2 to learn a controller for a soft mobile robot, where TAC2 outperforms existing actor-critic methods in terms of both convergence speed and utility.
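The abstract does not state the entropy formula, so the following is only a minimal sketch of how an entropic index can reshape an entropy bonus. It assumes the standard textbook Tsallis entropy for a discrete distribution, S_q(p) = (1 - sum_i p_i^q) / (q - 1), which tends to Shannon entropy as q approaches 1. The function name `tsallis_entropy` and the restriction to q > 0 are illustrative choices, not taken from the paper, and the paper's own definition and normalization may differ.

```python
import math


def tsallis_entropy(probs, q):
    """Standard Tsallis entropy of a discrete distribution (assumed form).

    probs: sequence of non-negative probabilities summing to 1.
    q:     entropic index, assumed here to be positive.
    """
    if q <= 0:
        raise ValueError("this sketch only handles q > 0")
    if abs(q - 1.0) < 1e-12:
        # The q -> 1 limit of the formula below is Shannon entropy.
        return -sum(p * math.log(p) for p in probs if p > 0.0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    greedy = [1.0, 0.0, 0.0, 0.0]
    for q in (0.5, 1.0, 2.0):
        print(q, tsallis_entropy(uniform, q), tsallis_entropy(greedy, q))
```

Under this assumed definition, a uniform policy over four actions has entropy 0.75 at q = 2, while a deterministic policy has entropy 0 for every index shown, so the size of the bonus for staying stochastic depends on q.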

Original language: English
Title of host publication: Robotics
Subtitle of host publication: Science and Systems XVI
Editors: Marc Toussaint, Antonio Bicchi, Tucker Hermans
Publisher: MIT Press Journals
ISBN (Print): 9780992374761
DOIs
Publication status: Published - 2020
Externally published: Yes
Event: 16th Robotics: Science and Systems, RSS 2020 - Virtual, Online
Duration: 2020 Jul 12 to 2020 Jul 16

Publication series

Name: Robotics: Science and Systems
ISSN (Electronic): 2330-765X

Conference

Conference: 16th Robotics: Science and Systems, RSS 2020
City: Virtual, Online
Period: 20/7/12 to 20/7/16

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
