TY - JOUR
T1 - Reinforcement Learning-Based Optimal Tracking Control of an Unknown Unmanned Surface Vehicle
AU - Wang, Ning
AU - Gao, Ying
AU - Zhao, Hong
AU - Ahn, Choon Ki
N1 - Funding Information:
Manuscript received August 3, 2019; revised January 7, 2020 and May 25, 2020; accepted July 11, 2020. Date of publication August 3, 2020; date of current version July 7, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 51009017 and Grant 51379002, in part by the Fund for Liaoning Innovative Talents in Colleges and Universities, China, under Grant LR2017024, in part by the Fund for Dalian Distinguished Young Scholars, China, under Grant 2016RJ10, in part by the Liaoning Revitalization Talents Program under Grant XLYC1807013, in part by the Stable Supporting Fund of Science and Technology on Underwater Vehicle Laboratory, China, under Grant SXJQR2018WDKT03, in part by the Fundamental Research Funds for the Central Universities, China, under Grant 3132019344, and in part by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (Ministry of Science and ICT) under Grant NRF-2020R1A2C1005449. (Corresponding author: Ning Wang.) Ning Wang, Ying Gao, and Hong Zhao are with the School of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China (e-mail: n.wang@ieee.org).
Publisher Copyright:
© 2012 IEEE.
PY - 2021/7
Y1 - 2021/7
N2 - In this article, a novel reinforcement learning-based optimal tracking control (RLOTC) scheme is established for an unmanned surface vehicle (USV) in the presence of complex unknowns, including dead-zone input nonlinearities, system dynamics, and disturbances. To be specific, dead-zone nonlinearities are decoupled to be input-dependent sloped controls and unknown biases that are encapsulated into lumped unknowns within tracking error dynamics. Neural network (NN) approximators are further deployed to adaptively identify complex unknowns and facilitate a Hamilton-Jacobi-Bellman (HJB) equation that formulates optimal tracking. In order to derive a practically optimal solution, an actor-critic reinforcement learning framework is built by employing adaptive NN identifiers to recursively approximate the total optimal policy and cost function. Eventually, theoretical analysis shows that the entire RLOTC scheme can render tracking errors that converge to an arbitrarily small neighborhood of the origin, subject to optimal cost. Simulation results and comprehensive comparisons on a prototype USV demonstrate remarkable effectiveness and superiority.
KW - Completely unknown dynamics
KW - optimal tracking control
KW - reinforcement learning-based control
KW - unknown dead-zone input nonlinearities
KW - unmanned surface vehicle (USV)
UR - http://www.scopus.com/inward/record.url?scp=85095577821&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2020.3009214
DO - 10.1109/TNNLS.2020.3009214
M3 - Article
C2 - 32745008
AN - SCOPUS:85095577821
SN - 2162-237X
VL - 32
SP - 3034
EP - 3045
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 7
M1 - 9154585
ER -