Multi-agent reinforcement learning with approximate model learning for competitive games

Young Joon Park, Yoon Sang Cho, Seoung Bum Kim

Research output: Contribution to journal › Article

Abstract

We propose a method for learning multi-agent policies to compete against multiple opponents. The method combines recurrent neural network-based actor-critic networks with deterministic policy gradients that promote cooperation among agents through communication. The learning process does not require access to the opponents' parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate through forward and backward paths, while the critic network helps train the actors by delivering gradient signals based on each agent's contribution to the global reward. Moreover, to address the nonstationarity caused by the evolving policies of the other agents, we propose approximate model learning that uses auxiliary prediction networks to model the state transitions, the reward function, and the opponents' behavior. We evaluate the proposed method in competitive multi-agent environments and compare it with existing approaches in terms of learning efficiency and goal achievement. The results show that the proposed method outperforms the alternatives.
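To make the architecture sketched in the abstract concrete, below is a minimal PyTorch sketch of how a recurrent, communicating actor and a centralized critic with auxiliary prediction heads might be wired together. This is not the authors' implementation; the class names, dimensions, message channel, and placeholder targets are illustrative assumptions only.

import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Per-agent actor: encodes the agent's observation together with the
    message received from teammates and emits a deterministic action plus
    an outgoing message (hypothetical communication channel)."""

    def __init__(self, obs_dim, msg_dim, act_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim + msg_dim, hidden)
        self.policy = nn.Linear(hidden, act_dim)   # deterministic action head
        self.message = nn.Linear(hidden, msg_dim)  # message sent to teammates

    def forward(self, obs, incoming_msg, h):
        h = self.rnn(torch.cat([obs, incoming_msg], dim=-1), h)
        return torch.tanh(self.policy(h)), self.message(h), h


class CentralizedCritic(nn.Module):
    """Critic over the team's joint observation and action, with auxiliary
    heads for approximate model learning: next joint state, reward, and
    opponent-action prediction (all dimensions are illustrative)."""

    def __init__(self, joint_obs_dim, joint_act_dim, opp_act_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU())
        self.q_value = nn.Linear(hidden, 1)                 # action value
        self.next_state = nn.Linear(hidden, joint_obs_dim)  # transition model
        self.reward = nn.Linear(hidden, 1)                  # reward model
        self.opponent = nn.Linear(hidden, opp_act_dim)      # opponent model

    def forward(self, joint_obs, joint_act):
        z = self.body(torch.cat([joint_obs, joint_act], dim=-1))
        return self.q_value(z), self.next_state(z), self.reward(z), self.opponent(z)


if __name__ == "__main__":
    obs_dim, msg_dim, act_dim, n_agents, opp_act_dim = 8, 4, 2, 2, 2
    actors = [RecurrentActor(obs_dim, msg_dim, act_dim) for _ in range(n_agents)]
    critic = CentralizedCritic(obs_dim * n_agents, act_dim * n_agents, opp_act_dim)

    obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
    msgs = [torch.zeros(1, msg_dim) for _ in range(n_agents)]
    hidden = [torch.zeros(1, 64) for _ in range(n_agents)]

    actions = []
    for i, actor in enumerate(actors):
        a, m, hidden[i] = actor(obs[i], msgs[i], hidden[i])
        actions.append(a)

    q, pred_next, pred_rew, pred_opp = critic(torch.cat(obs, dim=-1),
                                              torch.cat(actions, dim=-1))

    # Deterministic policy gradient: push actions toward higher Q by
    # minimizing -Q with respect to the actor parameters.
    actor_loss = -q.mean()

    # Auxiliary model-learning losses; real targets would come from a
    # replay buffer, so zeros / current observations are placeholders here.
    aux_loss = (nn.functional.mse_loss(pred_next, torch.cat(obs, dim=-1)) +
                nn.functional.mse_loss(pred_rew, torch.zeros_like(pred_rew)) +
                nn.functional.mse_loss(pred_opp, torch.zeros_like(pred_opp)))

    (actor_loss + aux_loss).backward()

Because the auxiliary heads share the critic's body, the transition, reward, and opponent-prediction losses act as additional training signals, which is the role the abstract attributes to approximate model learning under nonstationary opponents.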

Original language: English
Article number: e0222215
Journal: PLoS ONE
Volume: 14
Issue number: 9
DOI: 10.1371/journal.pone.0222215
Publication status: Published - 2019 Jan 1


ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Multi-agent reinforcement learning with approximate model learning for competitive games. / Park, Young Joon; Cho, Yoon Sang; Kim, Seoung Bum.

In: PLoS ONE, Vol. 14, No. 9, e0222215, 01.01.2019.

Research output: Contribution to journal › Article

@article{4dc461a884524e9da109383ba39be500,
title = "Multi-agent reinforcement learning with approximate model learning for competitive games",
abstract = "We propose a method for learning multi-agent policies to compete against multiple opponents. The method combines recurrent neural network-based actor-critic networks with deterministic policy gradients that promote cooperation among agents through communication. The learning process does not require access to the opponents' parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate through forward and backward paths, while the critic network helps train the actors by delivering gradient signals based on each agent's contribution to the global reward. Moreover, to address the nonstationarity caused by the evolving policies of the other agents, we propose approximate model learning that uses auxiliary prediction networks to model the state transitions, the reward function, and the opponents' behavior. We evaluate the proposed method in competitive multi-agent environments and compare it with existing approaches in terms of learning efficiency and goal achievement. The results show that the proposed method outperforms the alternatives.",
author = "Park, {Young Joon} and Cho, {Yoon Sang} and Kim, {Seoung Bum}",
year = "2019",
month = "1",
day = "1",
doi = "10.1371/journal.pone.0222215",
language = "English",
volume = "14",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",
}
