Item type |
Journal Article |
Date released |
2023-02-28 |
Title |
A statistical property of multiagent learning based on Markov decision process |
Language |
eng |
Keywords |
Asymptotic equipartition property (AEP) |
Markov decision process (MDP) |
multiagent system |
reinforcement learning (RL) |
stochastic complexity (SC) |
Resource type |
journal article |
Resource type identifier |
http://purl.org/coar/resource_type/c_6501 |
Authors |
IWATA, Kazunori (岩田, 一貴) |
IKEDA, Kazushi |
SAKAI, Hideaki |
Abstract |
We exhibit an important property called the asymptotic equipartition property (AEP) on empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine the effect of the conditions among the agents on the achievement of a cooperative policy in three different cases: blind, visible, and communicable. We also derive a bound on the speed with which the empirical sequence converges in probability to the best sequence, so that multiagent learning yields the best cooperative result. |
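For context on the abstract's central notion, the classical AEP for a stationary ergodic source (the standard Shannon-McMillan-Breiman statement, not reproduced from the paper itself; the symbols $p$, $H$, and $n$ here are the usual information-theoretic ones, not the paper's notation) can be sketched as:

```latex
% Classical AEP (Shannon-McMillan-Breiman theorem): for a stationary
% ergodic process X_1, X_2, ..., the per-symbol log-probability of an
% observed sequence converges to the entropy rate H of the source.
-\frac{1}{n}\,\log p(X_1, X_2, \ldots, X_n)
  \;\longrightarrow\; H
  \qquad \text{almost surely as } n \to \infty .
```

The paper extends an equipartition result of this kind to empirical sequences generated by an ergodic multiagent MDP, which is what makes the near-convergence analysis of multiagent RL tractable.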
Bibliographic information |
IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 829-842, July 2006 |
Publisher |
IEEE |
ISSN |
1045-9227 |
Rights |
©2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. |
Format |
application/pdf |
Publication type |
VoR (Version of Record) |
Publication type resource |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |