Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning
https://hiroshima-cu.repo.nii.ac.jp/records/1816
Item type | Journal Article(1) |
---|---|
Publication date | 2023-03-07 |
Title | Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning |
Language | eng |
Keywords | Asymptotic equipartition property (AEP); parameter bandwidth; reinforcement learning (RL); softmax selection |
Resource type | journal article |
Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
Author | IWATA, Kazunori (岩田, 一貴) |
Abstract | Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests. |
Bibliographic information | IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1865-1877, date of issue 2016-05-11 |
Publisher | IEEE |
ISSN | 2162-237X |
Bibliographic record ID (NCID) | AA1255553X |
PubMed ID (PMID, isVersionOf) | 27187974 |
DOI (isVersionOf) | info:doi/10.1109/TNNLS.2016.2558295 |
Rights | © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Related site (URI) | http://ieeexplore.ieee.org/document/7468547/ |
Format | application/pdf |
Author version flag (publication type) | AM |
Publication type resource | http://purl.org/coar/version/c_ab4af688f83e57aa |