Offline Reinforcement Learning in World Of Tanks
Date: 2024-06
Publisher: Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences
Place of publication: Cyprus
Abstract
Offline reinforcement learning (RL) has emerged as a promising approach for training intelligent agents without requiring real-time interaction with an environment, addressing a key limitation of traditional RL. This capability is particularly valuable in domains where direct interaction is dangerous or impractical, such as healthcare, finance, and hazardous physical environments. By leveraging offline RL, it is possible to build autonomous agents that derive optimal policies from static datasets, facilitating automation across diverse decision-making domains.
This study explores the application of offline RL techniques to World of Tanks, an online multiplayer tank combat game. We evaluated several offline RL algorithms, including Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), Decision Transformer (DT), and Deep Deterministic Policy Gradient (DDPG). Our results indicate that CQL and IQL achieved strong cumulative returns across a range of discount factors, demonstrating robustness and adaptability in the offline setting; higher discount factors in particular led to better cumulative returns for CQL and IQL. Effective handling of data distribution shift was crucial for algorithm robustness, with the conservative regularization in CQL and the modified architectures in IQL proving effective. In addition, the offline RL algorithms (IQL, CQL, and Decision Transformer) appear to perform better than the baselines (behavioral cloning and the policies that generated the dataset).
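To make the conservative regularization mentioned above concrete, the following is a minimal sketch of a CQL-style loss for a discrete action space in PyTorch. The names `q_net` and `alpha`, and the discrete-action assumption, are illustrative choices and are not taken from the thesis; the actual setup in the study may differ.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, states, actions, rewards, next_states, dones,
             alpha=1.0, gamma=0.99):
    """Sketch of a Conservative Q-Learning objective: a standard TD error
    plus a penalty that pushes down Q-values of out-of-dataset actions
    relative to the Q-values of actions actually seen in the dataset.
    `alpha` (penalty weight) is a hypothetical hyperparameter."""
    # Standard Bellman (TD) target computed from logged transitions.
    with torch.no_grad():
        next_q = q_net(next_states).max(dim=1, keepdim=True).values
        target = rewards + gamma * (1.0 - dones) * next_q

    q_all = q_net(states)              # Q(s, .) for every discrete action
    q_data = q_all.gather(1, actions)  # Q(s, a) for the dataset actions
    td_loss = F.mse_loss(q_data, target)

    # Conservative penalty: log-sum-exp over all actions upper-bounds the
    # value of actions outside the data; subtracting the dataset action's
    # value keeps in-distribution actions relatively high.
    conservative = (torch.logsumexp(q_all, dim=1, keepdim=True) - q_data).mean()
    return td_loss + alpha * conservative
```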
The volume of training data significantly influenced the performance of the offline RL algorithms, with larger datasets improving learning effectiveness. However, evaluating offline RL policies remains challenging because there is no real-time interaction with the environment. We therefore employed evaluation methods such as model-based dynamics estimation and policy value estimation, despite their limitations in accurately predicting real-world performance. This study contributes to the methodology of offline RL research and suggests directions for future advancements.
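As an illustration of model-based policy value estimation, the sketch below rolls a learned policy out inside a learned dynamics model and averages the discounted returns over a set of start states. The `policy`, `dynamics_model`, and `reward_model` interfaces are hypothetical placeholders for illustration, not the thesis's actual implementation.

```python
import numpy as np

def estimate_policy_value(policy, dynamics_model, reward_model, start_states,
                          horizon=50, gamma=0.99):
    """Model-based off-policy evaluation sketch: simulate the evaluated
    policy inside learned dynamics/reward models and return the mean
    discounted return. All callables here are assumed interfaces."""
    returns = []
    for s in start_states:
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)                       # action chosen by the evaluated policy
            total += discount * reward_model(s, a)
            s = dynamics_model(s, a)            # predicted next state
            discount *= gamma
        returns.append(total)
    return float(np.mean(returns))
```

Estimates produced this way inherit any bias in the learned dynamics and reward models, which is the limitation noted above when using them as a stand-in for real-environment evaluation.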