dc.contributor.advisor | Christodoulou, Chris | en |
dc.contributor.advisor | Vassiliades, Vassilis | en |
dc.contributor.author | Pastellas, Ioannis | en |
dc.coverage.spatial | Cyprus | en |
dc.creator | Pastellas, Ioannis | en |
dc.date.accessioned | 2024-07-24T10:09:38Z | |
dc.date.available | 2024-07-24T10:09:38Z | |
dc.date.issued | 2024-06 | |
dc.identifier.uri | http://gnosis.library.ucy.ac.cy/handle/7/66325 | en |
dc.description.abstract | Offline reinforcement learning (RL) has emerged as a promising approach for training intelligent agents without requiring real-time interaction with an environment, addressing a key limitation of traditional RL. This capability is particularly valuable in domains where direct interactions are dangerous or impractical, such as healthcare, finance, and hazardous environments. By leveraging offline RL, it is possible to create autonomous agents capable of deriving optimal policies from static datasets, thereby facilitating automation in diverse decision-making realms.
This study explores the application of offline RL techniques in the context of World of Tanks, an online multiplayer tank combat game. We evaluated several offline RL algorithms, including Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), Decision Transformer (DT), and Deep Deterministic Policy Gradient (DDPG). Our results indicate that CQL and IQL achieved significant returns under various discount factors, demonstrating robustness and adaptability in offline settings. Notably, higher discount factors led to better cumulative returns, particularly for CQL and IQL. Effective handling of data distribution shifts was crucial for algorithm robustness, with regularization techniques in CQL and modified architectures in IQL proving effective. In addition, the offline RL algorithms (IQL, CQL, and Decision Transformer) outperformed the baselines (Behavioral Cloning and the policies from the dataset).
The volume of training data significantly influenced the performance of the offline RL algorithms, with larger datasets enhancing learning effectiveness. However, evaluating offline RL policies remains challenging due to the lack of real-time interaction with the environment. We employed methods such as model-based dynamics modeling and policy value estimation, despite their limitations in accurately predicting real-world performance. This study contributes to the methodology of offline RL research and suggests directions for future advancements. | en |
dc.language.iso | eng | en |
dc.publisher | Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences | |
dc.rights | info:eu-repo/semantics/openAccess | en |
dc.rights | Open Access | en |
dc.rights | CC0 1.0 Universal | * |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | * |
dc.title | Offline Reinforcement Learning in World Of Tanks | en |
dc.title.alternative | Offline (χωρίς διάδραση με περιβάλλον) Ενισχυτική Μάθηση στο World Of Tanks | el |
dc.type | info:eu-repo/semantics/masterThesis | en |
dc.contributor.committeemember | Aristidou, Andreas | en |
dc.contributor.department | Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών, Τμήμα Πληροφορικής | el |
dc.contributor.department | University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science | en |
dc.subject.uncontrolledterm | ΤΕΧΝΗΤΗ ΝΟΗΜΟΣΥΝΗ | el |
dc.subject.uncontrolledterm | ARTIFICIAL INTELLIGENCE | en |
dc.subject.uncontrolledterm | REINFORCEMENT LEARNING | en |
dc.subject.uncontrolledterm | MACHINE LEARNING | en |
dc.subject.uncontrolledterm | ΕΝΙΣΧΥΤΙΚΗ ΜΑΘΗΣΗ | el |
dc.subject.uncontrolledterm | ΜΗΧΑΝΙΚΗ ΜΑΘΗΣΗ | el |
dc.author.faculty | Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences | |
dc.author.department | Τμήμα Πληροφορικής / Department of Computer Science | |
dc.type.uhtype | Master Thesis | en |
dc.contributor.orcid | Pastellas, Ioannis [0000-0002-1193-6280] | |
dc.contributor.orcid | Christodoulou, Chris [0000-0001-9398-5256] | |
dc.contributor.orcid | Vassiliades, Vassilis [0000-0002-1336-5629] | |
dc.contributor.orcid | Aristidou, Andreas [0000-0001-7754-0791] | |
dc.gnosis.orcid | 0000-0002-1193-6280 | |
dc.gnosis.orcid | 0000-0001-9398-5256 | |
dc.gnosis.orcid | 0000-0002-1336-5629 | |
dc.gnosis.orcid | 0000-0001-7754-0791 | |