dc.contributor.advisor | Christodoulou, Chris | en |
dc.contributor.advisor | Vassiliades, Vassilis | en |
dc.contributor.author | Pastellas, Ioannis | en |
dc.coverage.spatial | Cyprus | en |
dc.creator | Pastellas, Ioannis | en |
dc.date.accessioned | 2024-07-24T10:09:38Z | |
dc.date.available | 2024-07-24T10:09:38Z | |
dc.date.issued | 2024-06 | |
dc.identifier.uri | http://gnosis.library.ucy.ac.cy/handle/7/66325 | en |
dc.description.abstract | Offline reinforcement learning (RL) has emerged as a promising approach for training intelligent agents without requiring real-time interaction with an environment, addressing a key limitation of traditional RL. This capability is particularly valuable in domains where direct interactions are dangerous or impractical, such as healthcare, finance, and hazardous environments. By leveraging offline RL, it is possible to create autonomous agents capable of deriving optimal policies from static datasets, thereby facilitating automation in diverse decision-making realms.
This study explores the application of offline RL techniques in the context of World of Tanks, an online multiplayer tank combat game. We evaluated several offline RL algorithms, including Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), Decision Transformer (DT), and Deep Deterministic Policy Gradient (DDPG). Our results indicate that CQL and IQL achieved significant returns under various discount factors, demonstrating robustness and adaptability in offline settings. Notably, higher discount factors led to better cumulative returns, particularly for CQL and IQL. Effective handling of data distribution shifts was crucial for algorithm robustness, with regularization techniques in CQL and modified architectures in IQL proving effective. In addition, the offline RL algorithms (IQL, CQL, and Decision Transformer) outperformed the baselines (Behavioral Cloning and the policies from the dataset).
The volume of training data significantly influenced the performance of the offline RL algorithms, with larger datasets enhancing learning effectiveness. However, evaluating offline RL policies remains challenging due to the lack of real-time interaction with the environment. We employed methods such as model-based dynamics modeling and policy value estimation, despite their limitations in accurately predicting real-world performance. This study contributes to the methodology of offline RL research and suggests directions for future advancements. | en |
dc.language.iso | eng | en |
dc.publisher | Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences | |
dc.rights | info:eu-repo/semantics/openAccess | en |
dc.rights | Open Access | en |
dc.rights | CC0 1.0 Universal | * |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | * |
dc.title | Offline Reinforcement Learning in World Of Tanks | en |
dc.title.alternative | Offline (χωρίς διάδραση με περιβάλλον) Ενισχυτική Μάθηση στο World Of Tanks | el |
dc.type | info:eu-repo/semantics/masterThesis | en |
dc.contributor.committeemember | Aristidou, Andreas | en |
dc.contributor.department | Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών, Τμήμα Πληροφορικής | el |
dc.contributor.department | University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science | en |
dc.subject.uncontrolledterm | ΤΕΧΝΗΤΗ ΝΟΗΜΟΣΥΝΗ | el |
dc.subject.uncontrolledterm | ARTIFICIAL INTELLIGENCE | en |
dc.subject.uncontrolledterm | REINFORCEMENT LEARNING | en |
dc.subject.uncontrolledterm | MACHINE LEARNING | en |
dc.subject.uncontrolledterm | ΕΝΙΣΧΥΤΙΚΗ ΜΑΘΗΣΗ | el |
dc.subject.uncontrolledterm | ΜΗΧΑΝΙΚΗ ΜΑΘΗΣΗ | el |
dc.author.faculty | Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences | |
dc.author.department | Τμήμα Πληροφορικής / Department of Computer Science | |
dc.type.uhtype | Master Thesis | en |
dc.contributor.orcid | Pastellas, Ioannis [0000-0002-1193-6280] | |
dc.contributor.orcid | Christodoulou, Chris [0000-0001-9398-5256] | |
dc.contributor.orcid | Vassiliades, Vassilis [0000-0002-1336-5629] | |
dc.contributor.orcid | Aristidou, Andreas [0000-0001-7754-0791] | |
dc.gnosis.orcid | 0000-0002-1193-6280 | |
dc.gnosis.orcid | 0000-0001-9398-5256 | |
dc.gnosis.orcid | 0000-0002-1336-5629 | |
dc.gnosis.orcid | 0000-0001-7754-0791 | |