Studies in reinforcement learning and adaptive neural networks

Vassiliades, Vassilis K.

dc.contributor.advisor	Christodoulou, Chris	en
dc.contributor.author	Vassiliades, Vassilis K.	en
dc.coverage.spatial	Κύπρος	el
dc.coverage.spatial	Cyprus	en
dc.creator	Vassiliades, Vassilis K.	en
dc.date.accessioned	2015-09-02T09:36:48Z
dc.date.accessioned	2017-08-03T10:45:38Z
dc.date.available	2015-09-02T09:36:48Z
dc.date.available	2017-08-03T10:45:38Z
dc.date.issued	2015-08
dc.date.submitted	2015-08-08
dc.identifier.uri	https://gnosis.library.ucy.ac.cy/handle/7/39580	en
dc.description	Includes bibliographical references.	en
dc.description	Number of sources in the bibliography: 387	en
dc.description	Thesis (Ph. D.) -- University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science, 2015.	en
dc.description	The University of Cyprus Library holds the printed form of the thesis.	en
dc.description.abstract	This thesis investigates adaptation in dynamic environments, focusing on reinforcement learning (RL) and adaptive artificial neural networks (ANNs). Dynamic environments call for fast adaptation, and standard methods are often inefficient in them because they assume that the environment does not change. The purpose of this thesis is to identify situations in dynamic environments that could benefit from faster adaptation, and to prescribe suitable methods for each situation by investigating their effectiveness. This is done through four novel studies: the first two use techniques from RL, while the latter two use mechanisms for adaptation in ANNs. The first study considers a simple multiagent RL (MARL) setting, since such environments are known to be dynamic. More specifically, we use the iterated prisoner's dilemma (IPD), a game suited to modeling how cooperation can arise in a non-cooperative setting. Experiments in the IPD have shown that agents using a simple RL algorithm known as Q-learning could not achieve large cumulative payoffs. This study demonstrates how to significantly improve the agents' performance in this game, not by changing the RL algorithm or the rules of the IPD, but simply by optimizing their reward function with evolutionary algorithms. In the second study, we proceed to a more complex MARL setting and address the problem of how to accelerate learning in structured MARL tasks. We investigate synergies between hierarchical RL (HRL) and MARL algorithms and introduce two new algorithms for hierarchical MARL, each combining the mechanisms of a single-agent HRL algorithm with those of one of two MARL algorithms. We demonstrate that our algorithms perform significantly better than their non-hierarchical and single-agent counterparts in a partially observable multiagent taxi problem. In the third study, we move to single-agent settings in order to explicitly control the dynamic environment by changing its state transition function. Our focus in this study is not on directly optimizing policies, but on optimizing the learning rules that optimize policies. By encoding both the policy and the RL rule as ANNs and using evolutionary algorithms to optimize the learning rules, we show that adaptive agents can indeed be created with this approach. We demonstrate that our approach performs significantly better than the SARSA(λ) (State-Action-Reward-State-Action) RL algorithm in three stationary tasks and one nonstationary task, all partially observable. The final study deals with single-agent dynamic environments in which the change occurs in the reward function. We introduce a new type of artificial neuron for ANN controllers, called the “switch neuron”, which can interrupt the flow of information from all but one of its incoming synaptic connections. The open connection is determined by the neuron's level of modulatory activation, which is affected by modulatory signals, such as signals that encode some information about the reward received by the agent. By additionally introducing a way for switch neurons to modulate other switch neurons, we present appropriate switch neuron architectures for nonstationary binary association problems and discrete T-maze problems. The results show that these architectures generate optimal adaptive behaviors, illustrating the effectiveness of the switch neuron model in situations where adaptation is required. Overall, this thesis contributes to accelerating adaptation in dynamic environments: for every type of dynamic environment studied, we prescribe mechanisms that have clear advantages over known methods.	en
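The switch neuron described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the thesis's exact formulation: the class SwitchNeuron and the function run_binary_association are hypothetical names, the rule that the open connection is floor(m * n) for modulatory activation m in [0, 1) is assumed, and so is the wrap-around modulatory update and the toy binary association task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwitchNeuron:
    """Hypothetical switch neuron: only one incoming connection is open at a time.

    Assumption (illustrative, not the thesis's exact equations): the modulatory
    activation m lies in [0, 1) and the open connection is floor(m * n), where n
    is the number of incoming connections.
    """
    weights: List[float]
    modulation: float = 0.0  # modulatory activation m, kept in [0, 1)

    def active_index(self) -> int:
        # Map m onto one of the n incoming connections; clamp guards rounding.
        n = len(self.weights)
        return min(int(self.modulation * n), n - 1)

    def modulate(self, signal: float) -> None:
        # Assumed update: a modulatory signal shifts m and wraps it into [0, 1),
        # so repeated signals cycle through the incoming connections.
        self.modulation = (self.modulation + signal) % 1.0

    def forward(self, inputs: List[float]) -> float:
        # All connections except the selected one are blocked.
        i = self.active_index()
        return self.weights[i] * inputs[i]


def run_binary_association(targets: List[int]) -> List[int]:
    """Toy nonstationary binary association task (hypothetical setup).

    Each trial the agent outputs +1 or -1. A wrong answer yields reward -1 and
    triggers a modulatory signal of 1/n, which opens the next connection.
    """
    neuron = SwitchNeuron(weights=[1.0, -1.0])
    rewards = []
    for target in targets:
        action = 1 if neuron.forward([1.0, 1.0]) > 0 else -1
        reward = 1 if action == target else -1
        rewards.append(reward)
        if reward < 0:
            neuron.modulate(1.0 / len(neuron.weights))
    return rewards


if __name__ == "__main__":
    # The correct action flips after trial 2 and again after trial 5.
    print(run_binary_association([1, 1, -1, -1, -1, 1]))
```

In this toy run the agent errs exactly once after each reversal and then answers correctly again, printing [1, 1, -1, 1, 1, -1]. Switch neurons modulating other switch neurons, which the abstract uses for T-maze problems, are not shown.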
dc.format.extent	xxiv, 245 p. : ill. (some col.) ; 30 cm.	en
dc.language.iso	eng	en
dc.publisher	Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences
dc.rights	info:eu-repo/semantics/openAccess	en
dc.rights	Open Access	en
dc.subject.lcsh	Neural networks (Computer science)	en
dc.subject.lcsh	Neural computers	en
dc.subject.lcsh	Reinforcement learning	en
dc.title	Studies in reinforcement learning and adaptive neural networks	en
dc.title.alternative	Μελέτες στην ενισχυτική μάθηση και στα προσαρμοστικά νευρωνικά δίκτυα	el
dc.type	info:eu-repo/semantics/doctoralThesis	en
dc.contributor.committeemember	Σχίζας, Χρίστος Ν.	el
dc.contributor.committeemember	Χαραλάμπους, Χρίστος	el
dc.contributor.committeemember	Schizas, Christos N.	en
dc.contributor.committeemember	Charalambous, Christos	en
dc.contributor.committeemember	Wörgötter, Florentin	en
dc.contributor.committeemember	Bugmann, Guido	en
dc.contributor.department	Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών, Τμήμα Πληροφορικής	el
dc.contributor.department	University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science	en
dc.subject.uncontrolledterm	ΕΝΙΣΧΥΤΙΚΗ ΜΑΘΗΣΗ	el
dc.subject.uncontrolledterm	ΝΕΥΡΩΝΙΚΑ ΔΙΚΤΥΑ	el
dc.subject.uncontrolledterm	ΠΡΟΣΑΡΜΟΣΤΙΚΗ ΣΥΜΠΕΡΙΦΟΡΑ	el
dc.subject.uncontrolledterm	ΙΕΡΑΡΧΙΚΗ ΕΝΙΣΧΥΤΙΚΗ ΜΑΘΗΣΗ	el
dc.subject.uncontrolledterm	ΠΟΛΥΠΡΑΚΤΟΡΙΚΗ ΕΝΙΣΧΥΤΙΚΗ ΜΑΘΗΣΗ	el
dc.subject.uncontrolledterm	ΕΞΕΛΙΞΗ ΚΑΝΟΝΩΝ ΜΑΘΗΣΗΣ	el
dc.subject.uncontrolledterm	ΔΥΝΑΜΙΚΑ ΠΕΡΙΒΑΛΛΟΝΤΑ	el
dc.subject.uncontrolledterm	ΔΙΑΚΟΠΤΕΣ ΝΕΥΡΩΝΕΣ	el
dc.subject.uncontrolledterm	REINFORCEMENT LEARNING	en
dc.subject.uncontrolledterm	NEURAL NETWORKS	en
dc.subject.uncontrolledterm	ADAPTIVE BEHAVIOR	en
dc.subject.uncontrolledterm	HIERARCHICAL REINFORCEMENT LEARNING	en
dc.subject.uncontrolledterm	MULTIAGENT REINFORCEMENT LEARNING	en
dc.subject.uncontrolledterm	EVOLUTION OF LEARNING RULES	en
dc.subject.uncontrolledterm	DYNAMIC ENVIRONMENTS	en
dc.subject.uncontrolledterm	SWITCH NEURONS	en
dc.identifier.lc	QA76.87.V37 2015	en
dc.author.faculty	Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department	Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype	Doctoral Thesis	en
dc.rights.embargodate	2015-08-27
dc.contributor.orcid	Christodoulou, Chris [0000-0001-9398-5256]

Files in this item

Name:: VassilisVassiliadesPhD_final.pdf
Size:: 4.129Mb
Format:: PDF
Description:: Doctoral thesis

Name:: Βασιλειάδης Βασίλης Κ. - ΠΛΗ - ...
Size:: 186.1Kb
Format:: PDF
Description:: Approval form for electronic ...

This item appears in the following Collection(s)

Τμήμα Πληροφορικής / Department of Computer Science