dc.contributor.author | Vassiliades, Vassilis | en |
dc.contributor.author | Cleanthous, A. | en |
dc.contributor.author | Christodoulou, Chris C. | en |
dc.creator | Vassiliades, Vassilis | en |
dc.creator | Cleanthous, A. | en |
dc.creator | Christodoulou, Chris C. | en |
dc.date.accessioned | 2019-11-13T10:42:57Z | |
dc.date.available | 2019-11-13T10:42:57Z | |
dc.date.issued | 2011 | |
dc.identifier.issn | 1045-9227 | |
dc.identifier.uri | http://gnosis.library.ucy.ac.cy/handle/7/55134 | |
dc.description.abstract | This paper investigates multiagent reinforcement learning (MARL) in a general-sum game whose payoff structure requires the agents to exploit each other in a way that benefits all of them. The contradictory nature of such games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. The spiking agents are neural networks with leaky integrate-and-fire neurons trained with two different learning algorithms: 1) reinforcement of stochastic synaptic transmission, or 2) reward-modulated spike-timing-dependent plasticity with eligibility trace. The nonspiking agents use a tabular representation and are trained with the Q-learning and SARSA algorithms, with a novel reward transformation process also being applied to the Q-learning agents. According to the results, the cooperative outcome is enhanced by: 1) transformed internal reinforcement signals and a combination of a high learning rate and a low discount factor with an appropriate exploration schedule, in the case of nonspiking agents, and 2) a longer eligibility trace time constant, in the case of spiking agents. Moreover, it is shown that spiking and nonspiking agents behave similarly and can therefore be used equally well in a multiagent interaction setting. For training the spiking agents when more than one output neuron competes for reinforcement, a novel and necessary modification that enhances competition is applied to both learning algorithms in order to avoid possible synaptic saturation. This is done by administering to the networks additional global reinforcement signals for every spike of the output neurons that were not responsible for the preceding decision. © 2011 IEEE. | en |
dc.source | IEEE Transactions on Neural Networks | en |
dc.source.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-79953817126&doi=10.1109%2fTNN.2011.2111384&partnerID=40&md5=0fa39d5a51f85821c0d63a099d50cdd2 | |
dc.subject | Multi-agent systems | en |
dc.subject | Learning algorithms | en |
dc.subject | Neural networks | en |
dc.subject | human | en |
dc.subject | Humans | en |
dc.subject | biological model | en |
dc.subject | Animals | en |
dc.subject | animal | en |
dc.subject | physiology | en |
dc.subject | Neurons | en |
dc.subject | artificial neural network | en |
dc.subject | Neural Networks (Computer) | en |
dc.subject | Intelligent agents | en |
dc.subject | Time constants | en |
dc.subject | Reinforcement | en |
dc.subject | nerve cell | en |
dc.subject | Reinforcement learning | en |
dc.subject | Action Potentials | en |
dc.subject | Models, Neurological | en |
dc.subject | Reinforcement (Psychology) | en |
dc.subject | action potential | en |
dc.subject | Multiagent reinforcement learning | en |
dc.subject | Iterated prisoner's dilemma | en |
dc.subject | Discount factors | en |
dc.subject | Eligibility traces | en |
dc.subject | Integrate-and-fire neurons | en |
dc.subject | Learning rates | en |
dc.subject | Multi-agent interaction | en |
dc.subject | Output neurons | en |
dc.subject | Prisoner's Dilemma | en |
dc.subject | Q-learning agents | en |
dc.subject | Reinforcement signal | en |
dc.subject | reward transformation | en |
dc.subject | Spike-timing-dependent plasticity | en |
dc.subject | spiking neural networks | en |
dc.subject | Synaptic transmission | en |
dc.subject | Transformation process | en |
dc.title | Multiagent reinforcement learning: Spiking and nonspiking agents in the Iterated Prisoner's Dilemma | en |
dc.type | info:eu-repo/semantics/article | |
dc.identifier.doi | 10.1109/TNN.2011.2111384 | |
dc.description.volume | 22 | |
dc.description.issue | 4 | |
dc.description.startingpage | 639 | |
dc.description.endingpage | 653 | |
dc.author.faculty | 002 Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences | |
dc.author.department | Τμήμα Πληροφορικής / Department of Computer Science | |
dc.type.uhtype | Article | en |
dc.description.notes | <p>Cited By: 7</p> | en |
dc.source.abbreviation | IEEE Trans. Neural Networks | en |
dc.contributor.orcid | Christodoulou, Chris C. [0000-0001-9398-5256] | |
dc.gnosis.orcid | 0000-0001-9398-5256 | |
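
The abstract reports that, for the nonspiking agents, the cooperative outcome is enhanced by a high learning rate, a low discount factor, and an appropriate exploration schedule. As a purely illustrative sketch (not the paper's implementation; the reward transformation process and the specific parameter values and schedule are omitted or assumed), two tabular Q-learning agents playing the Iterated Prisoner's Dilemma might be set up as follows:

```python
import random

C, D = 0, 1  # cooperate, defect
# Standard Prisoner's Dilemma payoff matrix (row player, column player).
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

class QAgent:
    """Tabular Q-learning agent; the state is the opponent's last move.

    alpha/gamma/epsilon here are illustrative: a high learning rate and a
    low discount factor, as the abstract suggests, not the paper's values.
    """
    def __init__(self, alpha=0.9, gamma=0.1, epsilon=0.5):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}  # state -> [Q(state, C), Q(state, D)]

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice((C, D))
        qs = self.q.get(state, [0.0, 0.0])
        return C if qs[C] >= qs[D] else D

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update.
        qs = self.q.setdefault(state, [0.0, 0.0])
        target = reward + self.gamma * max(self.q.get(next_state, [0.0, 0.0]))
        qs[action] += self.alpha * (target - qs[action])

def play(rounds=2000, seed=0):
    """Self-play in the IPD; returns the mutual-cooperation rate."""
    random.seed(seed)
    a, b = QAgent(), QAgent()
    sa = sb = None  # no observed opponent move before the first round
    mutual_coop = 0
    for t in range(rounds):
        ma, mb = a.act(sa), b.act(sb)
        ra, rb = PAYOFF[(ma, mb)]
        a.update(sa, ma, ra, mb)
        b.update(sb, mb, rb, ma)
        sa, sb = mb, ma
        # A simple linearly decaying exploration schedule (assumed).
        a.epsilon = b.epsilon = max(0.01, 0.5 * (1 - t / rounds))
        if (ma, mb) == (C, C):
            mutual_coop += 1
    return mutual_coop / rounds

print(f"mutual cooperation rate: {play():.2f}")
```

Whether such agents actually converge to mutual cooperation depends on the parameters and the exploration schedule, which is precisely the sensitivity the paper investigates; this sketch only fixes the mechanics of the interaction loop.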