dc.contributor.author | Vassiliades, Vassilis | en |
dc.contributor.author | Cleanthous, A. | en |
dc.contributor.author | Christodoulou, Chris C. | en |
dc.creator | Vassiliades, Vassilis | en |
dc.creator | Cleanthous, A. | en |
dc.creator | Christodoulou, Chris C. | en |
dc.date.accessioned | 2019-11-13T10:42:57Z | |
dc.date.available | 2019-11-13T10:42:57Z | |
dc.date.issued | 2011 | |
dc.identifier.issn | 1045-9227 | |
dc.identifier.uri | http://gnosis.library.ucy.ac.cy/handle/7/55134 | |
dc.description.abstract | This paper investigates multiagent reinforcement learning (MARL) in a general-sum game whose payoff structure requires the agents to exploit each other in a way that benefits all of them. The contradictory nature of such games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. The spiking agents are neural networks with leaky integrate-and-fire neurons trained with two different learning algorithms: 1) reinforcement of stochastic synaptic transmission, or 2) reward-modulated spike-timing-dependent plasticity with eligibility trace. The nonspiking agents use a tabular representation and are trained with the Q-learning and SARSA algorithms, with a novel reward transformation process also being applied to the Q-learning agents. According to the results, the cooperative outcome is enhanced by: 1) transformed internal reinforcement signals and a combination of a high learning rate and a low discount factor with an appropriate exploration schedule, in the case of nonspiking agents, and 2) a longer eligibility trace time constant, in the case of spiking agents. Moreover, it is shown that spiking and nonspiking agents behave similarly and can therefore be used equally well in a multiagent interaction setting. For training the spiking agents when more than one output neuron competes for reinforcement, a novel and necessary modification that enhances competition is applied to both learning algorithms in order to avoid possible synaptic saturation. This is done by administering to the networks additional global reinforcement signals for every spike of the output neurons that were not responsible for the preceding decision. © 2011 IEEE. | en |
dc.source | IEEE Transactions on Neural Networks | en |
dc.source.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-79953817126&doi=10.1109%2fTNN.2011.2111384&partnerID=40&md5=0fa39d5a51f85821c0d63a099d50cdd2 | |
dc.subject | Multi-agent systems | en |
dc.subject | Learning algorithms | en |
dc.subject | Neural networks | en |
dc.subject | human | en |
dc.subject | Humans | en |
dc.subject | biological model | en |
dc.subject | Animals | en |
dc.subject | animal | en |
dc.subject | physiology | en |
dc.subject | Neurons | en |
dc.subject | artificial neural network | en |
dc.subject | Neural Networks (Computer) | en |
dc.subject | Intelligent agents | en |
dc.subject | Time constants | en |
dc.subject | Reinforcement | en |
dc.subject | nerve cell | en |
dc.subject | Reinforcement learning | en |
dc.subject | Action Potentials | en |
dc.subject | Models, Neurological | en |
dc.subject | Reinforcement (Psychology) | en |
dc.subject | action potential | en |
dc.subject | Multiagent reinforcement learning | en |
dc.subject | Iterated prisoner's dilemma | en |
dc.subject | Discount factors | en |
dc.subject | Eligibility traces | en |
dc.subject | Integrate-and-fire neurons | en |
dc.subject | Learning rates | en |
dc.subject | Multi-agent interaction | en |
dc.subject | Output neurons | en |
dc.subject | Prisoner's Dilemma | en |
dc.subject | Q-learning agents | en |
dc.subject | Reinforcement signal | en |
dc.subject | reward transformation | en |
dc.subject | Spike-timing-dependent plasticity | en |
dc.subject | spiking neural networks | en |
dc.subject | Synaptic transmission | en |
dc.subject | Transformation process | en |
dc.title | Multiagent reinforcement learning: Spiking and nonspiking agents in the Iterated Prisoner's Dilemma | en |
dc.type | info:eu-repo/semantics/article | |
dc.identifier.doi | 10.1109/TNN.2011.2111384 | |
dc.description.volume | 22 | |
dc.description.issue | 4 | |
dc.description.startingpage | 639 | |
dc.description.endingpage | 653 | |
dc.author.faculty | 002 Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences | |
dc.author.department | Τμήμα Πληροφορικής / Department of Computer Science | |
dc.type.uhtype | Article | en |
dc.description.notes | <p>Cited By: 7</p> | en |
dc.source.abbreviation | IEEE Trans. Neural Networks | en |
dc.contributor.orcid | Christodoulou, Chris C. [0000-0001-9398-5256] | |
dc.gnosis.orcid | 0000-0001-9398-5256 | |
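
The abstract reports that, for the nonspiking agents, the cooperative outcome is enhanced by a high learning rate, a low discount factor, and an appropriate exploration schedule. As a purely illustrative sketch (not the paper's implementation; the reward transformation process and the specific parameter values and schedule are omitted or assumed), two tabular Q-learning agents playing the Iterated Prisoner's Dilemma might be set up as follows:

```python
import random

C, D = 0, 1  # cooperate, defect
# Standard Prisoner's Dilemma payoff matrix (row player, column player).
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

class QAgent:
    """Tabular Q-learning agent; the state is the opponent's last move.

    alpha/gamma/epsilon here are illustrative: a high learning rate and a
    low discount factor, as the abstract suggests, not the paper's values.
    """
    def __init__(self, alpha=0.9, gamma=0.1, epsilon=0.5):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}  # state -> [Q(state, C), Q(state, D)]

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice((C, D))
        qs = self.q.get(state, [0.0, 0.0])
        return C if qs[C] >= qs[D] else D

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update.
        qs = self.q.setdefault(state, [0.0, 0.0])
        target = reward + self.gamma * max(self.q.get(next_state, [0.0, 0.0]))
        qs[action] += self.alpha * (target - qs[action])

def play(rounds=2000, seed=0):
    """Self-play in the IPD; returns the mutual-cooperation rate."""
    random.seed(seed)
    a, b = QAgent(), QAgent()
    sa = sb = None  # no observed opponent move before the first round
    mutual_coop = 0
    for t in range(rounds):
        ma, mb = a.act(sa), b.act(sb)
        ra, rb = PAYOFF[(ma, mb)]
        a.update(sa, ma, ra, mb)
        b.update(sb, mb, rb, ma)
        sa, sb = mb, ma
        # A simple linearly decaying exploration schedule (assumed).
        a.epsilon = b.epsilon = max(0.01, 0.5 * (1 - t / rounds))
        if (ma, mb) == (C, C):
            mutual_coop += 1
    return mutual_coop / rounds

print(f"mutual cooperation rate: {play():.2f}")
```

Whether such agents actually converge to mutual cooperation depends on the parameters and the exploration schedule, which is precisely the sensitivity the paper investigates; this sketch only fixes the mechanics of the interaction loop.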