
dc.contributor.author    Vassiliades, Vassilis    en
dc.contributor.author    Cleanthous, A.    en
dc.contributor.author    Christodoulou, Chris C.    en
dc.creator    Vassiliades, Vassilis    en
dc.creator    Cleanthous, A.    en
dc.creator    Christodoulou, Chris C.    en
dc.date.accessioned    2019-11-13T10:42:57Z
dc.date.available    2019-11-13T10:42:57Z
dc.date.issued    2011
dc.identifier.issn    1045-9227
dc.identifier.uri    http://gnosis.library.ucy.ac.cy/handle/7/55134
dc.description.abstract    This paper investigates multiagent reinforcement learning (MARL) in a general-sum game where the payoff structure is such that the agents must exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. The spiking agents are neural networks with leaky integrate-and-fire neurons trained with two different learning algorithms: 1) reinforcement of stochastic synaptic transmission, or 2) reward-modulated spike-timing-dependent plasticity with eligibility trace. The nonspiking agents use a tabular representation and are trained with the Q- and SARSA learning algorithms, with a novel reward transformation process also being applied to the Q-learning agents. According to the results, the cooperative outcome is enhanced by: 1) transformed internal reinforcement signals and a combination of a high learning rate and a low discount factor with an appropriate exploration schedule in the case of nonspiking agents, and 2) a longer eligibility trace time constant in the case of spiking agents. Moreover, it is shown that spiking and nonspiking agents behave similarly and can therefore be used equally well in a multiagent interaction setting. For training the spiking agents in the case where more than one output neuron competes for reinforcement, a novel and necessary modification that enhances competition is applied to the two learning algorithms, in order to avoid possible synaptic saturation. This is done by administering additional global reinforcement signals to the networks for every spike of the output neurons that were not responsible for the preceding decision. © 2011 IEEE.    en
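
The abstract's two main findings can be made concrete with short sketches. First, a minimal two-agent tabular Q-learning setup for the Iterated Prisoner's Dilemma: the payoff values (the standard T=5, R=3, P=1, S=0), the hyperparameters (high learning rate, low discount factor, decaying epsilon-greedy exploration), and the one-step state encoding are illustrative assumptions, and the paper's novel reward transformation is not reproduced here.

```python
import random

# Standard IPD payoff matrix (T=5, R=3, P=1, S=0); the paper's exact
# payoffs and its reward transformation process are NOT reproduced here.
PAYOFFS = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
           ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
ACTIONS = ('C', 'D')

class QAgent:
    """Tabular Q-learning agent whose state is the opponent's last move."""
    def __init__(self, alpha=0.9, gamma=0.1, eps=0.5, eps_decay=0.9995):
        self.alpha, self.gamma = alpha, gamma      # high learning rate, low discount
        self.eps, self.eps_decay = eps, eps_decay  # decaying exploration schedule
        self.q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS}

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(ACTIONS)          # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, b)] for b in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        self.eps *= self.eps_decay                 # anneal exploration over time

agent1, agent2 = QAgent(), QAgent()
s1 = s2 = 'C'  # assume each agent "remembers" a cooperative first move
for _ in range(20000):
    a1, a2 = agent1.act(s1), agent2.act(s2)
    r1, r2 = PAYOFFS[(a1, a2)]
    agent1.update(s1, a1, r1, a2)  # next state: opponent's current move
    agent2.update(s2, a2, r2, a1)
    s1, s2 = a2, a1
print({k: round(v, 2) for k, v in agent1.q.items()})
```

Second, for the spiking agents, a hedged sketch of a reward-modulated STDP weight update with an exponentially decaying eligibility trace; the decay form and the parameter values (tau_e, lr) are assumptions, not the paper's exact rule. A longer trace time constant lets a delayed global reward signal credit earlier pre/post spike pairings, which is the mechanism behind finding 2).

```python
import math

def rstdp_update(w, trace, stdp_increment, reward, dt=1.0,
                 tau_e=500.0, lr=0.01):
    """One reward-modulated STDP step with an eligibility trace
    (hypothetical dynamics and parameter values, for illustration only)."""
    # Decay the trace, then add the latest pre/post STDP contribution.
    trace = trace * math.exp(-dt / tau_e) + stdp_increment
    # A global reward signal gates how much of the trace is written
    # into the synaptic weight.
    w = w + lr * reward * trace
    return w, trace
```
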
dc.source    IEEE Transactions on Neural Networks    en
dc.source.uri    https://www.scopus.com/inward/record.uri?eid=2-s2.0-79953817126&doi=10.1109%2fTNN.2011.2111384&partnerID=40&md5=0fa39d5a51f85821c0d63a099d50cdd2
dc.subject    article    en
dc.subject    Multi agent systems    en
dc.subject    Learning algorithms    en
dc.subject    Neural networks    en
dc.subject    human    en
dc.subject    Humans    en
dc.subject    biological model    en
dc.subject    Animals    en
dc.subject    animal    en
dc.subject    physiology    en
dc.subject    Neurons    en
dc.subject    artificial neural network    en
dc.subject    Neural Networks (Computer)    en
dc.subject    Intelligent agents    en
dc.subject    Time constants    en
dc.subject    Reinforcement    en
dc.subject    nerve cell    en
dc.subject    Fertilizers    en
dc.subject    Reinforcement learning    en
dc.subject    Action Potentials    en
dc.subject    Models, Neurological    en
dc.subject    Reinforcement (Psychology)    en
dc.subject    action potential    en
dc.subject    Multiagent reinforcement learning    en
dc.subject    Multi-agent reinforcement learning    en
dc.subject    Iterated prisoner's dilemma    en
dc.subject    Discount factors    en
dc.subject    Eligibility traces    en
dc.subject    Integrate-and-fire neurons    en
dc.subject    Learning rates    en
dc.subject    Multi-agent interaction    en
dc.subject    Output neurons    en
dc.subject    Prisoner's Dilemma    en
dc.subject    Q-learning agents    en
dc.subject    Reinforcement signal    en
dc.subject    reward transformation    en
dc.subject    Spike-timing-dependent plasticity    en
dc.subject    spiking neural networks    en
dc.subject    Synaptic transmission    en
dc.subject    Transformation process    en
dc.title    Multiagent reinforcement learning: Spiking and nonspiking agents in the Iterated Prisoner's Dilemma    en
dc.type    info:eu-repo/semantics/article
dc.identifier.doi    10.1109/TNN.2011.2111384
dc.description.volume    22
dc.description.issue    4
dc.description.startingpage    639
dc.description.endingpage    653
dc.author.faculty    002 Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department    Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype    Article    en
dc.description.notes    Cited By: 7    en
dc.source.abbreviation    IEEE Trans. Neural Networks    en
dc.contributor.orcid    Christodoulou, Chris C. [0000-0001-9398-5256]
dc.gnosis.orcid    0000-0001-9398-5256

