Show simple item record

dc.contributor.author  Christodoulou, Chris C.  en
dc.contributor.author  Cleanthous, A.  en
dc.creator  Christodoulou, Chris C.  en
dc.creator  Cleanthous, A.  en
dc.date.accessioned  2019-11-13T10:39:16Z
dc.date.available  2019-11-13T10:39:16Z
dc.date.issued  2010
dc.identifier.uri  http://gnosis.library.ucy.ac.cy/handle/7/53713
dc.description.abstract  This paper investigates the effectiveness of spiking agents when trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing-dependent plasticity (STDP) and compares it to reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, a computational model is developed in which two spiking neural networks are implemented as two "selfish" agents learning simultaneously but independently, competing in the IPD game. The purpose of our system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can only be achieved when the agents engage in a behaviour of mutual cooperation during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study utilises reward-modulated STDP with an eligibility trace, and results show that the system managed to exhibit the desired behaviour by establishing mutual cooperation between the agents. It is noted that the cooperative outcome was attained after a relatively short learning period, which enhanced the accumulation of reward by the system. As in our previous implementation, the successful application of the learning algorithm to the IPD becomes possible only after extending it with additional global reinforcement signals in order to enhance competition at the neuronal level. Moreover, it is also shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through: (i) a strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity produced by equipping the agents' LIF neurons with a partial somatic reset mechanism. © 2010 by The Chinese Physiological Society and Airiti Press Inc.  en
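The learning rule named in the abstract, reward-modulated STDP with an eligibility trace, can be sketched in a few lines. The sketch below is illustrative only: the window shape, learning rate, and time constants are generic textbook assumptions, not values taken from the paper. The key point it shows is that a large eligibility-trace time constant (`tau_e`) gives an agent the "strong memory" of recent spike-timing correlations that the abstract links to improved cooperation.

```python
import math

def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Classic exponential STDP window (illustrative parameters).

    dt > 0 means the presynaptic spike preceded the postsynaptic spike,
    which yields potentiation; dt < 0 yields depression.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)

def rstdp_update(weight, trace, pre_post_dt, reward,
                 lr=0.01, tau_e=500.0, step=1.0):
    """One step of reward-modulated STDP with an eligibility trace.

    The raw STDP term is first accumulated into a slowly decaying
    eligibility trace; a global reward signal then converts the trace
    into an actual weight change. A large tau_e means correlations from
    many timesteps ago still influence the update (a "strong memory").
    """
    trace = trace * math.exp(-step / tau_e) + stdp_window(pre_post_dt)
    weight = weight + lr * reward * trace
    return weight, trace

# Example: a causal pre-before-post pairing followed by a positive reward
# strengthens the synapse.
w, e = 0.5, 0.0
w, e = rstdp_update(w, e, pre_post_dt=5.0, reward=1.0)
```

In the full model described by the abstract, one such update would run per synapse in each of the two competing spiking networks, with the reward derived from the IPD payoff of each round.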
dc.source  Chinese Journal of Physiology  en
dc.source.uri  https://www.scopus.com/inward/record.uri?eid=2-s2.0-79955386615&doi=10.4077%2fCJP.2010.AMM030&partnerID=40&md5=0e6f70558c4a47ab631a4db829cf7fa5
dc.subject  Learning  en
dc.subject  Computer Simulation  en
dc.subject  article  en
dc.subject  mathematical model  en
dc.subject  Humans  en
dc.subject  controlled study  en
dc.subject  Animals  en
dc.subject  Neural Networks (Computer)  en
dc.subject  learning algorithm  en
dc.subject  nerve cell network  en
dc.subject  Stochastic Processes  en
dc.subject  synaptic transmission  en
dc.subject  Action Potentials  en
dc.subject  Game Theory  en
dc.subject  Models, Neurological  en
dc.subject  reinforcement  en
dc.subject  Reinforcement (Psychology)  en
dc.subject  Spiking neural networks  en
dc.subject  Multiagent reinforcement learning  en
dc.subject  nerve cell plasticity  en
dc.subject  Neuronal Plasticity  en
dc.subject  Reward-modulated spike timing-dependent plasticity  en
dc.title  Spiking neural networks with different reinforcement learning (RL) schemes in a multiagent setting  en
dc.type  info:eu-repo/semantics/article
dc.identifier.doi  10.4077/CJP.2010.AMM030
dc.description.volume  53
dc.description.issue  6
dc.description.startingpage  447
dc.description.endingpage  453
dc.author.faculty  002 Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department  Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype  Article  en
dc.source.abbreviation  Chin. J. Physiol.  en
dc.contributor.orcid  Christodoulou, Chris C. [0000-0001-9398-5256]
dc.gnosis.orcid  0000-0001-9398-5256


Files in this item


There are no files associated with this item.
