Show simple item record

dc.contributor.author  Christodoulou, Chris C.  en
dc.contributor.author  Cleanthous, A.  en
dc.creator  Christodoulou, Chris C.  en
dc.creator  Cleanthous, A.  en
dc.date.accessioned  2019-11-13T10:39:16Z
dc.date.available  2019-11-13T10:39:16Z
dc.date.issued  2010
dc.identifier.uri  http://gnosis.library.ucy.ac.cy/handle/7/53713
dc.description.abstract  This paper investigates the effectiveness of spiking agents when trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing-dependent plasticity (STDP) and compares it to reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, a computational model is developed in which two spiking neural networks are implemented as two "selfish" agents learning simultaneously but independently, competing in the IPD game. The purpose of our system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can only be achieved when the agents engage in a behaviour of mutual cooperation during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study utilises reward-modulated STDP with an eligibility trace, and results show that the system managed to exhibit the desired behaviour by establishing mutual cooperation between the agents. It is noted that the cooperative outcome was attained after a relatively short learning period, which enhanced the accumulation of reward by the system. As in our previous implementation, the successful application of the learning algorithm to the IPD becomes possible only after extending it with additional global reinforcement signals in order to enhance competition at the neuronal level. Moreover, it is also shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through: (i) a strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity produced by equipping the agents' LIF neurons with a partial somatic reset mechanism. © 2010 by The Chinese Physiological Society and Airiti Press Inc.  en
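The learning rule named in the abstract, reward-modulated STDP with an eligibility trace, can be sketched in a few lines. The sketch below is illustrative only: the window shape, learning rate, and time constants are generic textbook assumptions, not values taken from the paper. The key point it shows is that a large eligibility-trace time constant (`tau_e`) gives an agent the "strong memory" of recent spike-timing correlations that the abstract links to improved cooperation.

```python
import math

def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Classic exponential STDP window (illustrative parameters).

    dt > 0 means the presynaptic spike preceded the postsynaptic spike,
    which yields potentiation; dt < 0 yields depression.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)

def rstdp_update(weight, trace, pre_post_dt, reward,
                 lr=0.01, tau_e=500.0, step=1.0):
    """One step of reward-modulated STDP with an eligibility trace.

    The raw STDP term is first accumulated into a slowly decaying
    eligibility trace; a global reward signal then converts the trace
    into an actual weight change. A large tau_e means correlations from
    many timesteps ago still influence the update (a "strong memory").
    """
    trace = trace * math.exp(-step / tau_e) + stdp_window(pre_post_dt)
    weight = weight + lr * reward * trace
    return weight, trace

# Example: a causal pre-before-post pairing followed by a positive reward
# strengthens the synapse.
w, e = 0.5, 0.0
w, e = rstdp_update(w, e, pre_post_dt=5.0, reward=1.0)
```

In the full model described by the abstract, one such update would run per synapse in each of the two competing spiking networks, with the reward derived from the IPD payoff of each round.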
dc.source  Chinese Journal of Physiology  en
dc.source.uri  https://www.scopus.com/inward/record.uri?eid=2-s2.0-79955386615&doi=10.4077%2fCJP.2010.AMM030&partnerID=40&md5=0e6f70558c4a47ab631a4db829cf7fa5
dc.subject  Learning  en
dc.subject  Computer Simulation  en
dc.subject  article  en
dc.subject  mathematical model  en
dc.subject  Humans  en
dc.subject  controlled study  en
dc.subject  Animals  en
dc.subject  Neural Networks (Computer)  en
dc.subject  learning algorithm  en
dc.subject  nerve cell network  en
dc.subject  Stochastic Processes  en
dc.subject  synaptic transmission  en
dc.subject  Action Potentials  en
dc.subject  Game Theory  en
dc.subject  Models, Neurological  en
dc.subject  reinforcement  en
dc.subject  Reinforcement (Psychology)  en
dc.subject  Spiking neural networks  en
dc.subject  Multiagent reinforcement learning  en
dc.subject  nerve cell plasticity  en
dc.subject  Neuronal Plasticity  en
dc.subject  Reward-modulated spike timing-dependent plasticity  en
dc.title  Spiking neural networks with different reinforcement learning (RL) schemes in a multiagent setting  en
dc.type  info:eu-repo/semantics/article
dc.identifier.doi  10.4077/CJP.2010.AMM030
dc.description.volume  53
dc.description.issue  6
dc.description.startingpage  447
dc.description.endingpage  453
dc.author.faculty  002 Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department  Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype  Article  en
dc.source.abbreviation  Chin. J. Physiol.  en
dc.contributor.orcid  Christodoulou, Chris C. [0000-0001-9398-5256]
dc.gnosis.orcid  0000-0001-9398-5256


Files in this item


There are no files associated with this item.
