
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Peltekis, Christodoulos | en |
| dc.contributor.author | Filippas, Dionysios | en |
| dc.contributor.author | Dimitrakopoulos, Giorgos | en |
| dc.contributor.author | Nicopoulos, Chrysostomos | en |
| dc.contributor.editor | Villar, Eugenio | en |
| dc.creator | Peltekis, Christodoulos | en |
| dc.creator | Filippas, Dionysios | en |
| dc.creator | Dimitrakopoulos, Giorgos | en |
| dc.creator | Nicopoulos, Chrysostomos | en |
| dc.date.accessioned | 2023-12-19T16:22:45Z | |
| dc.date.available | 2023-12-19T16:22:45Z | |
| dc.date.issued | 2023-10-03 | |
| dc.identifier.issn | 1872-9436 | |
| dc.identifier.uri | http://gnosis.library.ucy.ac.cy/handle/7/65823 | en |
| dc.description.abstract | Systolic Array (SA) architectures are well-suited for accelerating matrix multiplications through the use of a pipelined array of Processing Elements (PEs) communicating with local connections and pre-orchestrated data movements. Even though most of the dynamic power consumption in SAs is due to multiplications and additions, pipelined data movement within the SA constitutes an additional important contributor. The goal of this work is to reduce the dynamic power consumption associated with the feeding of data to the SA, by employing both dynamic (run-time) and static (offline) techniques. At the hardware level, the proposed architecture synergistically applies bus-invert coding and zero-value clock gating. By exploiting salient attributes of state-of-the-art CNNs, such as the value distribution of the weights, the proposed SA applies appropriate encoding only to the data that exhibits high switching activity. Similarly, when one of the inputs is zero, unnecessary operations are entirely skipped. In addition to this duet of run-time techniques, the proposed methodology also leverages the inherent property of the weight matrix to remain unchanged throughout the inference phase. As such, the weight matrix is appropriately reordered offline to minimize the switching activity between consecutive values, as the matrix is repeatedly loaded into the array. The weight reordering process is formulated as a Traveling Salesman Problem (TSP) and its solution is translated into a switching-activity-aware row permutation of the weight matrix. The symbiotic combination of selectively targeted, application-aware dynamic encoding and offline weight reordering is demonstrated to reduce the switching activity by 38%, on average. This translates to an overall dynamic power reduction of 17.1%–23% when executing state-of-the-art CNN layers on an SA of size 32 × 32. These power savings scale with the array size; for an array of size 64 × 64, the proposed design consumes 29.7%–35.4% less power. | en |
| dc.language.iso | eng | en |
| dc.publisher | Elsevier | en |
| dc.source | Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) | en |
| dc.source.uri | https://doi.org/10.1016/j.micpro.2023.104938 | en |
| dc.source.uri | https://www.sciencedirect.com/science/article/pii/S0141933123001825 | en |
| dc.subject | Systolic arrays | en |
| dc.subject | Bus-invert coding | en |
| dc.subject | Zero-value clock gating | en |
| dc.subject | Weight reordering | en |
| dc.subject | Traveling salesman problem | en |
| dc.subject | Low-power design | en |
| dc.subject | Machine learning accelerators | en |
| dc.title | Exploiting data encoding and reordering for low-power streaming in systolic arrays | en |
| dc.type | info:eu-repo/semantics/article | en |
| dc.identifier.doi | 10.1016/j.micpro.2023.104938 | |
| dc.description.volume | 102 | en |
| dc.author.faculty | 007 Πολυτεχνική Σχολή / Faculty of Engineering | |
| dc.author.department | Τμήμα Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών / Department of Electrical and Computer Engineering | |
| dc.type.uhtype | Article | en |
| dc.contributor.orcid | Nicopoulos, Chrysostomos [0000-0001-6389-6068] | |
| dc.contributor.orcid | Filippas, Dionysios [0000-0002-4729-3336] | |
| dc.contributor.orcid | Dimitrakopoulos, Giorgos [0000-0003-3688-7865] | |
| dc.type.subtype | SCIENTIFIC_JOURNAL | en |
| dc.gnosis.orcid | 0000-0001-6389-6068 | |
| dc.gnosis.orcid | 0000-0002-4729-3336 | |
| dc.gnosis.orcid | 0000-0003-3688-7865 | |
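
The abstract names two ideas that are easy to illustrate in software: bus-invert coding and a TSP-style row reordering that minimizes switching activity between consecutive values. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the 8-bit bus width, the function names, the initial all-zero bus state, and the greedy nearest-neighbour heuristic (a simple stand-in for whatever TSP solver the paper uses) are all hypothetical choices made here. Switching activity is approximated as the Hamming distance between consecutive words.

```python
from typing import List, Sequence, Tuple

WIDTH = 8  # assumed bus width in bits (hypothetical, not taken from the paper)


def hamming(a: int, b: int, width: int = WIDTH) -> int:
    """Number of bit positions that differ between two width-bit words."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def bus_invert(words: Sequence[int], width: int = WIDTH) -> List[Tuple[int, int]]:
    """Bus-invert coding: send the complement (and set an invert flag)
    whenever more than half of the bus lines would otherwise toggle.
    The bus is assumed to start at all zeros."""
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for w in words:
        w &= mask
        if hamming(prev, w, width) > width // 2:
            bus, inv = (~w) & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev = bus
    return encoded


def bus_decode(bus: int, inv: int, width: int = WIDTH) -> int:
    """Recover the original word from the bus value and its invert flag."""
    mask = (1 << width) - 1
    return (~bus) & mask if inv else bus


def row_distance(r1: Sequence[int], r2: Sequence[int]) -> int:
    """Bit toggles incurred when row r2 is streamed right after row r1."""
    return sum(hamming(a, b) for a, b in zip(r1, r2))


def total_activity(rows: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Total toggles when the rows are streamed in the given order."""
    return sum(row_distance(rows[i], rows[j]) for i, j in zip(order, order[1:]))


def reorder_rows(rows: Sequence[Sequence[int]]) -> List[int]:
    """Greedy nearest-neighbour tour starting at row 0.
    A heuristic stand-in for a TSP solver; returns a row permutation."""
    remaining = list(range(1, len(rows)))
    order = [0]
    while remaining:
        last = rows[order[-1]]
        nxt = min(remaining, key=lambda i: row_distance(last, rows[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return order


if __name__ == "__main__":
    weights = [[0x00, 0x00], [0xFF, 0xFF], [0x01, 0x00], [0xFE, 0xFF]]
    order = reorder_rows(weights)
    print("order:", order)
    print("toggles before:", total_activity(weights, range(len(weights))))
    print("toggles after: ", total_activity(weights, order))
    print("bus-invert:", bus_invert([0x00, 0xFF, 0xFE]))
```

In this toy example the row permutation only changes the order in which weights are streamed; any corresponding reordering of inputs or outputs needed to keep the matrix product correct is outside the scope of the sketch.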

