IndexMAC: A Custom RISC-V Vector Instruction to Accelerate Structured-Sparse Matrix Multiplications
Date
2024-03
Author
Titopoulos, Vasileios
Alexandridis, Kosmas
Peltekis, Christodoulos
Nicopoulos, Chrysostomos
Dimitrakopoulos, Giorgos
Publisher
IEEE
Source
Design, Automation and Test in Europe Conference [DATE] 2024
Abstract
Structured sparsity has been proposed as an efficient way to prune the complexity of modern Machine Learning (ML) applications and to simplify the handling of sparse data in hardware. The acceleration of ML models, for both training and inference, relies primarily on equivalent matrix multiplications that can be executed efficiently on vector processors or custom matrix engines. The goal of this work is to incorporate the simplicity of structured sparsity into vector execution, thereby accelerating the corresponding matrix multiplications. Toward this objective, a new vector index-multiply-accumulate instruction is proposed, which enables the implementation of low-cost indirect reads from the vector register file. This reduces unnecessary memory traffic and increases data locality. The proposed new instruction was integrated into a decoupled RISC-V vector processor with negligible hardware cost. Extensive evaluation demonstrates significant speedups of 1.80x-2.14x compared to state-of-the-art vectorized kernels when executing layers of varying sparsity from state-of-the-art Convolutional Neural Networks (CNNs).
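To make the core idea concrete, the following is a minimal scalar sketch of what an index-multiply-accumulate step does for a structured-sparse matrix multiplication: each stored nonzero weight carries an index that selects the matching activation from a dense segment already held in registers, so no memory gather is required. The function name, data layout, and 2:4 example below are illustrative assumptions and do not reproduce the paper's actual instruction encoding or vector semantics.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Scalar model of one index-multiply-accumulate step (illustrative only).
 * vals[i] : nonzero weight values of one compressed row segment
 * idx[i]  : position of each nonzero within the corresponding dense segment
 * act[]   : dense activation segment, conceptually resident in a vector register
 * acc     : running dot-product accumulator
 */
static float indexmac_model(const float *vals, const uint8_t *idx,
                            const float *act, size_t nnz, float acc) {
    for (size_t i = 0; i < nnz; i++) {
        /* Indirect read: idx[i] selects the matching activation element,
         * avoiding a gather of the dense operand from memory. */
        acc += vals[i] * act[idx[i]];
    }
    return acc;
}

int main(void) {
    /* Example: a 2:4 structured-sparse segment keeps 2 nonzeros out of 4. */
    const float   vals[2] = {0.5f, -1.0f};            /* kept weight values    */
    const uint8_t idx[2]  = {1, 3};                    /* their dense positions */
    const float   act[4]  = {2.0f, 4.0f, 6.0f, 8.0f}; /* dense activations     */
    printf("partial sum = %f\n", indexmac_model(vals, idx, act, 2, 0.0f));
    /* Expected: 0.5*4.0 + (-1.0)*8.0 = -6.0 */
    return 0;
}
```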