Exploring HPC parallelism with data-driven multithreading

Christofides, Constantinos; Michael, G.; Trancoso, Pedro; Evripidou, Paraskevas

doi:10.1109/DFM.2012.11

Conference Object

Date

2013

Author

Christofides, Constantinos

Michael, G.

Trancoso, Pedro

Evripidou, Paraskevas

Publisher

IEEE Computer Society

Source

Proceedings - 2012 2nd Workshop on Data-Flow Execution Models for Extreme Scale Computing, DFM 2012

Pages

10-17

Keyword(s):

Linear algebra

Semantics

Parallel processing systems

Digital storage

Scalability

Microprocessor chips

Cholesky decomposition

Data-driven multithreading

Multi-core systems

Exponential growth

High performance computing (HPC)

Linear algebra libraries

Matrix multiplication

Scalability and performance

Abstract

The switch to multi-core systems has ended the reliance on the single processor for increases in performance and shifted the focus to parallelism. However, the exponential growth in single-processor performance in the 1980s and 1990s had overshadowed the drive for efficient parallelism and relegated it to a niche research area, mostly for High Performance Computing (HPC). Parallelism is now at the forefront and bears the burden of utilising the extra resources provided by Moore's law to maintain the exponential growth of computing systems. In the drive to utilise parallel models of computation, Data-Flow models have recently been "re-visited" for exploiting parallelism in multi- and many-core systems. Data-Driven Multithreading (DDM) is one such model, based on Dynamic Data-Flow principles, that can expose the maximum parallelism of an application. DDM schedules threads based on data availability, driven by a producer-consumer graph. DDM enforces single-assignment semantics on the data passed from producer to consumer. In this paper we present a preliminary evaluation of whether DDM can be a viable candidate for HPC. We study the scalability of a small subset of the LINPACK benchmark using Data-Driven Multithreading on a system with 48 cores. We implement three test case operations, Matrix Multiplication and the LU and Cholesky decompositions, and use them to evaluate scalability and performance. We use optimized linear algebra kernels for the basic operations performed in the threads. We compare our DDM implementations against PLASMA, a state-of-the-art linear algebra library for HPC, and show that applications using the DDM model can scale efficiently and observe a performance improvement of up to 2×. © 2013 IEEE.
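
The scheduling idea described in the abstract (threads become runnable once their producers have delivered data, and each result is written exactly once) can be illustrated with a small sketch. This is not the authors' DDM runtime, whose interface is not described in this record. It is a sequential toy model, and the names run_data_driven, ready_count and the example graph are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque


def run_data_driven(threads):
    """Toy data-driven scheduler (illustrative only, runs sequentially).

    threads maps a thread name to (function, list of producer names).
    A thread becomes ready only when all of its producers have finished.
    """
    # Build the producer -> consumers edges of the dependency graph.
    consumers = defaultdict(list)
    for name, (_, producers) in threads.items():
        for producer in producers:
            consumers[producer].append(name)

    # Ready count: number of inputs a thread is still waiting for.
    ready_count = {name: len(producers)
                   for name, (_, producers) in threads.items()}
    ready = deque(name for name, count in ready_count.items() if count == 0)

    results = {}
    order = []
    while ready:
        name = ready.popleft()
        func, producers = threads[name]
        # Single assignment: each result slot is written exactly once.
        if name in results:
            raise RuntimeError(f"{name} was already assigned")
        results[name] = func(*(results[p] for p in producers))
        order.append(name)
        # Completion of a producer releases its consumers.
        for consumer in consumers[name]:
            ready_count[consumer] -= 1
            if ready_count[consumer] == 0:
                ready.append(consumer)
    return order, results


# Hypothetical four-thread graph: c needs a and b, d needs c and a.
example = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "c": (lambda x, y: x + y, ["a", "b"]),
    "d": (lambda x, y: x * y, ["c", "a"]),
}
order, results = run_data_driven(example)
```

In a real parallel runtime the ready queue would be drained by several worker cores at once. The sketch keeps a single loop so that only the data-availability rule is visible.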