Architectural and software support for data-driven execution on multi-core processors

Matheou, George A.

dc.contributor.advisor	Euripidou, Paraskevas	en
dc.contributor.author	Matheou, George A.	en
dc.coverage.spatial	Κύπρος	el
dc.coverage.spatial	Cyprus	en
dc.creator	Matheou, George A.	en
dc.date.accessioned	2020-04-07T09:04:39Z
dc.date.available	2020-04-07T09:04:39Z
dc.date.issued	2018-01
dc.date.submitted	2018-01-05
dc.identifier.uri	http://gnosis.library.ucy.ac.cy/handle/7/61651	en
dc.description	Includes bibliography (p. 161-172).	en
dc.description	Number of sources in the bibliography: 192	en
dc.description	Thesis (Ph. D.) -- University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science, 2018.	en
dc.description	The University of Cyprus Library holds the printed form of the thesis	en
dc.description.abstract	Το τέλος της εκθετικής ανάπτυξης των σειριακών επεξεργαστών έχει διευκολύνει την ανάπτυξη των πολυπύρηνων συστημάτων. Έτσι, οποιαδήποτε αύξηση της απόδοσης πρέπει να προέρχεται από τον παραλληλισμό. Για να επιτευχθεί αυτό, πρέπει να αναπτυχθούν αποτελεσματικά μοντέλα παράλληλου προγραμματισμού/εκτέλεσης. Προτείνουμε την ανάπτυξη τέτοιων συστημάτων χρησιμοποιώντας το μοντέλο εκτέλεσης Data-Driven Multithreading (DDM). Το DDM είναι ένα πολυνηματικό μοντέλο που συνδυάζει ταυτοχρονισμό, βασισμένο στο δυναμικό μοντέλο ροής δεδομένων, και αποδοτική διαδοχική εκτέλεση σε συμβατικούς επεξεργαστές. Το DDM χρησιμοποιεί το Thread Scheduling Unit (TSU) για τη χρονοδρομολόγηση των νημάτων κατά τη διάρκεια εκτέλεσης, βάσει της διαθεσιμότητας των δεδομένων. Σε αυτό το έργο, παρέχουμε αρχιτεκτονική και λογισμική υποστήριξη για την αποτελεσματική εκτέλεση σε πολυπύρηνες αρχιτεκτονικές, μέσω δύο διαφορετικών υλοποιήσεων που βασίζονται στο μοντέλο DDM. Η πρώτη υλοποίηση πραγματοποιεί το μοντέλο DDM στο υλικό, χρησιμοποιώντας Field Programmable Gate Arrays (FPGAs). Η υλοποίηση αυτή στοχεύει να βοηθήσει στην ανάπτυξη μελλοντικών πολυπύρηνων συστημάτων υψηλής απόδοσης και χαμηλής ισχύος. Το TSU έχει υλοποιηθεί σε υλικό χρησιμοποιώντας την γλώσσα προγραμματισμού Verilog και έχει ενσωματωθεί σε ένα πολυπύρηνο επεξεργαστή με μη-συνεκτικούς και χαμηλής πολυπλοκότητας πυρήνες. Ο επεξεργαστής αυτός ονομάζεται MiDAS (Multi-core with Data-driven Architectural Support) και έχει παραχθεί σε πρωτότυπο σε ένα Xilinx Virtex-6 FPGA. Ο MiDAS επεξεργαστής έχει αξιολογηθεί χρησιμοποιώντας εφαρμογές με διαφορετικά χαρακτηριστικά οι οποίες αναπτύχθηκαν σε C/C++ χρησιμοποιώντας μια διεπαφή προγραμματισμού εφαρμογών (API). Η αξιολόγηση της απόδοσης του MiDAS έδειξε ότι η αρχιτεκτονική υποστήριξη για την εκτέλεση ροής δεδομένων μπορεί να επιτύχει πολύ καλά αποτελέσματα, ακόμη και σε εφαρμογές με πολύ μικρά μεγέθη προβλημάτων.
Παρέχουμε αρκετά αποτελέσματα για το υλικό TSU και τον MiDAS επεξεργαστή, όπως για παράδειγμα τη χρήση των πόρων του FPGA, εκτιμήσεις για την κατανάλωση ενέργειας και καθυστερήσεις (σε κύκλους) των διαφόρων λειτουργιών του TSU. Τα αποτελέσματα δείχνουν ότι το TSU μπορεί να υλοποιηθεί με μικρό προϋπολογισμό υλικού. Το TSU συγκρίνεται με το Task Superscalar, μια αρχιτεκτονική που υλοποιεί το μοντέλο StarSs σε υλικό, χρησιμοποιώντας απαιτήσεις σε πόρους και μακρο-στατιστικές. Τα αποτελέσματα δείχνουν ότι η υλοποίηση ενός μοντέλου ροής δεδομένων στο υλικό, που ανιχνεύει δυναμικά εξαρτήσεις μεταξύ εργασιών και κατασκευάζει το γράφημα εξαρτήσεων κατά τη διάρκεια εκτέλεσης, όπως το Task Superscalar, αυξάνει σημαντικά τη χρήση των πόρων (και κατά συνέπεια την κατανάλωση ενέργειας). Η δεύτερη υλοποίηση, ονομαζόμενη FREDDO (efficient Framework for Runtime Execution of Data-Driven Objects), είναι μια αποδοτική και φορητή αντικειμενοστραφής υλοποίηση του μοντέλου DDM, που επιτρέπει χρονοδρομολόγηση βασισμένη στο μοντέλο ροής δεδομένων σε κατανεμημένα συστήματα με συμβατικούς πολυπύρηνους επεξεργαστές. Το FREDDO στοχεύει στην αποδοτική DDM εκτέλεση σε κατανεμημένα συστήματα υπολογισμού υψηλών επιδόσεων. Παρέχει επίσης νέες δυνατότητες στο μοντέλο DDM όπως υποστήριξη αναδρομής και επεκτείνει τη διεπαφή προγραμματισμού του DDM με αντικειμενοστραφή προγραμματισμό. Το FREDDO έχει αξιολογηθεί σε δύο διαφορετικά συστήματα: ένα σύστημα 4-κόμβων AMD με συνολικά 128 πυρήνες και ένα σύστημα 64-κόμβων Intel με συνολικά 768 πυρήνες. Η αξιολόγηση της απόδοσης δείχνει ότι το προτεινόμενο σύστημα κλιμακώνεται καλά και ανέχεται αποτελεσματικά το κόστος χρονοδρομολόγησης και τις καθυστερήσεις μνήμης. Επίσης, συγκρίνουμε το FREDDO με τα συστήματα OpenMP, MPI, DDM-VM και OmpSs. Τα αποτελέσματα σύγκρισης δείχνουν ότι το προτεινόμενο σύστημα επιτυγχάνει συγκρίσιμες ή καλύτερες επιδόσεις.	el
dc.description.abstract	The end of exponential performance growth in sequential processors has prompted the development of multi-core systems. Thus, any further growth in performance must come from parallelism. To achieve that, efficient parallel programming/execution models must be developed. We propose to develop such systems using the Data-Driven Multithreading (DDM) model of execution. DDM is a non-blocking multithreading model that combines dynamic data-flow concurrency with efficient sequential execution on conventional processors. DDM utilizes the Thread Scheduling Unit (TSU) to schedule threads at runtime, based on data availability. In this work, we provide architectural and software support for efficient data-driven execution on multi-core architectures, through two different DDM-based implementations. The first implementation realizes the DDM model in hardware, using Field Programmable Gate Arrays (FPGAs). The hardware DDM implementation aims to help in the development of future high-performance and low-power multi-core systems. DDM's TSU has been implemented in hardware using Verilog. The hardware TSU implementation has been integrated into a shared-memory multi-core processor with non-coherent in-order cores, called MiDAS (Multi-core with Data-driven Architectural Support). MiDAS has been prototyped and evaluated on a Xilinx Virtex-6 FPGA using benchmarks with different characteristics. The benchmarks were developed in C/C++ using a software API. The performance evaluation of MiDAS has shown that architectural support for data-driven execution can achieve very good results, even on benchmarks with very small problem sizes. We provide several results for the hardware TSU and MiDAS, including FPGA resource utilization, power consumption estimates and latencies (in cycles) of various TSU operations. The results show that the TSU can be implemented with a small hardware budget.
DDM's TSU is compared with Task Superscalar, an architecture that implements the StarSs programming framework in hardware, using resource utilization and macro statistics. The results show that implementing in hardware a data-driven model that dynamically detects inter-task dependencies and constructs the dependency graph at runtime, as Task Superscalar does, significantly increases resource utilization (and consequently power consumption). The second implementation, called FREDDO (efficient Framework for Runtime Execution of Data-Driven Objects), is an efficient and portable object-oriented implementation of DDM that enables data-driven scheduling on conventional single-node and distributed multi-core systems. The FREDDO implementation aims to allow efficient DDM execution on distributed High Performance Computing (HPC) systems. It also adds new features to the DDM model, such as recursion support, and extends DDM's programming interface with the object-oriented programming paradigm. FREDDO has been evaluated on two different systems: a 4-node AMD system with a total of 128 cores and a 64-node Intel HPC system with a total of 768 cores. The performance evaluation shows that the proposed framework scales well and tolerates scheduling overheads and memory latencies effectively. We also compare our framework with OpenMP, MPI, DDM-VM and OmpSs. The comparison results show that the proposed framework obtains comparable or better performance.	en
dc.format.extent	xv, 175 p. : col. ill., diagrs., tables ; 31 cm.	en
dc.language.iso	eng	en
dc.publisher	Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences
dc.rights	info:eu-repo/semantics/openAccess	en
dc.rights	Open Access	en
dc.subject.lcsh	Parallel programming (Computer science)	en
dc.subject.lcsh	Data flow computing	en
dc.subject.lcsh	Computer architecture	en
dc.subject.lcsh	Software engineering	en
dc.subject.lcsh	Computer software	en
dc.subject.lcsh	Field programmable gate arrays	en
dc.subject.lcsh	Multiprocessors	en
dc.subject.lcsh	Electronic data processing -- Distributed processing	en
dc.title	Architectural and software support for data-driven execution on multi-core processors	en
dc.title.alternative	Αρχιτεκτονική και λογισμική υποστήριξη για εκτέλεση βασισμένη στο μοντέλο ροής δεδομένων σε πολυπύρηνους επεξεργαστές	el
dc.type	info:eu-repo/semantics/doctoralThesis	en
dc.contributor.committeemember	Παττίχης, Κωνσταντίνος	el
dc.contributor.committeemember	Θεοχαρίδης, Θεοχάρης	el
dc.contributor.committeemember	Pattichis, Constantinos	en
dc.contributor.committeemember	Theocharides, Theocharis	en
dc.contributor.committeemember	Watson, Ian	en
dc.contributor.committeemember	Cohen, Albert	en
dc.contributor.department	Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών, Τμήμα Πληροφορικής	el
dc.contributor.department	University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science	en
dc.subject.uncontrolledterm	ΠΟΛΥΝΗΜΑΤΙΚΗ ΤΕΧΝΟΛΟΓΙΑ ΒΑΣΙΣΜΕΝΗ ΣΤΟ ΜΟΝΤΕΛΟ ΡΟΗΣ ΔΕΔΟΜΕΝΩΝ	el
dc.subject.uncontrolledterm	ΑΡΧΙΤΕΚΤΟΝΙΚΕΣ ΡΟΗΣ ΔΕΔΟΜΕΝΩΝ	el
dc.subject.uncontrolledterm	ΔΙΑΤΑΞΗ ΠΕΔΙΑΚΑ ΠΡΟΓΡΑΜΜΑΤΙΖΟΜΕΝΩΝ ΠΥΛΩΝ	el
dc.subject.uncontrolledterm	ΠΟΛΥΠΥΡΗΝΟΙ ΕΠΕΞΕΡΓΑΣΤΕΣ	el
dc.subject.uncontrolledterm	ΥΠΟΛΟΓΙΣΜΟΣ ΥΨΗΛΩΝ ΕΠΙΔΟΣΕΩΝ	el
dc.subject.uncontrolledterm	ΚΑΤΑΝΕΜΗΜΕΝΑ ΣΥΣΤΗΜΑΤΑ	el
dc.subject.uncontrolledterm	ΠΑΡΑΛΛΗΛΟΣ ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ	el
dc.subject.uncontrolledterm	ΠΡΩΤΟΤΥΠΟΠΟΙΗΣΗ	el
dc.subject.uncontrolledterm	DATA-DRIVEN MULTITHREADING	en
dc.subject.uncontrolledterm	DATA-FLOW ARCHITECTURES	en
dc.subject.uncontrolledterm	FPGA	en
dc.subject.uncontrolledterm	MULTI-CORE PROCESSORS	en
dc.subject.uncontrolledterm	HIGH PERFORMANCE COMPUTING	en
dc.subject.uncontrolledterm	DISTRIBUTED SYSTEMS	en
dc.subject.uncontrolledterm	PARALLEL PROGRAMMING	en
dc.subject.uncontrolledterm	PROTOTYPING	en
dc.identifier.lc	QA76.9.A73M39 2017	en
dc.author.faculty	Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department	Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype	Doctoral Thesis	en
dc.rights.embargodate	2018-01-05

Files in this item

Name: George A. Matheou PhD.pdf
Size: 9.527Mb
Format: PDF
Description: Doctoral Thesis


This item appears in the following Collection(s)

Τμήμα Πληροφορικής / Department of Computer Science [74]
