Improving the performance of single and multi-application workloads on heterogeneous clustered many-core platforms

Petrides, Panayiotis P.

dc.contributor.advisor	Trancoso, Pedro	en
dc.contributor.author	Petrides, Panayiotis P.	en
dc.coverage.spatial	Κύπρος	el
dc.coverage.spatial	Cyprus	en
dc.creator	Petrides, Panayiotis P.	en
dc.date.accessioned	2020-04-07T09:04:40Z
dc.date.available	2020-04-07T09:04:40Z
dc.date.issued	2018-05
dc.date.submitted	2018-05-21
dc.identifier.uri	http://gnosis.library.ucy.ac.cy/handle/7/61653	en
dc.description	Includes bibliography (p. 103-112).	en
dc.description	Number of sources in the bibliography: 111	en
dc.description	Thesis (Ph. D.) -- University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science, 2018.	en
dc.description	The University of Cyprus Library holds the printed form of the thesis	en
dc.description.abstract	Τα τελευταία χρόνια οι αρχιτεκτονικές επεξεργαστών έχουν αναπτυχθεί προς την κατεύθυνση των πολλαπλών πυρήνων, με αποτέλεσμα την βελτίωση της επίδοσης τους αποφεύγοντας ταυτόχρονα τους περιορισμούς από την κατανάλωση ενέργειας. Ο αυξανόμενος αριθμός πυρήνων σε ένα ολοκληρωμένο κύκλωμα δεν προσφέρει μόνο τα πλεονεκτήματα της δυνητικής μαζικής παραλληλίας αλλά ταυτόχρονα δίνει την δυνατότητα στους κατασκευαστές να εξερευνήσουν νέες αρχιτεκτονικές, όπως την ενσωμάτωση στο ίδιο ολοκληρωμένο κύκλωμα πυρήνων διαφορετικών χαρακτηριστικών. Τα οφέλη αυτών των αρχιτεκτονικών συνοδεύονται όμως και με προκλήσεις. Ο αυξανόμενος αριθμός πυρήνων σε μια πολυπύρηνη αρχιτεκτονική μπορεί να τύχει εκμετάλλευσης από εφαρμογές με υψηλό βαθμό παραλληλίας. Η μεταφορά μιας εφαρμογής σε αυτού του είδους τις αρχιτεκτονικές δεν είναι μια απλή διαδικασία αλλά μια ευρύτερη εργασία που λαμβάνει υπόψιν τόσο την αρχιτεκτονική του συστήματος όσο και τα χαρακτηριστικά της εφαρμογής. Ως μελέτη περίπτωσης (case study), χρησιμοποιήθηκαν εφαρμογές συστημάτων υποβοήθησης λήψης αποφάσεων (Decision Support System), οι οποίες μεταφέρθηκαν σε αρχιτεκτονική πολλαπλών πυρήνων χρησιμοποιώντας την κοινή ενσωματωμένη στο κύκλωμα μνήμη (on-chip shared memory) για την προεπεξεργασία δεδομένων (prefetching buffer). Τα αποτελέσματα δείχνουν ότι όταν οι αιτήσεις για δεδομένα αντιμετωπίζονται ικανοποιητικά τότε γίνεται εκμεταλλεύσιμη και η παραλληλία των εφαρμογών. Ενώ κάποιες εφαρμογές επωφελούνται από τον αυξανόμενο αριθμό παράλληλων πυρήνων, σε πολλές περιπτώσεις η χρήση πολυπύρηνων επεξεργαστών στοχεύει στην παράλληλη εκτέλεση πολλαπλών εφαρμογών. Αυτό μπορεί να οδηγήσει σε παρεμβολές μεταξύ των υπό εκτέλεση εφαρμογών. Για την αντιμετώπιση αυτής της πρόκλησης, προτάθηκε μια απλή και μη παρεμβατική προσέγγιση χρησιμοποιώντας τεχνολογία εικονικοποίησης (virtualization techniques) στον ίδιο επεξεργαστή. 
Οι διαφορετικές εικονικές μηχανές μπορούν να θεωρηθούν ως Τομείς Επίδοσης (Performance Domains) προσφέροντας προβλεψιμότητα επίδοσης για τις διάφορες εφαρμογές. Τα πειραματικά αποτελέσματα δείχνουν ότι επιτυγχάνεται απομόνωση της εκτέλεσης των εφαρμογών σε ένα εικονικοποιημένο περιβάλλον και ταυτόχρονα μειώνονται οι παρεμβολές μεταξύ των εφαρμογών. Οι μελλοντικοί επεξεργαστές πολλαπλών πυρήνων μεγάλης κλίμακας αναμένεται να είναι μια συλλογή συμπλεγμάτων ετερογενών πυρήνων για να ικανοποιήσουν τις απαιτήσεις των εφαρμογών. Προκειμένου να ικανοποιηθεί η δυναμική συμπεριφορά των εφαρμογών, προτείνεται ένα σύστημα χρόνου εκτέλεσης (run time system) το οποίο είναι υπεύθυνο για την εύρεση ενός καλύτερου πόρου που ταιριάζει σε μια εφαρμογή σε κάθε διαφορετική φάση της εκτέλεσής τους. Ο προτεινόμενος ετερογενής χρονοπρογραμματιστής (scheduler) αξιολογήθηκε τόσο σε πραγματική αρχιτεκτονική πολλαπλών πυρήνων (Intel SCC 48 πυρήνων) όσο και με τη χρήση προσομοιωτή (Sniper) για εφαρμογές από τη σουίτα αναφοράς SPEC CPU2006. Τα αποτελέσματα δείχνουν ότι η μεταφορά εφαρμογών σε πυρήνες που ταιριάζουν καλύτερα στις απαιτήσεις τους, οδηγούν σε μείωση του χρόνου εκτέλεσης τους μεταξύ 15% και 36% σε σύγκριση με τυχαίο στατικό χρονοπρογραμματισμό. Δεδομένης της αυξανόμενης πολυπλοκότητας και πολυμορφίας των πυρήνων του επεξεργαστή, καθώς και των απαιτήσεων της εφαρμογής, θα χρειαστεί η ανάπτυξη περισσότερων από τις προαναφερόμενες τεχνικές για την αντιμετώπιση των προκλήσεων. Επομένως, οι μελλοντικοί ετερογενείς επεξεργαστές πολλαπλών πυρήνων θα πρέπει να περιλαμβάνουν ένα στρώμα εικονικοποίησης το οποίο θα μπορούσε να αποτελείται από όλες τις προτεινόμενες τεχνικές αλλά και άλλες με ένα αρθρωτό τρόπο ούτως ώστε να υποστηρίζει και πυρήνες που αλλάζουν δυναμικά τα χαρακτηριστικά τους.	el
dc.description.abstract	In recent years, processor architectures have evolved towards chips with multiple cores, thus delivering the expected performance while avoiding the power wall. Increasing the number of devices on a chip not only increases the potential for parallelism but also allows manufacturers to explore new designs, such as including cores with different characteristics on the same chip. These benefits also come with challenges in exploiting the performance for both single and multi-application workloads. The increasing number of cores on a clustered many-core architecture can be exploited by applications with a high degree of parallelism. Porting an application to such architectures is not trivial; it requires jointly considering both the underlying architecture and the application’s behaviour. Memory-bound applications with a high degree of parallelism can create an increasing number of memory requests, which must be satisfied without becoming a performance bottleneck. As a case study, Decision Support System (DSS) workloads were ported to a clustered many-core architecture and the on-chip memory was used as a prefetching buffer. Results show that parallelism can be well exploited when the memory requests are well handled. While some applications benefit from the increasing number of parallel cores, in many cases many-core processors will be used for the co-execution of multiple applications. This might happen because of the limited degree of parallelism of the applications or to achieve higher throughput and resource utilization. Nevertheless, this can lead to application interference. To address this, a simple and non-intrusive approach using virtualization on the same processor was proposed. The different Virtual Machines can be seen as Performance Domains since the isolation offers performance predictability for the different applications.
Results show that the performance overhead of executing in a virtualized environment is not significant. While Performance Domains provide isolation, they are static containers that do not adapt to the dynamic behaviour of applications. Future large-scale many-core processors are expected to be organized as a collection of NUMA clusters of heterogeneous cores to satisfy application demands. To accommodate the applications’ dynamic behaviour, a runtime system is proposed. This system is responsible for finding the best-matching resource for an application at a given execution phase. The proposed heterogeneous and NUMA-aware scheduler was evaluated both on a real many-core architecture (48-core Intel SCC) and using a simulator (Sniper) for applications from the SPEC CPU2006 benchmark suite. The results indicate that even when all cores are busy, migrating processes to a better-matching resource reduces the execution time by between 15% and 36% compared to random static scheduling. Given the increasing complexity and diversity of the hardware resources, as well as of the application demands, more techniques like those described above should be developed to address the challenges. Therefore, the vision is for future heterogeneous many-core processors to include a virtualization layer, which could be composed of all of the proposed techniques and others in a modular way and could thus also support hardware that changes dynamically at runtime.	en
dc.format.extent	xvi, 112 p. : col. ill., tables, diagrs., graphs ; 31 cm.	en
dc.language.iso	eng	en
dc.publisher	Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών / University of Cyprus, Faculty of Pure and Applied Sciences
dc.rights	info:eu-repo/semantics/openAccess	en
dc.rights	Open Access	en
dc.subject.lcsh	Computer architecture	en
dc.subject.lcsh	High performance computing	en
dc.subject.lcsh	Heterogeneous computing	en
dc.subject.lcsh	Systems on a chip -- Design and construction	en
dc.subject.lcsh	Embedded computer systems	en
dc.title	Improving the performance of single and multi-application workloads on heterogeneous clustered many-core platforms	en
dc.title.alternative	Βελτιστοποίηση της επίδοσης εκτέλεσης μεμονωμένων και πολλαπλά εκτελέσιμων εφαρμογών σε πλατφόρμες πολλαπλών πυρήνων	el
dc.type	info:eu-repo/semantics/doctoralThesis	en
dc.contributor.committeemember	Παττίχης, Κωνσταντίνος	el
dc.contributor.committeemember	Ευριπίδου, Παρασκευάς	el
dc.contributor.committeemember	Γκιζόπουλος, Δημήτρης	el
dc.contributor.committeemember	Trancoso, Pedro	en
dc.contributor.committeemember	Pattichis, Constantinos	en
dc.contributor.committeemember	Euripidou, Paraskevas	en
dc.contributor.committeemember	Joao, Cardoso	en
dc.contributor.committeemember	Gizopoulos, Dimitris	en
dc.contributor.department	Πανεπιστήμιο Κύπρου, Σχολή Θετικών και Εφαρμοσμένων Επιστημών, Τμήμα Πληροφορικής	el
dc.contributor.department	University of Cyprus, Faculty of Pure and Applied Sciences, Department of Computer Science	en
dc.subject.uncontrolledterm	ΕΠΕΞΕΡΓΑΣΤΕΣ ΠΟΛΛΑΠΛΩΝ ΠΥΡΗΝΩΝ	el
dc.subject.uncontrolledterm	ΧΡΟΝΟΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ ΕΦΑΡΜΟΓΩΝ	el
dc.subject.uncontrolledterm	ΕΠΙΔΟΣΗ ΕΠΕΞΕΡΓΑΣΤΩΝ	el
dc.subject.uncontrolledterm	ΠΡΟΕΠΕΞΕΡΓΑΣΙΑ ΔΕΔΟΜΕΝΩΝ	el
dc.subject.uncontrolledterm	ΑΠΟΜΟΝΩΣΗ ΕΠΙΔΟΣΗΣ	el
dc.subject.uncontrolledterm	ΕΤΕΡΟΓΕΝΕΙΣ ΕΠΕΞΕΡΓΑΣΤΕΣ	el
dc.subject.uncontrolledterm	MANY-CORE ARCHITECTURES	en
dc.subject.uncontrolledterm	NUMA-AWARE SCHEDULING	en
dc.subject.uncontrolledterm	MANY-CORE ARCHITECTURES PERFORMANCE	en
dc.subject.uncontrolledterm	DATA PREFETCHING	en
dc.subject.uncontrolledterm	PERFORMANCE ISOLATION	en
dc.subject.uncontrolledterm	HETEROGENEOUS MANY-CORE ARCHITECTURES	en
dc.identifier.lc	QA76.88.P48 2018	en
dc.author.faculty	Σχολή Θετικών και Εφαρμοσμένων Επιστημών / Faculty of Pure and Applied Sciences
dc.author.department	Τμήμα Πληροφορικής / Department of Computer Science
dc.type.uhtype	Doctoral Thesis	en
dc.rights.embargodate	2018-05-21
dc.contributor.orcid	Trancoso, Pedro [0000-0002-2776-9253]

Files in this item

Name: Panayiotis_P_Petrides_PhD.pdf
Size: 3.555Mb
Format: PDF
Description: Doctoral Thesis

This item appears in the following Collection(s)

Τμήμα Πληροφορικής / Department of Computer Science [74]