Coordinated cooperative work using undependable processors with unreliable broadcast
Prisco, R. D.
Shvartsman, A. A.
Publisher: IEEE Computer Society
Source: Proceedings - 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2014
With the end of Moore's Law in sight, parallelism has become the main means of speeding up computationally intensive applications, especially in cases where large collections of tasks need to be performed. Network supercomputing, which takes advantage of very large numbers of computers in a distributed environment, is an effective approach to massive parallelism that harnesses the processing power inherent in large networked settings. In such settings, processor failures are no longer the exception but the norm, so any algorithm designed for realistic settings must be able to deal with failures. This paper presents a new message-passing algorithm for distributed cooperative work in synchronous settings where processors may crash, and where any broadcasts performed by crashing processors are unreliable. We specify the algorithm, prove that it is correct, and perform extensive simulations showing that its performance is close to that of similar algorithms that use reliable broadcast, and that its work compares favorably to the relevant lower bounds. © 2014 IEEE.