Fault-tolerant computation in groups and semigroups: Applications to automata, dynamic systems and Petri nets
AuthorHadjicostis, Christoforos N.
Verghese, G. C.
SourceJournal of the Franklin Institute
Google Scholar check
MetadataShow full item record
The traditional approach to fault-tolerant computation has been via modular hardware redundancy. Although universal and simple, modular redundancy is inherently expensive and inefficient. By exploiting particular structural features of a computational architecture or an algorithm, arithmetic codes and recently developed algorithm-based fault tolerance (ABFT) techniques manage to introduce "analytical redundancy" and offer more efficient fault coverage at the cost of narrower applicability and harder design. In this paper, we extend a variety of results and constructive procedures that were developed in previous work for computations that take place in an abelian group to a more general setting that considers computations in semigroups. We demonstrate possible encodings for semigroup operations of interest and use our extension to design concurrent error detection and correction schemes for group and semigroup machines. The method provides insight regarding the role of decomposition in fault-tolerant algebraic machines and results in a general, hardware-independent characterization of concurrent error detection and correction in finite semiautomata. We also demonstrate that by extending this approach to other dynamic systems, with specific hardware implementations and failure modes, we can systematically obtain fault-tolerant architectures. More specifically, we apply these techniques to linear time-invariant dynamic systems and Petri net models of discrete event systems. © 2002 The Franklin Institute. Published by Elsevier Science Ltd. All rights reserved.
Showing items related by title, author, creator and subject.
Louca, Soulla P.; Neophytou, Neophytos; Lachanas, Adrianos; Evripidou, Paraskevas (2000)In this paper, we propose the design and development of a fault tolerant and recovery scheme for the Message Passing Interface (MPI). The proposed scheme consists of a detection mechanism for detecting process failures, ...
Hadjicostis, Christoforos N.; Verghese, G. C. (2000)We use unreliable system replicas and unreliable voters to construct redundant dynamic systems that tolerate transient failures in their state transition and error correcting mechanisms. Using low density parity check ...
Hadjicostis, Christoforos N. (2003)This paper develops resource-efficient alternatives to modular redundancy for fault-tolerant discrete-time (DT) linear time-invariant (LTI) dynamic systems. The proposed method extends previous approaches that are based ...