Yield and performance enhancement through redundancy in VLSI and WSI multiprocessor systems

I. Koren, D.K. Pradhan
1986 Proceedings of the IEEE  
Therefore, we first describe the underlying concepts of fault tolerance at work in these multiprocessor systems.  ...  Concerns about fault tolerance in VLSI-based systems stem from the two key factors of reliability and yield enhancements. Low yield is a problem of increasing significance as circuit density grows.  ...  First, the remaining redundant elements (if any) can be used as spares and then, the system is gracefully degraded.  ... 
doi:10.1109/proc.1986.13532 fatcat:bycl775bmfb27i45ppqy4wykgy

Commercial fault tolerance: a tale of two systems

W. Bartlett, L. Spainhower
2004 IEEE Transactions on Dependable and Secure Computing  
Both systems have a long history; the initial IBM S/360 machines were shipped in 1964, and the Tandem NonStop System was first shipped in 1976.  ...  coupled multiprocessor design that supports a "fail-fast" philosophy implemented through a combination of hardware and software, with workload being actively taken over by another resource when one fails  ...  Fault tolerant enhancements were added with each new generation of CMOS, and today's zSeries has the most extensive detection and recovery in the mainframe history, as well as mean time to hardware-caused  ... 
doi:10.1109/tdsc.2004.4 fatcat:g5ht4nlexrdjtmpka2tjcai6fa


1991 International Conference on Aerospace Sciences and Aviation Technology  
In this research, the necessary and sufficient conditions which are required to identify the faulty parts using different diagnostic models, whether these faults in the system units (main units, which  ...  Most of these works have been concentrated on the identification of faulty units only in digital systems.  ...  Diagnosis and fault identification A t-diagnosable system with faulty comparators can be explained as follows.  ... 
doi:10.21608/asat.1991.25831 fatcat:ifnvsoizgfggzlo6rzjy2ybgvi

Fault tolerance on-chip: a reliable computing paradigm using self-test, self-diagnosis, and self-repair (3S) approach

Xiaowei Li, Guihai Yan, Jing Ye, Ying Wang
2018 Science China Information Sciences  
The "magic cure" is the Fault Tolerance On-Chip (FTOC) mechanism, which relies on a suite of built-in design-for-reliability logic, including fault detection, fault diagnosis, and error recovery, working  ...  Fault tolerance on-chip: a reliable computing paradigm using self-test, self-diagnosis, and self-repair (3S) approach.  ...  FTOC: a 3S approach As a fault tolerance mechanism, FTOC has the ingredients of generic fault tolerance mechanisms: fault detection, fault diagnosis, and error recovery.  ... 
doi:10.1007/s11432-017-9290-4 fatcat:3mwg4l5pyrashe6het5rlsnlue

Self-Adapting Resource Escalation for Resilient Signal Processing Architectures

Naveed Imran, Ronald F. DeMara, Jooheung Lee, Jian Huang
2013 Journal of Signal Processing Systems  
To deal with susceptibility to aging and process variation in the deep submicron era, signal processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of  ...  Its performance is compared to two alternative reconfiguration algorithms which prioritize the optimization of the number of reconfiguration occurrences and the fault detection latency, respectively.  ...  Fault-handling (FH) systems typically employ a sequence of resolution phases including Fault Detection, Fault-Diagnosis, and Fault Recovery.  ... 
doi:10.1007/s11265-013-0811-x fatcat:3t5yp7uqzzeulauarayjwddm6e

Fault Tolerance in Tandem Computer Systems [chapter]

Joel Bartlett, Jim Gray, Bob Horst
1987 The Evolution of Fault-Tolerant Computing  
It is designed for online diagnosis and maintenance. A range of CPUs may be inter-connected via a hierarchical fault-tolerant local network.  ...  An application generator allows users to develop fault-tolerant applications as though the system were a conventional computer.  ...  One category of compound faults is the combination of a hardware failure and a human fault during the consequent human activity of diagnosis and repair.  ... 
doi:10.1007/978-3-7091-8871-2_3 fatcat:j3q2d66tnfgazgqs2jjfpjyhtu

From defects to failures: a view of dependable computing

Behrooz Parhami
1988 SIGARCH Computer Architecture News  
Certain system states expose the defect, resulting in the development of a logic-level fault. Information flowing within a faulty system may become contaminated, leading to the presence of an error.  ...  An erroneous system state may result in a subsystem malfunction.  ...  ACKNOWLEDGEMENTS The work reported here was initiated when the author was a Visiting Professor at the Department of Computer Science, University of Waterloo, Canada, where it was supported in part by the  ... 
doi:10.1145/54331.54345 fatcat:weqgkdnzozfkpbumffajipfvsi

On effective and efficient in-field TSV repair for stacked 3D ICs

Li Jiang, Fangming Ye, Qiang Xu, Krishnendu Chakrabarty, Bill Eklow
2013 Proceedings of the 50th Annual Design Automation Conference on - DAC '13  
We describe a reconfigurable in-field repair solution that is able to effectively tolerate latent TSV defects through the judicious use of spares.  ...  The proposed solution includes a reconfigurable repair architecture that enables spare TSV sharing between TSV grids, and the corresponding in-field repair algorithms.  ...  To achieve these objectives, as in [22], we assume the existence of a processor core and non-volatile memory in the system for test and diagnosis purpose (see the conceptual architecture shown in Fig  ... 
doi:10.1145/2463209.2488824 dblp:conf/dac/JiangYXCE13 fatcat:4sltib2scrfexbfwvfyqahfo4e

Zero-maintenance of electronic systems: Perspectives, challenges, and opportunities

Richard McWilliam, Samir Khan, Michael Farnsworth, Colin Bell
2018 Microelectronics and reliability  
Efforts are concentrated on built-in detection, masking and active mitigation that comprises self-recovery or self-repair capability, and has a focus on system resilience and recovering from fault events  ...  Design techniques are critically reviewed to clarify the role of fault coverage, resource allocation and fault awareness, set in the context of existing and emerging printable/nanoscale manufacturing processes  ...  by solutions observed in biological systems that construct resilient systems out of many spare components.  ... 
doi:10.1016/j.microrel.2018.04.001 fatcat:bjr3ynhgs5eirdu3yitgnfh3mm

Methods for fault tolerance in networks-on-chip

Martin Radetzki, Chaochao Feng, Xueqian Zhao, Axel Jantsch
2013 ACM Computing Surveys  
The article at hand reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years.  ...  Networks-on-Chip constitute the interconnection architecture of future, massively parallel multiprocessors that assemble hundreds to thousands of processing cores on a single chip.  ...  Fault diagnosis and fault tolerance may be employed at different stages in the life of a system. After production, integrated circuits are tested offline, with external test equipment.  ... 
doi:10.1145/2522968.2522976 fatcat:3t4b3rhbgbc2bphjevpkzlpm6u

Reliability, Availability, and Serviceability of IBM Computer Systems: A Quarter Century of Progress

M. Y. Hsiao, W. C. Carter, J. W. Thomas, W. R. Stringfellow
1981 IBM Journal of Research and Development  
The central issue in designing systems with good RAS characteristics is recovery-reduction of fault occurrence, detection and counteraction of errors [2], and efficient repair procedures.  ...  Recovery implies resumption of operation with data integrity. Figure 1 illustrates the basic relationship between faults and system RAS for a unified hardware/system of U S .  ...  Byman for his support of this paper, as well as P. E. Barshinger and R. C. Williams for their assistance, and G. R. Santana for his comments on the disk section.  ... 
doi:10.1147/rd.255.0453 fatcat:5cysi4lynvaj5ajw2ihmhl7hv4

Transaction-Based Online Debug for NoC-Based Multiprocessor SoCs

Mehdi Dehbashi, Gorschwin Fey
2014 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing  
As complexity and size of Systems-on-Chip (SoC) grow, debugging becomes a bottleneck for designing IC products.  ...  In this paper, we present an approach for online debug of NoC-based multiprocessor SoCs. Our approach utilizes monitors and filters implemented in hardware.  ...  Acknowledgements Mehdi Dehbashi did this work as part of his PhD in the University of Bremen.  ... 
doi:10.1109/pdp.2014.72 dblp:conf/pdp/DehbashiF14 fatcat:dc7w732zszh63ogm62boupahf4

Transaction-based online debug for NoC-based multiprocessor SoCs

Mehdi Dehbashi, Görschwin Fey
2015 Microprocessors and microsystems  
As complexity and size of Systems-on-Chip (SoC) grow, debugging becomes a bottleneck for designing IC products.  ...  In this paper, we present an approach for online debug of NoC-based multiprocessor SoCs. Our approach utilizes monitors and filters implemented in hardware.  ...  Acknowledgements Mehdi Dehbashi did this work as part of his PhD in the University of Bremen.  ... 
doi:10.1016/j.micpro.2015.03.003 fatcat:kztdnq7zafbw5eaoaqcgizwgrq

A Unified Approach for the Synthesis of Scalable and Testable Embedded Architectures [chapter]

Prashanth B. Bhat, Chouki Aktouf, Viktor K. Prasanna, Sandeep Gupta, Melvin A. Breuer
1998 Fault-Tolerant Parallel and Distributed Systems  
This paper presents a new synthesis approach for reliable high performance embedded systems. It considers requirements of both scalability and testability in an integrated manner.  ...  Scalable and Testable Systems Definition 1: We define u-containable tests as those tests for which the asymptotic complexity of the test computation and communication does not exceed the complexity of  ...  Comparison testing in multiprocessor systems has been extensively investigated in the literature for fault detection and diagnosis [5, 7, 13].  ... 
doi:10.1007/978-1-4615-5449-3_12 fatcat:rawiqhhlojfo3b2r66snd3fqbm

Modeling and Analysis of Fault Detection and Fault Tolerance in Wireless Sensor Networks

Arslan Munir, Joseph Antoon, Ann Gordon-Ross
2015 ACM Transactions on Embedded Computing Systems  
Results obtained from our FT modeling reveal that an FT WSN composed of duplex sensor nodes can result in as high as a 100% MTTF increase and approximately a 350% improvement in reliability over a Non-Fault-Tolerant  ...  Technological advancements in communications and embedded systems have led to the proliferation of Wireless Sensor Networks (WSNs) in a wide variety of application domains.  ...  Taxonomy for Fault Diagnosis Techniques A fault diagnosis system is a monitoring system that detects faulty sensor nodes and their location in the WSN.  ... 
doi:10.1145/2680538 fatcat:akwz77zob5gpng3aedv6fjuxmu