Constructing reliable distributed communication systems with CORBA

S. Maffeis, D.C. Schmidt
1997 IEEE Communications Magazine  
Communication software and distributed services for next-generation applications must be reliable, efficient, flexible, and reusable. These requirements motivate the use of the Common Object Request Broker Architecture (CORBA). However, building highly available applications with CORBA is hard. Neither the CORBA standard nor conventional implementations of CORBA directly address complex problems related to distributed computing, such as real-time or high-speed quality of service, partial ..., group communication, and causal ordering of events. This paper describes a CORBA-based framework that uses the Virtual Synchrony model to support reliable data- and process-oriented distributed systems that communicate through synchronous methods and asynchronous messaging.
... the system. In addition, we require reliable applications to be highly available, i.e., the application can provide its essential services despite the failure of computing nodes, software objects, and communication links.
Certain aspects of distributed systems make reliability more difficult to achieve. For instance, partial failures are an inherent problem in distributed systems. The "mean time to failure" (MTTF) of a distributed system as a whole decreases rapidly as the number of computing nodes and communication links that constitute the system increases. Another inherent problem is that developers must address complex execution states of concurrent programs. Distributed systems consist of processes that run in parallel on heterogeneous platforms and are therefore prone to race conditions, communication errors, node failures, and deadlocks. Thus, distributed systems are often more difficult to develop, administer, and maintain than their centralized counterparts.
Conversely, other aspects of distributed systems can help make applications more robust. For instance, distributed systems can be made more reliable than centralized systems by providing important services redundantly on multiple nodes. To enable redundancy, active or passive replication should be supported by the communication system used to run distributed applications. Hence, we face a peculiar situation: although programming distributed applications is a daunting and error-prone task, a high degree of failure tolerance and reliability can be achieved if the underlying communication system software supports replication.
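The two opposing effects described above can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes that components fail independently with exponentially distributed lifetimes, that the unreplicated system fails as soon as any one component fails, and that a replicated service is available whenever at least one replica is. The function names and the numbers in the usage example are hypothetical.

```python
def series_mttf(component_mttf_hours: float, n_components: int) -> float:
    """MTTF of a system that fails when ANY of its n components fails.

    Assumption (not from the paper): independent components with
    exponentially distributed lifetimes and identical MTTF, so the
    failure rates add up and the system MTTF is component MTTF / n.
    """
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return component_mttf_hours / n_components


def replicated_availability(replica_availability: float, n_replicas: int) -> float:
    """Availability of a service that is up if AT LEAST ONE replica is up.

    Assumption (not from the paper): replicas fail independently, so the
    service is down only when all n replicas are down at the same time.
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("replica_availability must be in [0, 1]")
    if n_replicas < 1:
        raise ValueError("n_replicas must be at least 1")
    return 1.0 - (1.0 - replica_availability) ** n_replicas


if __name__ == "__main__":
    # Hypothetical figures: 100 components, each with an MTTF of 10,000 hours.
    print(series_mttf(10_000.0, 100))
    # Hypothetical figures: three replicas, each available 90% of the time.
    print(replicated_availability(0.9, 3))
```

Under these assumptions, 100 components with an individual MTTF of 10,000 hours yield a system MTTF of 100 hours, while three replicas that are each available 90% of the time yield a service availability of roughly 99.9%. Real systems exhibit correlated failures, so these figures are best read as an intuition for the trend rather than as a prediction.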
The conclusion we draw is that non-robust communication software will lead to fragile distributed applications that will be frequently unavailable and will require constant supervision. In contrast, sophisticated communication software (such as the Virtual Synchrony approach presented in Section 4.3) can lead to distributed applications that are inherently more reliable, more modular, and more scalable than centralized ones.
doi:10.1109/35.565656 fatcat:wwy3adoxijfnxofeumuc6iicv4