Totem: a fault-tolerant multicast group communication system

L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, R. K. Budhia, C. A. Lingley-Papadopoulos
1996 Communications of the ACM  
When Totem delivers multicast messages, it invokes operations in the same total order throughout the distributed system. The result: consistency of replicated data and simplified programming of applications. (See the sidebar, "Why Totem?") Total ordering of messages simplifies the programming of fault-tolerant distributed applications. If distributed operations are derived from the same messages in the same total order, consistency of replicated information is easier to maintain. Simplified programming results in fewer programming errors and increased reliability for the application.

Totem is intended for complex applications in which both fault tolerance and real-time performance are critical. Such complex applications are typically built as asynchronous event-driven distributed systems. The types of applications that can benefit from Totem's totally ordered message delivery service include many of the computer systems that are most important to our society, for example, air traffic control, industrial automation, transaction processing, banking, stock market trading, intelligent highway, medical monitoring and replicated database systems.

The characteristics that make Totem suitable for complex applications, particularly soft real-time applications, include: high throughput and low, predictable latency; rapid detection of, and recovery from, faults; system-wide total ordering of messages, even for systems in which the network can partition and remerge, and for systems in which process groups can intersect; and scalability to larger systems based on multiple LANs, interconnected by gateways, within the same geographical area. With Totem, correctness of message ordering and configuration changes is ensured, even in the presence of multiple faults. Yet, excellent performance is achieved.

Totem Services

The Totem system provides reliable totally ordered multicasting of messages to processes within process groups over a single LAN or over multiple LANs interconnected by gateways. Totem provides this delivery service in the presence of various types of communication and processor faults, including message loss, network partitioning, and processor crash, omission and timing faults, but not completely arbitrary faults. The structure of the Totem system as a hierarchy of protocols is shown in Figure 1. With reference to this hierarchy, we say that a message is received from the next lower layer of the hierarchy and is delivered to the next higher layer.
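The distinction between receiving and delivering, and the claim that a common total order keeps replicated data consistent, can be illustrated with a small sketch. It is not Totem's protocol: the `Replica` class, the per-message sequence numbers, and the `add`/`mul` operations are all assumptions made for illustration. Each replica receives messages in an arbitrary arrival order, buffers them, and delivers them to the application strictly in sequence-number order.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: receives in any order, delivers in sequence order."""
    value: int = 0
    next_seq: int = 1                              # next sequence number to deliver
    pending: dict = field(default_factory=dict)    # received but not yet delivered
    delivered: list = field(default_factory=list)  # sequence numbers, in delivery order

    def receive(self, seq, op):
        # "Received" from the lower layer: arrival order is not guaranteed.
        self.pending[seq] = op
        # Deliver every buffered message that is now next in the total order.
        while self.next_seq in self.pending:
            self._deliver(self.next_seq, self.pending.pop(self.next_seq))
            self.next_seq += 1

    def _deliver(self, seq, op):
        # "Delivered" to the higher layer: the application applies the operation.
        kind, operand = op
        if kind == "add":
            self.value += operand
        elif kind == "mul":
            self.value *= operand
        self.delivered.append(seq)


# Three non-commutative operations, tagged with their place in the total order.
messages = {1: ("add", 10), 2: ("mul", 2), 3: ("add", 5)}

a, b = Replica(), Replica()
for seq in (1, 2, 3):      # replica a happens to receive messages in order
    a.receive(seq, messages[seq])
for seq in (3, 1, 2):      # replica b receives the same messages out of order
    b.receive(seq, messages[seq])

print(a.value, b.value)    # both replicas hold the same value
print(b.delivered)         # delivery order matches the total order
```

Because the operations do not commute, applying them in arrival order at replica `b` would give a different result than at replica `a`; delivering in the agreed order is what keeps the two copies identical.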
When messages are
doi:10.1145/227210.227226 fatcat:usv5gp45pzd6figz5ad74p5ayu