The JEDI event-based infrastructure and its application to the development of the OPSS WFMS

G. Cugola, E. Di Nitto, A. Fuggetta
2001 IEEE Transactions on Software Engineering  
The development of complex distributed systems demands the creation of suitable architectural styles (or paradigms) and related run-time infrastructures. An emerging style that is receiving increasing attention is based on the notion of event. In an event-based architecture, distributed software components interact by generating and consuming events. An event is the occurrence of some state change in a component of a software system, made visible to the external world. The occurrence of an event in a component is asynchronously notified to any other component that has declared some interest in it. This paradigm (usually called "publish/subscribe" from the names of the two basic operations that regulate the communication) holds the promise of supporting a flexible and effective interaction among highly reconfigurable, distributed software components. In the past two years, we have developed an object-oriented infrastructure called JEDI (Java Event-based Distributed Infrastructure). JEDI supports the development and operation of event-based systems and has been used to implement a significant example of a distributed system, namely, the OPSS workflow management system (WFMS). The paper illustrates JEDI's main features and how we have used them to implement OPSS. Moreover, the paper provides an initial evaluation of our experiences in using the event-based architectural style and a classification of some of the event-based infrastructures presented in the literature.

Keywords: Event-based systems, distributed systems, software architectures, workflow, business processes, object-orientation, publish/subscribe middleware.

Convergence between telecommunication, broadcasting, and computing is opening new opportunities and challenges for a potentially large market of innovative network-wide services. The class of users affected by this revolution is large: families, professionals, large organizations, government agencies, and administrations. The services range from home banking and electronic commerce, to coordination and workflow support for large dispersed teams, within the same company or even across multiple companies. Many research and industrial activities are currently being carried out to identify feasible strategies to develop and operate these services in an effective and economically viable way.
The requirements and technical problems that have to be addressed are complex and critical:
• Services must be able to operate on a wide area network with acceptable performance.
• The software technology used to implement these services must be "light", i.e., it should be scalable in terms of the number of both components and users involved and of their distribution.
• The technology must enable a "plug and play" approach to support dynamic reconfiguration and introduction of new service components.
• Finally, it is essential to support openness and interoperability between different platforms since the services are usually implemented in a heterogeneous hardware infrastructure.
To foster the diffusion of network-wide applications we need to identify proper architectural styles and supporting infrastructures able to cope with the above requirements and challenges. In fact, there is a wide range of distributed architectural styles and middleware infrastructures that have been conceived specifically to address the above issues. Most of these existing styles and infrastructures are based on a point-to-point communication model. For instance, the basic service offered by CORBA [36], RMI [51], and DCOM [20] is the synchronous invocation of a remote service offered by some server over the network. The wide diffusion of the point-to-point communication model has been fostered by the availability of RPC, which is certainly an effective mechanism to implement a wide range of distributed systems. RPC is characterized by a tight conceptual coupling between the component that requests a service (i.e., the client) and the component that satisfies such a request (i.e., the server). Before invoking a service, the client has to know of the existence of a server capable of satisfying its request and it has to obtain a reference to that server.
Even extensions and new facilities of advanced middleware infrastructures such as CORBA Naming Service [37] and CORBA Dynamic Invocation Interface do not depart significantly from the underlying RPC paradigm. Despite the effectiveness and conceptual simplicity of the point-to-point communication model, many situations require the availability of a more decoupled model. In particular, the communication among the components of a distributed system may involve more than two parties, and may be driven by the contents of the information being exchanged rather than by the identity of information producers and consumers. As an example, let us consider a network management system. In this system, whenever a network node signals a failure, a procedure has to be started to fix the failure. By using an event-based approach, the node is simply required to notify the "external world" of the detected failure and can therefore ignore how the failure will be handled. The "external world" might consist of a single application placed at a fixed location on the net and in charge of executing the complete recovery procedure. Alternatively, it can be composed of different applications dynamically dispersed across the network and in charge of different steps of the recovery procedure (e.g., logging the failure, reconfiguring a subsystem, etc.). As another example, consider a distributed workflow management system, where, as soon as an activity A terminates, other activities A1,...,An have to be launched. In this case, it is useful to have a mechanism that hides the existence of A1,...,An from A, and allows A to simply notify the "external world" of its termination. The effect of this notification is hidden from A, thus increasing information hiding and reducing the coupling among components. The two scenarios presented above are not the only ones with such communication requirements. In [4], other scenarios that are likely to emerge in the near future are presented.
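The decoupling in the workflow scenario above can be illustrated with a small sketch. It is not the JEDI API: the classes `SimpleDispatcher` and `WorkflowDemo`, the event name `A.terminated`, and the methods `subscribe` and `publish` are all hypothetical names chosen for this illustration, and delivery is synchronous and in-process for brevity, whereas the paradigm described here notifies components asynchronously across a network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-process intermediary; an illustration, not the JEDI API. */
class SimpleDispatcher {
    // Event name -> handlers that declared interest in it.
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    /** Registers interest in events with the given name. */
    void subscribe(String eventName, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventName, k -> new ArrayList<>()).add(handler);
    }

    /** Notifies every interested handler; returns how many were notified. */
    int publish(String eventName, String payload) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(eventName, List.of());
        for (Consumer<String> handler : handlers) {
            handler.accept(payload);
        }
        return handlers.size();
    }
}

class WorkflowDemo {
    /** Runs the scenario and returns a log of the activities that were launched. */
    static List<String> run() {
        SimpleDispatcher dispatcher = new SimpleDispatcher();
        List<String> launched = new ArrayList<>();

        // A1..A3 declare interest in A's termination; A never learns they exist.
        for (String name : List.of("A1", "A2", "A3")) {
            dispatcher.subscribe("A.terminated",
                    source -> launched.add(name + " started after " + source));
        }

        // Activity A only announces its own termination to the "external world".
        dispatcher.publish("A.terminated", "A");
        return launched;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch, adding or removing a follow-up activity only changes the subscriptions; the code that stands for activity A is untouched, which is the reduction in coupling the scenario calls for.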
A promising approach to address the above issue is the event-based paradigm. The components of an event-based system cooperate by sending and receiving events, a particular form of messages. The sender delivers an event to an event dispatcher. The event dispatcher is in charge of distributing the event to all the components that have declared their interest in receiving it. Thus, the event dispatcher supports a high degree of decoupling between the sources and the recipients of an event. The relevance and potential impact of the event-based paradigm have been acknowledged by the OMG, which has recently defined an event service on top of the CORBA framework (see Section 5). Nevertheless, we are still far from a satisfactory solution able to address in a coherent and comprehensive way all the issues and problems related to the creation of an effective, network-wide event distribution infrastructure [45]. This observation can be easily verified by checking the large number of initiatives being launched in the area. Several new draft proposals have been submitted to the IETF (Internet Engineering Task Force). Furthermore, the event-based paradigm has been the focus of the first workshop of the series TWIST (The Workshop on Internet-scale Software Technologies) [60]. The workshop gathered researchers from leading software industries and from academia to compare existing approaches and steer future research work on the topic. As a contribution to the ongoing research work, we have developed an event-based, object-oriented infrastructure called JEDI (Java Event-based Distributed Infrastructure) that has been applied, among others, to the development of a WorkFlow Management System (WFMS) called OPSS (ORCHESTRA Process Support System). A WFMS [3, 23] is an environment for developing and executing a process-based application, i.e., a coordinated set of activities involving both humans and computerized tools.
Typical examples of the activities supported by a WFMS are business services, such as customer care, interoffice procedures, and software development processes. This paper presents JEDI and OPSS, highlighting their main features and functionality. It also illustrates some lessons we have derived from the development and operation of JEDI. This paper significantly extends a previously published paper [15], by providing more details on the design choices that guided the development of both JEDI and OPSS, and by introducing new features that were not presented in the previous paper. It also significantly enriches the analysis of the state of the art, and the comparison and evaluation of the related work. The contributions of this paper can be summarized as follows:
• It describes JEDI, an event-based infrastructure suitable for developing a wide range of distributed systems.
• It introduces OPSS and discusses the OPSS features that benefit most from the adoption of an event-based communication infrastructure.
• It presents our experiences in using the event-based paradigm and provides a comprehensive comparison of our work with the state of the art in the field.
Consequently, the paper is organized as follows. Section 2 presents JEDI basic concepts and implementation. Section 3 provides an overview of the architecture of OPSS. Section 4 provides an evaluation of our experience. Section 5 presents the related work. Finally, Section 6 draws some conclusions and proposes future research activities.

(Footnote: OPSS has been developed as part of the ORCHESTRA project [34].)

Reactive objects

An active object (AO) can invoke the basic operations offered by JEDI (e.g., event generation and subscription) in any order. In our experience, however, some active objects operate according to a fairly standard sequence of operations. Upon activation, an AO subscribes to some events and then starts waiting for their occurrence.
When one of these events is notified, the AO performs some operations (possibly generating new events and subscribing or unsubscribing to events) and then starts waiting again. It therefore executes a standard loop: wait for any event among those it has subscribed to, and then process it. We use the term reactive object to indicate this particular kind of active object. The JEDI framework provides programmers with standard classes supporting the implementation of both active and reactive objects (see Section 2.4). The JEDI class used to implement reactive objects (i.e., the ReactiveObject class) exports an abstract method (called processMessage) that is automatically invoked each time the reactive object has to be notified of an event it has subscribed to. Programmers who want to implement a reactive object should subclass the ReactiveObject class and implement the processMessage method.

Distribution of the Event Dispatcher

The event dispatcher is a logically centralized component since it must have a global knowledge of all the events that are generated and of all the subscription requests that are issued. However, a centralized implementation of the event dispatcher can become a critical bottleneck for a distributed system. This happens, in particular, when the system is composed of several Internet-wide distributed AOs that are engaged in intense communication. In this situation, it is worthwhile to decompose the event dispatcher into several distributed and cooperating components, in order to guarantee an acceptable level of performance. This decomposition, however, requires some coordination protocol to be defined among the event dispatcher components. They, in fact, need to share information about generated events and subscriptions in order to guarantee that agents connected to different event dispatcher components communicate properly.
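To make the need for coordination concrete, here is a minimal sketch of two cooperating dispatcher components. It does not reproduce JEDI's distributed dispatcher or its actual protocol: the class `DispatcherNode`, its methods, and the coordination scheme (naively forwarding every locally published event to every directly connected peer) are assumptions made purely for illustration. Event filtering is omitted, so every subscriber receives every event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical dispatcher component that coordinates by naive event forwarding. */
class DispatcherNode {
    private final List<Consumer<String>> localSubscribers = new ArrayList<>();
    private final List<DispatcherNode> peers = new ArrayList<>();
    private int forwardedMessages = 0;

    /** Links two dispatcher components in both directions. */
    void connect(DispatcherNode peer) {
        peers.add(peer);
        peer.peers.add(this);
    }

    /** Called by a component attached to this dispatcher node. */
    void subscribe(Consumer<String> handler) {
        localSubscribers.add(handler);
    }

    /** Called by a locally attached component that generates an event. */
    void publish(String event) {
        deliverLocally(event);
        // Forward to every peer, even one with no subscribers: this is
        // coordination traffic on top of the application's own traffic.
        for (DispatcherNode peer : peers) {
            forwardedMessages++;
            peer.deliverLocally(event);
        }
    }

    private void deliverLocally(String event) {
        for (Consumer<String> subscriber : localSubscribers) {
            subscriber.accept(event);
        }
    }

    /** Number of dispatcher-to-dispatcher messages this node has sent. */
    int forwardedMessages() {
        return forwardedMessages;
    }
}

class DistributedDemo {
    public static void main(String[] args) {
        DispatcherNode a = new DispatcherNode();
        DispatcherNode b = new DispatcherNode();
        a.connect(b);

        // The subscriber is attached to b, the publisher to a.
        b.subscribe(event -> System.out.println("received via b: " + event));
        a.publish("node-failure");

        System.out.println("messages forwarded by a: " + a.forwardedMessages());
    }
}
```

Without the forwarding step in `publish`, the subscriber attached to `b` would never see events generated at `a`. With it, every event costs one extra message per peer whether or not anyone there is interested, which is exactly the kind of coordination traffic a real protocol has to keep in check.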
Such a coordination protocol has to be carefully designed in order to limit the network load generated by the intra-dispatcher coordination activity. In some cases, in fact, this coordination traffic could grow larger than the traffic generated by AOs, thus resulting in undesired and unacceptable performance degradation. In JEDI we provide two implementations of the event dispatcher: centralized and distributed. The centralized version consists of a single (operating system) process and has been developed to address the requirements of simple systems, com-JavaBeans
doi:10.1109/32.950318 fatcat:p7nhr2743jhjbj3wsaxtk3ihm4