Proposal of Grid Monitoring System with Fault Tolerance

Abu Elenin Sherihan, Masato Kitakami
2012 Journal of Information Processing  
A Grid monitoring system differs from a general monitoring system in that it must scale across wide-area networks, cover a large number of heterogeneous resources, and integrate with other Grid middleware in terms of naming and security. Grid monitoring is the act of collecting information about the characteristics and status of resources of interest. The Grid Monitoring Architecture (GMA) specification sets out the requirements and constraints of any ... implementation. It is based on a simple Consumer/Producer architecture with an integrated system registry, and it logically separates the transmission of monitoring data from data discovery. Many systems implement GMA, but all have drawbacks such as difficult installation, a single point of failure, or loss of message control. After analyzing the requirements of Grid monitoring and information services, we design a simple model and propose a Grid monitoring system based on GMA. The proposed system consists of producers, a registry, consumers, and a failover registry. The registry matches each consumer with one or more producers, so it is the main monitoring component. The failover registry takes over when the main registry fails. The proposed system is built on Java Servlets and the SQL query language, which makes it more flexible and scalable. We address several problems of earlier Grid monitoring systems, such as the lack of data flow and the single point of failure in R-GMA, and the difficulty of installing MDS4. First, we remove the single point of failure by adding a failover registry, which can recover from any failure of the registry node. Second, we design the system components to be easy to install and maintain: the proposed system combines only a few systems, and updates are infrequent. Third, we add load balancing to the system to cope with message overload. We evaluate performance by measuring response time, utilization, and throughput; the results with load balancing are better than those without it in every evaluation. Finally, we compare the proposed system with three other monitoring systems, and we also compare four load balancing algorithms.
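The registry and failover registry roles described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract says is built on Java Servlets and SQL; the class names `Registry` and `FailoverClient`, the in-memory dictionary, and the `alive` flag used to simulate a crash are all assumptions made for illustration. The sketch only shows the general idea: producers are registered with both registries, and a consumer lookup falls back to the failover registry when the main one is unreachable.

```python
class Registry:
    """Hypothetical in-memory registry: maps a resource name to producer addresses."""

    def __init__(self, name):
        self.name = name
        self.alive = True      # set to False to simulate a registry failure
        self.producers = {}    # resource name -> list of producer addresses

    def register(self, resource, address):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.producers.setdefault(resource, []).append(address)

    def lookup(self, resource):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return list(self.producers.get(resource, []))


class FailoverClient:
    """Hypothetical client that tries the main registry first, then the failover one."""

    def __init__(self, main, failover):
        self.registries = [main, failover]

    def register(self, resource, address):
        # Write to every reachable registry so the failover copy mirrors the main one.
        reached = 0
        for registry in self.registries:
            try:
                registry.register(resource, address)
                reached += 1
            except ConnectionError:
                pass
        if reached == 0:
            raise ConnectionError("no registry reachable")

    def lookup(self, resource):
        # Return the answer from the first registry that responds.
        for registry in self.registries:
            try:
                return registry.lookup(resource)
            except ConnectionError:
                continue
        raise ConnectionError("no registry reachable")


main = Registry("main")
failover = Registry("failover")
client = FailoverClient(main, failover)
client.register("cpu-load", "producer-a:8080")

main.alive = False                      # simulate a crash of the main registry
producers_after_crash = client.lookup("cpu-load")
```

In this sketch the consumer still obtains the producer list after the main registry goes down, because the registration was mirrored to the failover registry beforehand. How the actual system synchronizes the two registries is not stated in the abstract.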
doi:10.2197/ipsjjip.20.366 fatcat:wyykzjtpcfek3crbqhcv5pe7n4