A Path to Operating System and Runtime Support for Extreme Scale Tools
This report describes progress made over the course of funding for this project, including work at the University of Wisconsin, Oak Ridge National Laboratory, and Rogue Wave Software.

Work at the University of Wisconsin

Tools and middleware are crucial to the effective use of large distributed systems. Middleware enables efficient use of resources, and tools help diagnose and fix problems in distributed programs. A common requirement among tools and middleware is operating on groups
of distributed processes and files, but prior work has failed to provide a solution that both addresses the key scalability barriers and is easy to adopt in new and existing software. In each group operation, a single process communicates with distributed hosts to apply the operation to every group member, then collects the status results or data that the operation produces. These results are often processed further, either to derive information that summarizes group behavior or to guide subsequent operations on the group. In many tools and middleware, both the distributed operations and the data analysis must complete quickly enough to support interactive use. For large distributed groups, however, the required distributed communication and data processing become critical scalability barriers. In this project, we cast distributed resource access as operations on files in a global name space and developed a common, scalable solution for group operations on distributed processes and files. This solution lets tool and middleware developers quickly create new scalable software or easily improve the scalability of existing software.
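The group operation described above has three steps: apply an operation to every member, collect the per-member results, and summarize them. This section does not say how the project's solution is built, so the Python sketch below is only an illustration under stated assumptions. It assumes members are named as hypothetical "host:/path" entries in a global name space, simulates the per-member operation locally, and summarizes results by counting how many members produced each distinct value. To show why a single collecting process becomes a bottleneck, it also assumes a tree-based (hierarchical) aggregation, a common technique in which each internal node merges at most `fanout` partial summaries instead of receiving one result per member. The function names `apply_and_summarize`, `merge`, and `tree_group_op` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical member naming: "host:/path" entries in a global name space.
Member = str
# A summary maps each distinct operation result to the number of members
# that produced it.
Summary = Counter


def apply_and_summarize(members: Sequence[Member],
                        op: Callable[[Member], str]) -> Summary:
    """Flat step: apply op to each member and count identical results."""
    return Counter(op(m) for m in members)


def merge(summaries: List[Summary]) -> Summary:
    """Combine partial summaries. Adding counts is associative, so partial
    results can be merged in any grouping and give the same total."""
    total: Summary = Counter()
    for s in summaries:
        total.update(s)  # Counter.update adds counts rather than replacing
    return total


def tree_group_op(members: Sequence[Member],
                  op: Callable[[Member], str],
                  fanout: int = 4,
                  leaf_size: int = 8) -> Summary:
    """Assumed hierarchical variant: split the group into at most `fanout`
    subgroups, summarize each recursively, and merge the partial summaries,
    so no single node handles one message per member."""
    if fanout < 2 or leaf_size < 1:
        raise ValueError("fanout must be >= 2 and leaf_size >= 1")
    if len(members) <= leaf_size:
        return apply_and_summarize(members, op)
    chunk = -(-len(members) // fanout)  # ceiling division
    children = [members[i:i + chunk] for i in range(0, len(members), chunk)]
    return merge([tree_group_op(c, op, fanout, leaf_size) for c in children])


# Usage: 100 hypothetical members; every tenth one reports a different state.
members = [f"node{i:03d}:/state" for i in range(100)]


def fake_read(member: Member) -> str:
    """Stand-in for reading a per-member file; purely simulated."""
    idx = int(member[4:7])
    return "stopped" if idx % 10 == 0 else "running"


summary = tree_group_op(members, fake_read)
print(summary)  # Counter({'running': 90, 'stopped': 10})
```

The tree produces the same summary as the flat loop because merging counts is associative. The difference is where the work happens: with the flat loop, one process receives and processes all N results, while in the tree each node handles only a bounded number of partial summaries.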