An exploratory study of the evolution of communicated information about the execution of large software systems

Weiyi Shang, Zhen Ming Jiang, Bram Adams, Ahmed E. Hassan, Michael W. Godfrey, Mohamed Nasser, Parminder Flora
2013 Journal of Software: Evolution and Process  
Substantial research in software engineering focuses on understanding the dynamic nature of software systems in order to improve software maintenance and program comprehension. This research typically makes use of automated instrumentation and profiling techniques after the fact, that is, without considering domain knowledge. In this paper, we examine another source of dynamic information that is generated from statements that have been inserted into the code base during development to draw the
system administrators' attention to important run-time phenomena. We call this source communicated information (CI). Examples of CI include execution logs and system events. The availability of CI has sparked the development of an ecosystem of Log Processing Apps (LPAs) that surround the software system under analysis to monitor and document various run-time constraints. The dependence of LPAs on the timeliness, accuracy and granularity of the CI means that it is important to understand the nature of CI and how it evolves over time, both qualitatively and quantitatively. Yet, to our knowledge, little empirical analysis has been performed on CI and its evolution. In a case study on two large open source and one industrial software systems, we explore the evolution of CI by mining the execution logs of these systems and the logging statements in the source code. Our study illustrates the need for better traceability between CI and the LPAs that analyze the CI. In particular, we find that the CI changes at a high rate across versions, which could lead to fragile LPAs. We found that up to 70% of these changes could have been avoided and the impact of 15% to 80% of the changes can be controlled through the use of robust analysis techniques by LPAs. We also found that LPAs that track implementation-level CI (e.g. performance analysis) and the LPAs that monitor error messages (system health monitoring) are more fragile than LPAs that track domain-level CI (e.g. workload modelling), because the latter CI tends to be long-lived.

In practice, system administrators and developers typically rely on the software system's communicated information (CI), consisting of the major system activities (e.g. events) and their associated contexts (e.g. a time stamp) to understand the high-level field behaviour of large systems and to diagnose and repair bugs.
Rather than generating tracing information in a blind way, developers choose to explicitly communicate some information that is considered to be particularly important for system operation. We therefore call such information CI, because it is communicated automatically by the system during its execution, whereas traces are generated by the people who analyze the system after the fact. One common medium for CI is execution logs. The importance of the information in such logs varies with their purpose. For example, detailed debugging logs are relevant to developers (i.e. implementation-level CI), whereas operation logs summarizing the key execution steps are more relevant to operators (i.e. domain-level CI). The rich nature of CI has introduced a whole new market of applications that complement large software systems. We collectively call these applications the log processing apps (LPAs). The apps are used, for example, to generate workload information for capacity planning of large-scale systems [2, 3], to monitor system health [4], to detect abnormal system behaviours [5] or to flag performance degradations [6]. As such, these LPAs play an important role in the development and management of large software systems, to the extent that major decisions such as adding server capacity or changing the company's business strategy can depend on the information derived by the LPAs. Changes to the CI often break the functionality of the LPAs. Often, LPAs are in-house applications that are highly dependent on the CI. Although they are typically built on commercial platforms by IBM [7] and Splunk [8], the actual link between the LPAs and the monitored system depends heavily on the specific kind and format of CI in use. Hence, the apps require continuous maintenance as the format or type of CI changes and as the needs change.
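The dependence of an LPA on the format of the CI can be illustrated with a minimal sketch. The log lines, field names and both parser functions below are hypothetical and are not taken from the studied systems; they only contrast a parser that relies on token positions with one that looks fields up by name.

```python
import re

# Hypothetical log lines from two releases of the same system. The second
# release rewords the message, adds a context field and reorders fields.
LINE_V1 = "2013-01-05 10:00:01 INFO task=42 status=done elapsed=130"
LINE_V2 = "2013-06-09 11:30:07 INFO finished task=42 host=n7 elapsed=130 status=done"


def parse_positional(line):
    """Fragile: assumes the elapsed time is always the sixth token."""
    tokens = line.split()
    return int(tokens[5].split("=")[1])


# Matches key=value pairs anywhere in the line.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def parse_keyed(line):
    """More robust: finds the field by name, ignoring order and extra fields."""
    fields = dict(KEY_VALUE.findall(line))
    return int(fields["elapsed"])


if __name__ == "__main__":
    print(parse_positional(LINE_V1))  # works on the old format
    print(parse_keyed(LINE_V2))       # still works on the new format
    try:
        parse_positional(LINE_V2)
    except ValueError as exc:
        print("positional parser broke:", exc)
```

In this sketch the positional parser fails on the second release because an added context field shifts the token positions, whereas the keyed parser is unaffected. Neither parser would survive a renamed field, which is the kind of change that would still need to be tracked.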
However, because little is known about the evolution of CI, it is unclear how much maintenance effort LPAs require in the long run. In our previous research [9], we explored the evolution of CI by examining the logs of 10 releases of an open source software system (Hadoop) and 9 releases of a closed source large enterprise application, which we call EA. Because the CI observed during typical execution of a system may not cover all the CI that the system is capable of producing, this paper extends our research with a lower-level study: we study the CI on the basis of the logging statements in the source code. We call such logging statements CI potential because their output will show up in the CI during execution only if the code that contains them is executed. We perform CI potential case studies on the 10 studied releases of Hadoop and five releases of another open source software system (PostgreSQL). Our study is the first step in understanding the maintenance effort for LPAs by studying the evolution of the CI (i.e. their input domain). Our study tracks the CI for the execution of a fixed set of major features across the lifetime of the three studied systems and analyzes the source code of the two studied open source systems for CI potential. This allows us to address the following research questions:

RQ1: How much does CI change over time? We find that over time, the amount of CI communicated about our covered features at execution level increases 1.5-2.8 times compared with the first-studied release. The CI potential (logging statements) increases 0.17-2.45 times. We note that as little as 40% of the CI at execution level stays the same across releases and up to 21.5% of the CI at execution level is modified with context modifications across releases. The modifications to the CI may be troublesome as they cause the LPAs to be more error prone.
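One way to picture a comparison of CI potential between two releases is the sketch below. It is not the tooling used in the study: the `LOG.<level>("...")` call shape, the regular expression and the two source snippets are assumptions made for illustration, and the comparison is a plain set difference that reports a changed statement as one deletion plus one addition rather than as a modification.

```python
import re

# Assumed shape of a logging call: LOG.<level>("static text" ...).
# Real code bases use many other forms; this pattern is illustrative only.
LOG_CALL = re.compile(r'LOG\.(debug|info|warn|error|fatal)\(\s*"([^"]*)"')


def ci_potential(source):
    """Return the set of (level, static text) pairs found in the source."""
    return {(m.group(1), m.group(2)) for m in LOG_CALL.finditer(source)}


def compare_releases(old_src, new_src):
    """Classify logging statements as unchanged, deleted or added."""
    old, new = ci_potential(old_src), ci_potential(new_src)
    return {
        "unchanged": old & new,
        "deleted": old - new,
        "added": new - old,
        "size_ratio": len(new) / len(old),  # new count relative to old count
    }


# Hypothetical source fragments standing in for two releases.
OLD = '''
LOG.info("Starting task " + id);
LOG.warn("Disk almost full");
'''
NEW = '''
LOG.info("Starting task " + id);
LOG.error("Disk almost full");
LOG.debug("Heartbeat sent to " + host);
'''

if __name__ == "__main__":
    for name, value in compare_releases(OLD, NEW).items():
        print(name, value)
```

Here the statement whose level changes from `warn` to `error` appears in both the deleted and the added sets. Pairing such statements up as modifications would need an additional similarity step that this sketch omits.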
RQ2: What types of modifications happen to CI? Examining the CI and CI potential modifications across releases, we identify eight types of modifications. Of these changes, 10-70% can be avoided, and the impact of 15-80% of them can be minimized through the use of robust analysis techniques. The remaining modifications are risky and should be tracked carefully to avoid breaking LPAs.

RQ3: What information is conveyed by short-lived CI? We find that short-lived CI at code level focuses more on system errors and warnings. Besides the high-level conceptual information, the content
doi:10.1002/smr.1579 fatcat:klrlvo7oh5gmjbcgdw7yiu5crq