Examining the Stability of Logging Statements
2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER)
Logging statements produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior research demonstrated the importance of logging statements in operating, understanding and improving software systems. The importance of logs has lead to a new market of log management and processing tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without the consideration of other stakeholders, causing misleading
... ausing misleading results and failures of log processing tools. In order to proactively mitigate such issues that are caused by unstable logging statements, in this paper we empirically study the stability of logging statements in four open source applications namely: Liferay, ActiveMQ, Camel and CloudStack. We find that 20-45% of the logging statements in our studied applications change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which they will let their tools depend. In this paper, we make an important first step towards assisting developers of log processing tools in determining whether a logging statement is likely to remain unchanged in the future. Using random forest classifiers, we examine which metrics are important for understanding whether a logging statement will change. We show that our classifiers achieve 83%-91% precision and 65%-85% recall in the four studied applications. We find that file ownership, developer experience, log density and SLOC are important metrics for determining whether a logging statement will change in the future. Developers can use this knowledge to build more robust log processing tools, by making those tools depend on logs that are generated by logging statements that are likely to remain unchanged.