Analyzing blocking to debug performance problems on multi-core systems

Pierre-Marc Fournier, Michel R. Dagenais
2010 ACM SIGOPS Operating Systems Review  
Multi-core systems are rapidly becoming more prevalent. Consequently, developers frequently face performance bugs caused by unexpected interactions between parallel software components. The location of these bugs is difficult to identify with current tools. Indeed, the process exhibiting the slowness may be separated from the root cause of the problem by a blocking chain involving several other processes. This article introduces a new approach for analyzing blocking on multi-core systems and reports on its implementation in the LTTV Delay Analyzer. It enables developers to quickly understand the dependencies among processes and see how the total elapsed time is divided into its main components. The LTTV Delay Analyzer was used to analyze and rapidly correct complex performance problems, something not possible with the existing tools. The Linux Trace Toolkit, LTTng, is used for most of the instrumentation and the trace recording, allowing the tracing of production systems with great accuracy and minimal impact. This approach relies solely on kernel instrumentation and does not require the instrumentation or recompilation of processes. The analysis time is linear with respect to trace size.
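The abstract does not spell out the analysis algorithm. The Python sketch below only illustrates the general idea of following a blocking chain so that a process's elapsed time is charged to its root causes. It is not the LTTV Delay Analyzer's implementation or data format. It assumes a simplified, hypothetical pre-processed trace in which each process has non-overlapping intervals labelled with a state and, when blocked, the pid it waits on. The names `Interval`, `Timeline` and `breakdown` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Interval:
    """One span of a process's life in a hypothetical pre-processed trace."""
    start: float
    end: float
    state: str                     # e.g. "running", "disk", "blocked"
    blocker: Optional[int] = None  # pid being waited on when state == "blocked"


# pid -> intervals, assumed non-overlapping and covering the window of interest
Timeline = Dict[int, List[Interval]]


def breakdown(tl: Timeline, pid: int, start: float, end: float,
              depth: int = 0, max_depth: int = 16) -> Counter:
    """Split the window [start, end) of `pid` into time per "pid:state",
    charging blocked time to whatever the blocking process was doing."""
    result: Counter = Counter()
    for iv in tl.get(pid, []):
        lo, hi = max(start, iv.start), min(end, iv.end)
        if lo >= hi:
            continue  # this interval lies outside the window
        if iv.state == "blocked" and iv.blocker is not None and depth < max_depth:
            # Follow the blocking chain: what was the blocker doing meanwhile?
            result += breakdown(tl, iv.blocker, lo, hi, depth + 1, max_depth)
        else:
            # End of the chain (or depth limit reached): charge time to this state.
            result[f"{pid}:{iv.state}"] += hi - lo
    return result


if __name__ == "__main__":
    # Process 1 waits on 2, which in turn waits on 3, which is doing disk I/O.
    tl: Timeline = {
        1: [Interval(0, 2, "running"), Interval(2, 10, "blocked", blocker=2)],
        2: [Interval(0, 4, "running"), Interval(4, 10, "blocked", blocker=3)],
        3: [Interval(0, 7, "disk"), Interval(7, 10, "running")],
    }
    for key, t in sorted(breakdown(tl, 1, 0, 10).items()):
        print(key, t)
    # Prints: 1:running 2 / 2:running 2 / 3:disk 3 / 3:running 3 (one per line)
```

In this toy example, process 1 is slow even though it runs only 2 of its 10 time units. The breakdown shows that most of its wait is really disk I/O and CPU time in process 3, two links down the chain. The `max_depth` cap is a design choice for the sketch that keeps a cyclic wait (such as a deadlock) from recursing forever. A real tool would presumably derive the intervals from kernel scheduler and wake-up events.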
doi:10.1145/1773912.1773932 fatcat:eynwoqxqrzgclakbfpptqajy3m