Finding concurrency bugs with context-aware communication graphs

Brandon Lucia, Luis Ceze
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
Incorrect thread synchronization often leads to concurrency bugs that manifest nondeterministically and are difficult to detect and fix. Past work on detecting concurrency bugs has addressed the general problem in an ad-hoc fashion, focusing mostly on data races and atomicity violations. Using graphs to represent a multithreaded program execution is very natural: nodes represent static instructions and edges represent communication via shared memory. In this paper we make the fundamental observation that such basic context-oblivious graphs do not encode enough information to enable accurate bug detection. We propose context-aware communication graphs, a new kind of communication graph that encodes global ordering information by embedding communication contexts. We then build Bugaboo, a simple and generic framework that accurately detects complex concurrency bugs. Our framework collects communication graphs from multiple executions and uses invariant-based techniques to detect anomalies in the graphs. We built two versions of Bugaboo: BB-SW, which is fully implemented in software but suffers from significant slowdowns; and BB-HW, which relies on custom architecture support but has negligible performance degradation. BB-HW requires modest extensions to a commodity multicore processor and can be used in deployment settings. We evaluate both versions using applications such as MySQL, Apache, PARSEC, and several others. Our results show that Bugaboo identifies a wide variety of concurrency bugs, including challenging multivariable bugs, with few (often zero) unnecessary code inspections.
[...] violations, atomicity violations, and ordering violations. Past work on concurrency error debugging typically covered each category separately. For example, RecPlay [17] detects data races and provides replay capabilities; AVIO [9] and Atom-Aid [10] detect atomicity violations using heuristics based on identifying unserializable interleavings; and Eraser [18] detects locking discipline violations using its lock-set algorithm. While these systems greatly help in identifying concurrency bugs, they address the general problem in a piecemeal way. Moreover, current tools do not adequately address less well-studied classes of bugs, such as ordering violations and bugs involving multiple variables [8]. Communication graphs are a convenient representation of a multithreaded program execution.
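A basic, context-oblivious communication graph of the kind described above can be sketched in C. This is a minimal sketch under stated assumptions, not Bugaboo's implementation: it assumes a hypothetical instrumentation layer that calls `on_write` and `on_read` for every shared-memory access, with addresses already mapped to small integer ids and static instructions identified by program counter. Tracking the last writer of each address is one common way to detect inter-thread communication; all names (`CommGraph`, `on_write`, `on_read`, `MAX_ADDRS`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ADDRS 64   /* hypothetical: small fixed-size shadow memory */
#define MAX_EDGES 256  /* hypothetical: capacity of the edge set */

/* Last writer of each tracked address (index = address id). */
typedef struct {
    bool valid;
    int tid;       /* thread that performed the last write */
    unsigned pc;   /* static instruction (program counter) of that write */
} LastWriter;

/* One communication edge: writing instruction -> reading instruction. */
typedef struct {
    unsigned src_pc;
    unsigned dst_pc;
} Edge;

typedef struct {
    LastWriter shadow[MAX_ADDRS];
    Edge edges[MAX_EDGES];
    size_t n_edges;
} CommGraph;

static void graph_init(CommGraph *g) {
    for (size_t i = 0; i < MAX_ADDRS; i++) g->shadow[i].valid = false;
    g->n_edges = 0;
}

static bool has_edge(const CommGraph *g, unsigned src, unsigned dst) {
    for (size_t i = 0; i < g->n_edges; i++)
        if (g->edges[i].src_pc == src && g->edges[i].dst_pc == dst) return true;
    return false;
}

/* Record a store: remember which thread and instruction wrote addr. */
static void on_write(CommGraph *g, int tid, unsigned pc, size_t addr) {
    assert(addr < MAX_ADDRS);
    g->shadow[addr] = (LastWriter){ true, tid, pc };
}

/* Record a load: if the value came from another thread, add an edge. */
static void on_read(CommGraph *g, int tid, unsigned pc, size_t addr) {
    assert(addr < MAX_ADDRS);
    const LastWriter *w = &g->shadow[addr];
    if (!w->valid || w->tid == tid) return;  /* no inter-thread communication */
    if (has_edge(g, w->pc, pc)) return;      /* the graph is a set of edges */
    assert(g->n_edges < MAX_EDGES);
    g->edges[g->n_edges++] = (Edge){ w->pc, pc };
}
```

For example, `on_write(&g, 1, 0x10, 0)` followed by `on_read(&g, 2, 0x20, 0)` adds the single edge 0x10 → 0x20; repeating that read adds nothing, and a read of the same address by thread 1 itself adds nothing because no communication crossed threads. Note that every execution of the same instruction pair collapses into one edge, which is exactly the loss of ordering information the paper attributes to context-oblivious graphs.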
In a basic communication graph, nodes represent memory instructions and edges represent communication via shared memory. Concurrency errors lead to abnormal inter-thread communication and may therefore manifest as anomalies in communication graphs. Because multithreaded execution is nondeterministic, bugs manifest intermittently and different executions produce different graphs. By examining the differences among many graphs, it is possible to identify anomalous communication and, consequently, where bugs are likely to be. The biggest advantage of this approach is its generality: it does not rely on heuristics specific to one class of bugs. The key challenge, however, is building a communication graph that encodes enough relevant information. If the graph encodes too little, bugs may not appear as graph anomalies. In this paper, we make the fundamental observation that basic communication graphs do not encode enough information to enable general concurrency bug detection. We address this problem by proposing context-aware communication graphs, a new kind of communication graph that uses communication context to encode access ordering information. Communication contexts are formed by capturing the sequence of all recent communication events observed by a thread. We then develop Bugaboo, a complete system that leverages these graphs to provide efficient and accurate bug detection, useful in both development and deployment.
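A context-aware graph can be sketched by making each node a pair of a static instruction and the issuing thread's communication context. The C sketch below is illustrative only and rests on a hypothetical, simplified notion of context: each thread keeps its last four events, and every access is recorded as a local read or write in the issuing thread's window and as a remote read or write in every other thread's window. The event encoding, the window length, and all names (`CtxGraph`, `ctx_write`, `ctx_read`) are assumptions, not Bugaboo's actual design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 8
#define MAX_ADDRS 64
#define MAX_EDGES 256

/* Hypothetical event kinds; each fits in 2 bits, so a uint8_t holds the last 4.
   An empty window is 0, which this simplified encoding cannot distinguish
   from four local reads. */
enum { EV_LOCAL_READ = 0, EV_LOCAL_WRITE = 1, EV_REMOTE_READ = 2, EV_REMOTE_WRITE = 3 };

typedef struct { unsigned pc; uint8_t ctx; } Node;  /* instruction + context */
typedef struct { Node src; Node dst; } CtxEdge;     /* writer node -> reader node */

typedef struct {
    uint8_t ctx[MAX_THREADS];  /* per-thread window of recent events */
    bool valid[MAX_ADDRS];
    int writer_tid[MAX_ADDRS];
    Node writer[MAX_ADDRS];    /* last writer's instruction and context */
    CtxEdge edges[MAX_EDGES];
    size_t n_edges;
    int n_threads;
} CtxGraph;

static void ctx_init(CtxGraph *g, int n_threads) {
    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    for (int t = 0; t < MAX_THREADS; t++) g->ctx[t] = 0;
    for (size_t a = 0; a < MAX_ADDRS; a++) g->valid[a] = false;
    g->n_edges = 0;
    g->n_threads = n_threads;
}

/* Shift one 2-bit event into a thread's window; the oldest event falls off. */
static void push_event(CtxGraph *g, int tid, unsigned ev) {
    g->ctx[tid] = (uint8_t)((g->ctx[tid] << 2) | ev);
}

/* Record the access as local for the issuing thread and remote for all others. */
static void observe(CtxGraph *g, int tid, bool is_write) {
    for (int t = 0; t < g->n_threads; t++) {
        unsigned ev = (t == tid) ? (is_write ? EV_LOCAL_WRITE : EV_LOCAL_READ)
                                 : (is_write ? EV_REMOTE_WRITE : EV_REMOTE_READ);
        push_event(g, t, ev);
    }
}

static bool same_node(Node a, Node b) { return a.pc == b.pc && a.ctx == b.ctx; }

static void ctx_write(CtxGraph *g, int tid, unsigned pc, size_t addr) {
    assert(addr < MAX_ADDRS && tid >= 0 && tid < g->n_threads);
    g->valid[addr] = true;
    g->writer_tid[addr] = tid;
    g->writer[addr] = (Node){ pc, g->ctx[tid] };  /* context before this access */
    observe(g, tid, true);
}

static void ctx_read(CtxGraph *g, int tid, unsigned pc, size_t addr) {
    assert(addr < MAX_ADDRS && tid >= 0 && tid < g->n_threads);
    Node dst = { pc, g->ctx[tid] };               /* context before this access */
    observe(g, tid, false);
    if (!g->valid[addr] || g->writer_tid[addr] == tid) return;
    Node src = g->writer[addr];
    for (size_t i = 0; i < g->n_edges; i++)
        if (same_node(g->edges[i].src, src) && same_node(g->edges[i].dst, dst)) return;
    assert(g->n_edges < MAX_EDGES);
    g->edges[g->n_edges++] = (CtxEdge){ src, dst };
}
```

For example, if thread 0 writes address 0 at instruction 0x10 and thread 1 then reads it at 0x20, one edge is recorded with reader context 3 (a single remote write). If thread 1 then reads another address and re-reads address 0 at 0x20, the same static pair yields a second edge with reader context 48 (a remote write followed by two local reads). A context-oblivious graph would merge both into one 0x10 → 0x20 edge; keeping them apart is how context carries ordering information.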
This paper makes the following contributions: (1) we propose context-aware communication graphs; (2) we propose two invariant-based approaches to processing communication graphs and accurately locating bugs in code, one fully automatic and one semiautomatic; (3) we describe BB-SW, a software-only implementation of Bugaboo, and propose BB-HW, which is based on a set of architecture extensions to a commodity multicore system that reduce the performance overhead of collecting context-aware communication graphs to nearly zero; (4) we show how BB-HW can be used in pro[...]

    int length;    // protected by lock L
    char *str;     // protected by lock L
    ...
    lock(L);
    tptr = str;
    unlock(L);
    ...
    lock(L);
    tlen = length;
    unlock(L);
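One simple invariant of the kind the paper relies on can be sketched in C: an edge that never or rarely appeared across a set of training executions is treated as suspicious in a new execution. This is an illustrative stand-in, not either of the paper's two invariant-based approaches, which this excerpt does not describe. The `EdgeStats` table, the `min_frac` threshold, and the assumption that graph nodes are already encoded as integer ids (for instance, a packed instruction-plus-context pair) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KNOWN 128  /* hypothetical capacity of the edge table */

typedef struct { unsigned src; unsigned dst; } EdgeKey;  /* nodes encoded as ids */

typedef struct {
    EdgeKey key[MAX_KNOWN];
    unsigned runs_seen[MAX_KNOWN];  /* number of training runs containing the edge */
    size_t n;                       /* distinct edges seen so far */
    unsigned n_runs;                /* total training runs observed */
} EdgeStats;

static void stats_init(EdgeStats *s) {
    s->n = 0;
    s->n_runs = 0;
}

static long find_edge(const EdgeStats *s, EdgeKey e) {
    for (size_t i = 0; i < s->n; i++)
        if (s->key[i].src == e.src && s->key[i].dst == e.dst) return (long)i;
    return -1;
}

/* Add one training run's edge set (each edge listed at most once per run). */
static void train_run(EdgeStats *s, const EdgeKey *edges, size_t n_edges) {
    for (size_t i = 0; i < n_edges; i++) {
        long j = find_edge(s, edges[i]);
        if (j < 0) {
            assert(s->n < MAX_KNOWN);
            s->key[s->n] = edges[i];
            s->runs_seen[s->n] = 0;
            j = (long)s->n++;
        }
        s->runs_seen[j]++;
    }
    s->n_runs++;
}

/* Flag edges of a new run whose fraction of supporting training runs is
   below min_frac. Writes the indices of flagged edges into out (which must
   hold n_edges entries) and returns how many were flagged. */
static size_t flag_anomalies(const EdgeStats *s, const EdgeKey *edges,
                             size_t n_edges, double min_frac, size_t *out) {
    size_t n_out = 0;
    for (size_t i = 0; i < n_edges; i++) {
        long j = find_edge(s, edges[i]);
        double frac = (j < 0 || s->n_runs == 0)
                          ? 0.0
                          : (double)s->runs_seen[j] / s->n_runs;
        if (frac < min_frac) out[n_out++] = i;
    }
    return n_out;
}
```

With three training runs containing edges {1→2, 3→4}, {1→2, 3→4}, and {1→2}, a new run containing {1→2, 3→4, 5→6} and `min_frac = 0.5` flags only 5→6, which no training run exhibited; raising `min_frac` to 0.7 also flags 3→4, seen in two of three runs. The threshold trades missed bugs against the unnecessary code inspections the abstract counts.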
doi:10.1145/1669112.1669181 dblp:conf/micro/LuciaC09 fatcat:tcapnbfx45f4ratnjaerjjhsmm