Debugging mixed-environment programs with Blink

Byeongcheol Lee, Martin Hirzel, Robert Grimm, Kathryn S. McKinley
<span title="2014-06-11">2014</span> <i title="Wiley"> <a target="_blank" rel="noopener" href="" style="color: black;">Software, Practice &amp; Experience</a> </i> &nbsp;
Programmers build large-scale systems with multiple languages to leverage legacy code and languages best suited to their problems. For instance, the same program may use Java for ease of programming and C to interface with the operating system. These programs pose significant debugging challenges, because programmers need to understand and control code across languages, which often execute in different environments. Unfortunately, traditional multilingual debuggers require a single execution environment. This paper presents a novel composition approach to building portable mixed-environment debuggers, in which an intermediate agent interposes on language transitions, controlling and reusing single-environment debuggers. We implement debugger composition in Blink, a debugger for Java, C, and the Jeannie programming language. We show that Blink is (i) simple: it requires modest amounts of new code; (ii) portable: it supports multiple Java virtual machines, C compilers, operating systems, and component debuggers; and (iii) powerful: composition eases debugging, while supporting new mixed-language expression evaluation and Java native interface bug diagnostics. To demonstrate the generality of interposition, we build prototypes and demonstrate debugger language transitions with C for five of six other languages (Caml, Common Lisp, C#, Perl 5, Python, and Ruby) without modifications to their debuggers. Using real-world case studies, we show that diagnosing language interface errors requires prior single-environment debuggers to restart execution multiple times, whereas Blink directly diagnoses them with one execution.
Traditional debuggers are not much help with mixed-language programs because they are limited to a single execution environment. For example, native programs and their debuggers (e.g., the gdb debugger for C, C++, and Fortran) require language implementations to use the same application binary interface (ABI). The ABI is machine dependent and thus precludes portable execution environments for managed languages, such as Java, C#, JavaScript, and Python. For portability, managed languages rely on virtual machine (VM) execution, using interpretation, just-in-time compilation, and garbage collection. They abstract over internal code, the stack, and data representations.
Debuggers for managed languages, such as the standard Java debugger jdb, operate on VM abstractions, for example, through the Java Debug Wire Protocol (JDWP), but do not understand native code. Current mixed-language debuggers are limited to XDI and dbx, which support Java and C within a single JVM [6, 7], and the Visual Studio debugger, which supports managed and native code in the Common Language Runtime (CLR) [8]. While these debuggers understand all environments, they are behemoths that are generally not portable. The challenge when building a mixed-environment debugger is that each environment has different representations; managed debuggers operate at the level of bytecodes and objects, whereas native debuggers deal with machine instructions and memory words.
This article presents a novel debugger composition design for building mixed-environment debuggers that uses runtime interposition to control and reuse existing single-environment debuggers. An intermediate agent instruments and controls all language transitions. We show that composition with interposition is sufficient to implement the three pillars of debugging functionality: execution control, context management, and data inspection [9]. The result is a simple, portable, and powerful approach to building debuggers.
We implement this approach in Blink, a debugger for Java, C, and the Jeannie programming language [10]. Because Blink reuses existing debuggers, it is simple: Blink requires 9K lines of new code, half of which implements interposition. Blink is portable: it supports multiple Java virtual machines or JVMs (Oracle and IBM), C compilers (GNU and Microsoft), and operating systems (Unix and Windows). By comparison, dbx works only with Oracle's JVM and XDI works only with the Harmony JVM.
We also explore how well our composition approach generalizes to other languages and what its requirements are.
We implemented a simple prototype of the language interposition approach for five of the six standard debuggers for Caml, Common Lisp, C#, Perl 5, Python, and Ruby. Our prototypes implement each language's FFI to C. Because the Caml debugger lacks the ability to evaluate functions, on which our interposition approach depends, it would require changes to the existing debugger to compose. We use the function evaluation in the C# debugger to implement interposition. We simply interpose on the interpreters for Common Lisp, Perl 5, Python, and Ruby. These case studies indicate that debugger composition is viable in many language settings.
Debugger composition furthermore facilitates powerful new debugging features: (i) a read-eval-print loop (REPL) that, in Blink, evaluates mixed Java and C expressions in the context of a running program and (ii) a dynamic bug checker for two common JNI problems. We implement these features in Blink. This article demonstrates this functionality using several case studies, which reproduce bugs found in real programs and compare debugging with other tools to debugging with Blink. The other tools crash, silently ignore errors, or require multiple program invocations to diagnose a bug, whereas Blink typically identifies the bug directly in a single program invocation. The result is a debugger that helps users effectively find bugs in mixed-language programs.
To summarize, the contributions of this work are as follows:
1. A new approach to building mixed-environment debuggers that composes single-environment debuggers. Prior debuggers either support only a single environment or re-implement functionality instead of reusing it.
2. Blink, an implementation of this approach for Java, C, and Jeannie, which is simple, portable, powerful, and open source [11].
3. Two advanced new debugger features: a mixed-environment interpreter and a dynamic checker for detecting JNI misuse.
Figure 1. Example bug: a typo in Java code (line 15) causes a crash in C code (line 31).
This message shows the mixed C and Java stack and identifies the call at line 31 as erroneous. Because mid is invalid, the user would next determine that mid is derived from the string cstr and print cstr:
Variable cstr holds "keyboardEvent" instead of "keyBoardEvent", but where does that value come from? Line 8, mentioned in the original stack trace, contains the expression EVENT_NAMES[idx]+"Event". To examine the Java array from the C breakpoint, the user employs Blink's mixed-language expression evaluation as follows:
To fix the bug, the user would change either the string in EVENT_NAMES[1] or the method name in line 15.
DEBUGGER COMPOSITION APPROACH
This section describes our approach to building mixed-environment debuggers by composing them out of single-environment debuggers. We use our implementation of Blink for Java and C as our running example. Our insight is that interposing a modest amount of functionality between language transitions suffices to reuse a substantial amount of functionality of component debuggers, creating one debugger that understands multilingual programs.
Debugger features
Our goal is to provide all the standard debugging features in a mixed environment. When a user debugs a program, he or she wants to find and correct a defect that results in erroneous data or control flow, which leads to erroneous output or a crash [15]. Rosenberg identifies three essential features in support of this quest [9].
Execution control: The debugger controls the execution of the debuggee process by starting it, halting it at breakpoints, single stepping through it, and eventually tearing it down. Typical interactive commands are run, break, step, continue, and exit.
Context management: The debugger keeps track of where in the code the debuggee process is and, on demand, reports source code listings and call stack traces.
Typical interactive commands are list and backtrace.
Data inspection: Users query the debugger to inspect data with source language expressions, using commands such as print or eval.
Intermediate agent
Our approach to implementing these standard debugger features for a mixed environment is to compose single-environment debuggers through an intermediate agent. Our mixed-environment debugger consists of a controller and one driver for each single-environment component debugger. Figure 2 illustrates this structure for the case of Java and C using jdb for Java, and gdb or cdb for C (depending on whether we run on Linux or Windows). The debuggee process runs both Java and C, and the intermediate agent coordinates the debuggers. The intermediate agent has two complementary responsibilities:
Figure 2. Agent-based debugger composition approach.
... chooses the overloaded method that receives a Java reference if the Java expression has reference type, instead of the overloaded methods that receive a value of primitive type.
Convenience variables. Convenience variables store the results of a (sub)expression evaluation in temporary variables. Application variables are named locations in which application code stores data during execution. Convenience variables are additional named locations provided by the debugger, in which the user interactively stores data for later use in a debugger session. Convenience variables behave like variables in many scripting languages: they are implicitly created upon first use, have global scope, and are dynamically typed. In addition to user-defined convenience variables, some debuggers support internal convenience variables, for example, to hold intermediate results during expression evaluation. In the mixed-environment case, the debugger must remember not only the values of convenience variables but also their languages.
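The properties listed above (implicit creation on first use, global scope, dynamic typing, and a remembered language) can be pictured with a small table-based store. The following Python sketch is only an illustration under stated assumptions: the class name `ConvenienceVariables`, its methods, and the language tags `"java"` and `"c"` are hypothetical and are not taken from Blink's source.

```python
class ConvenienceVariables:
    """Hypothetical global store mapping a name to (value, language)."""

    LANGUAGES = ("java", "c")  # assumed tags for the two environments

    def __init__(self):
        self._table = {}

    def assign(self, name, value, language):
        # Variables are created implicitly on first assignment and may
        # later be rebound to a value of any type and either language.
        if language not in self.LANGUAGES:
            raise ValueError(f"unknown language: {language}")
        self._table[name] = (value, language)

    def lookup(self, name):
        # Returns the value together with the language it belongs to.
        return self._table[name]


store = ConvenienceVariables()
store.assign("$y", 42, "c")                   # created on first use
store.assign("$y", "keyBoardEvent", "java")   # rebound: new type, new language
```

Keeping the language next to the value lets a mixed-environment debugger decide which component debugger can interpret a stored value.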
Because gdb provides convenience variables (written '$var'), Blink reuses them to store C values. Because jdb and cdb lack this feature, Blink implements convenience variables in the debugger agent, using a table to map names to values and languages.
Read-eval-print loop. This section explains how Blink evaluates expressions.
Read. As suggested by Rosenberg [9], the Read stage of Blink's REPL reuses syntax analysis code. We reuse the Jeannie grammar, which composes Java and C grammars [10, 18]. It is written in Rats!, a parser generator that uses packrat parsing for expressiveness and performance. The Jeannie grammar and Rats! are designed for composition. Blink uses abstract syntax tree (AST) implementations from the xtc compiler framework, which is integrated with Rats! and provides generic tree walking support.
Eval. The Eval stage of Blink's REPL evaluates the AST in two passes. Both passes use depth-first left-to-right tree traversals. The first pass annotates each AST node with its language (Java or C). Figure 4 shows how Blink annotates the AST for the expression x = $y + `z, assuming that the current language is Java. The second pass does the actual evaluation. The evaluation pass uses the backtick and print commands discussed earlier for language transitions and for the root of the AST, respectively. That leaves only AST nodes for operators in the languages being debugged. Rather than eagerly evaluating such nodes one by one, the evaluation pass builds up expression strings corresponding to maximal single-language subtrees. Evaluation of those expression strings is delayed as far as possible and is only forced at AST nodes for backtick or print. Figure 5 illustrates this. For example, the evaluator delimits single-language subtrees in Java and C at backtick and creates a convenience variable _vj as a representative of the subtree below backtick.
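The delayed evaluation described in this section can be sketched as follows. This is a minimal illustration, not Blink's code: the node classes, the `LazyEvaluator` class, the temporary names `_v0`, `_v1`, ..., and the `fake_debugger` callback (which stands in for a component debugger and only records what it is asked to evaluate) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Name:
    text: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


@dataclass
class Backtick:
    child: "Node"


Node = Union[Name, BinOp, Backtick]
OTHER = {"java": "c", "c": "java"}


class LazyEvaluator:
    def __init__(self, force):
        self.force = force   # force(lang, expr) -> value (hypothetical hook)
        self.temps = {}      # temporary name -> (value, language)

    def build(self, node, lang):
        """Build one expression string in `lang`, forcing only at backticks."""
        if isinstance(node, Name):
            return node.text
        if isinstance(node, BinOp):
            left = self.build(node.left, lang)
            right = self.build(node.right, lang)
            return f"{left} {node.op} {right}"
        # Backtick: switch language, evaluate the subtree, keep a stand-in.
        inner_lang = OTHER[lang]
        value = self.force(inner_lang, self.build(node.child, inner_lang))
        temp = f"_v{len(self.temps)}"
        self.temps[temp] = (value, inner_lang)
        return temp

    def print_(self, node, lang):
        """The root forces evaluation of the remaining expression string."""
        return self.force(lang, self.build(node, lang))


calls = []


def fake_debugger(lang, expr):
    calls.append((lang, expr))
    return f"<{lang}:{expr}>"


ev = LazyEvaluator(fake_debugger)
# $y + `z, with Java as the current language
result = ev.print_(BinOp("+", Name("$y"), Backtick(Name("z"))), "java")
```

In this sketch the `+` node never triggers an evaluation of its own: the component debuggers are consulted once for the C subtree under the backtick and once for the whole Java expression string at the root.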
Rather than eagerly computing the result of $y + _vj at the + node, the evaluator merely computes an expression string at ...
Figure 4. Reading and annotating the expression x = $y + `z when the current language is Java.
... and connects all the processes, but before the user program commences, Blink gives the user a command prompt. When the program terminates, Blink tears down jdb and gdb/cdb and exits.
Breakpoints. Breakpoints answer the question: 'How do I get to a point in program execution?' Users set breakpoints to inspect program states at points they suspect may be erroneous. The debugger's job is to detect when the breakpoint is reached and then transfer control to the user. One of the key challenges for a mixed-environment debugger is setting a breakpoint for a location in an inactive environment. This functionality requires the debugger to transfer control to the other environment's debugger, set the breakpoint, and return control to the current environment's debugger. Blink takes the breakpoint request from the user and checks if the request is for Java or C. If the current environment does not match the breakpoint environment, Blink switches the debugging context to the target environment and directs the breakpoint request to the corresponding debugger.
Single stepping. Once the application reaches a breakpoint, the question is 'What happens next?' Users want to single step through the program, examining control flow and data values to find errors. The step into, or simply step, command executes the next dynamic source line, which may be the first line of a method call, whereas the step over, or next, command treats method calls as a single step. The challenge for mixed-environment single stepping is that while jdb can step through Java and gdb or cdb can step through C, they lose control when stepping into a call to the other environment or when returning to a caller from the other environment.
Blink maintains control during a step command as follows. It sets internal breakpoints at all possible language transitions, so if the current component debugger loses control in a single step, then the other component debugger immediately gains control. Blink enables transition breakpoints from the current environment to the other environment only when the user requests a single step. Furthermore, when the user requests step over as opposed to step into, Blink enables only return breakpoints, rather than both call and return breakpoints. Note that Blink does not make any attempt to decode the current instruction, but rather aggressively sets the needed internal breakpoints just in case the single step causes an environment transition, and then executes the user command. This approach greatly decreases debugger development effort, because accurate Java single stepping requires interpreting the semantics of all bytecodes and accurate C single stepping requires platform-dependent disassembly. Blink therefore relies on the component debuggers for this functionality.
Once Blink sets the internal breakpoints, it implements single stepping by issuing the corresponding command to jdb or gdb/cdb. There are three possible outcomes.
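The rule for which internal transition breakpoints to enable before a step can be sketched in a few lines. This Python fragment is an illustration only: the function name `transition_breakpoints`, the environment tags, and the tuple representation of a breakpoint are hypothetical and are not taken from Blink.

```python
def transition_breakpoints(current_env, command):
    """Return the internal breakpoints to enable before a single step.

    Each breakpoint is a (from_env, to_env, kind) tuple. Step into
    ('step') enables call and return transitions, whereas step over
    ('next') enables return transitions only.
    """
    if current_env not in ("java", "c"):
        raise ValueError(f"unknown environment: {current_env}")
    if command not in ("step", "next"):
        raise ValueError(f"not a single-step command: {command}")
    other = "c" if current_env == "java" else "java"
    kinds = ["return"] if command == "next" else ["call", "return"]
    return [(current_env, other, kind) for kind in kinds]


# Stepping into from Java: both Java-to-C calls and returns are covered.
step_into = transition_breakpoints("java", "step")
# Stepping over from C: only returns from C to Java are covered.
step_over = transition_breakpoints("c", "next")
```

The sketch never inspects the instruction about to execute. It simply enables every transition breakpoint the command could hit, mirroring the text's point that decoding bytecodes or machine code is left to the component debuggers.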
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="">doi:10.1002/spe.2276</a> <a target="_blank" rel="external noopener" href="">fatcat:hifa37vztrhq3nzund33msvvkm</a> </span>