Programmers are increasingly choosing managed languages for modern applications, which tend to allocate many short-to-medium lived small objects. The garbage collector therefore directly determines program performance by making a classic space-time tradeoff that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors (semi-space, mark-sweep, and mark-compact) each sacrifice one objective. This paper describes a collector family, called mark-region, and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement immix, a novel high performance garbage collector that achieves all three performance objectives. The key insight is to allocate and reclaim memory in contiguous regions, at a coarse block grain when possible and otherwise in groups of finer grain lines. We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g. it improves jbb2000 by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design.
doi:10.1145/1379022.1375586 fatcat:425brocskrbd5oy6saflivktty
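The block-and-line reclamation described in this abstract can be illustrated with a small sketch. This is not the immix implementation: the `Block` class, the `sweep` function, and the value of `LINES_PER_BLOCK` are hypothetical names and sizes chosen only to show how a collector might free whole blocks when nothing in them is marked and otherwise recycle runs of unmarked lines.

```python
from dataclasses import dataclass, field

LINES_PER_BLOCK = 8  # hypothetical; chosen small for illustration


@dataclass
class Block:
    # One mark bit per line; set during marking for every line a live object touches.
    line_marked: list = field(default_factory=lambda: [False] * LINES_PER_BLOCK)

    def mark_object(self, first_line, last_line):
        """Record that a live object occupies lines first_line..last_line."""
        for i in range(first_line, last_line + 1):
            self.line_marked[i] = True

    def free_line_runs(self):
        """Return (start, length) for each maximal run of unmarked lines."""
        runs, start = [], None
        for i, marked in enumerate(self.line_marked):
            if not marked and start is None:
                start = i
            elif marked and start is not None:
                runs.append((start, i - start))
                start = None
        if start is not None:
            runs.append((start, LINES_PER_BLOCK - start))
        return runs


def sweep(blocks):
    """Classify blocks after marking: wholly free, or partly free (recyclable)."""
    free, recyclable = [], []
    for idx, block in enumerate(blocks):
        if not any(block.line_marked):
            free.append(idx)  # reclaim at the coarse block grain
        elif not all(block.line_marked):
            recyclable.append((idx, block.free_line_runs()))  # reclaim at line grain
    return free, recyclable
```

For example, with three blocks where the second holds one live object on lines 2 to 3 and the third is full, `sweep` reports the first block as free and the second as recyclable with two runs of free lines.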
General purpose garbage collectors have yet to combine short pause times with high throughput. For example, generational collectors can achieve high throughput. They have modest average pause times, but occasionally collect the whole heap and consequently incur long pauses. At the other extreme, concurrent collectors, including reference counting, attain short pause times but with significant performance penalties. This paper introduces a new hybrid collector that combines copying generational collection for the young objects and reference counting the old objects to achieve both goals. It restricts copying and reference counting to the object demographics for which they perform well. Key to our algorithm is a generalization of deferred reference counting we call Ulterior Reference Counting. Ulterior reference counting safely ignores mutations to select heap objects. We compare a generational reference counting hybrid with pure reference counting, pure mark-sweep, and hybrid generational mark-sweep collectors. This new collector combines excellent throughput, matching a high performance generational mark-sweep hybrid, with low maximum pause times.
doi:10.1145/949343.949336 fatcat:okelralubndl5an4lmueww2cga
We also thank Brendon Cahoon, Stephen Fink, David Grove, Michael Hind, Richard Jones, and Eliot Moss for their input and discussions. ...
doi:10.1145/773039.512452 fatcat:qml7qvf3pjcrlgztbov6ofvto4
We present the design and implementation of a new garbage collection framework that significantly generalizes existing copying collectors. The Beltway framework exploits and separates object age and incrementality. It groups objects in one or more increments on queues called belts, collects belts independently, and collects increments on a belt in first-in-first-out order. We show that Beltway configurations, selected by command line options, act and perform the same as semi-space, generational, and older-first collectors, and encompass all previous copying collectors of which we are aware. The increasing reliance on garbage collected languages such as Java requires that the collector perform well. We show that the generality of Beltway enables us to design and implement new collectors that are robust to variations in heap size and improve total execution time over the best generational copying collectors of which we are aware by up to 40%, and on average by 5 to 10%, for small to moderate heap sizes. New garbage collection algorithms are rare, and yet we define not just one, but a new family of collectors that subsumes previous work. This generality enables us to explore a larger design space and build better collectors.
doi:10.1145/543552.512548 fatcat:3rrzoeeprfafljserm3vftkcre
This paper explores and quantifies garbage collection behavior for three whole heap collectors and generational counterparts: copying semi-space, mark-sweep, and reference counting, the canonical algorithms from which essentially all other collection algorithms are derived. Efficient implementations in MMTk, a Java memory management toolkit, in IBM's Jikes RVM share all common mechanisms to provide a clean experimental platform. Instrumentation separates collector and program behavior, and performance counters measure timing and memory behavior on three architectures. Our experimental design reveals key algorithmic features and how they match program characteristics to explain the direct and indirect costs of garbage collection as a function of heap size on the SPEC JVM benchmarks. For example, we find that the contiguous allocation of copying collectors attains significant locality benefits over free-list allocators. The reduced collection costs of the generational algorithms together with the locality benefit of contiguous allocation motivates a copying nursery for newly allocated objects. These benefits dominate the overheads of generational collectors compared with non-generational and no collection, disputing the myth that "no garbage collection is good garbage collection." Performance is less sensitive to the mature space collection algorithm in our benchmarks. However, the locality and pointer mutation characteristics for a given program occasionally prefer copying or mark-sweep. This study is unique in its breadth of garbage collection algorithms and its depth of analysis.
doi:10.1145/1012888.1005693 fatcat:5rhotweg7vhwzo3fs5mxczikje
Arrays are the ubiquitous organization for indexed data. Throughout programming language evolution, implementations have laid out arrays contiguously in memory. This layout is problematic in space and time. It causes heap fragmentation, garbage collection pauses in proportion to array size, and wasted memory for sparse and over-provisioned arrays. Because of array virtualization in managed languages, an array layout that consists of indirection pointers to fixed-size discontiguous memory blocks can mitigate these problems transparently. This design, however, incurs significant overhead, but is justified when real-time deadlines and space constraints trump performance. This paper proposes z-rays, a discontiguous array design with flexibility and efficiency. A z-ray has a spine with indirection pointers to fixed-size memory blocks called arraylets, and uses five optimizations: (1) inlining the first N array bytes into the spine, (2) lazy allocation, (3) zero compression, (4) fast array copy, and (5) arraylet copy-on-write. Whereas discontiguous arrays in prior work improve responsiveness and space efficiency, z-rays combine time efficiency and flexibility. On average, the best z-ray configuration performs within 12.7% of an unmodified Java Virtual Machine on 19 benchmarks, whereas previous designs have two to three times higher overheads. Furthermore, language implementers can configure z-ray optimizations for various design goals. This combination of performance and flexibility creates a better building block for past and future array optimization.
doi:10.1145/1809028.1806649 fatcat:4famqi3efvbvlfcli2f326bfni
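A toy model may help picture the spine-and-arraylet layout named in this abstract. It is a sketch under stated assumptions, not the paper's design: the `ZRay` class, `ARRAYLET_SIZE`, and the `inline_n` parameter are hypothetical, elements stand in for bytes, and lazy allocation and zero compression are approximated together by leaving a spine slot as `None` until a nonzero value is written.

```python
ARRAYLET_SIZE = 4  # hypothetical number of elements per arraylet


class ZRay:
    """Toy discontiguous array: a spine of pointers to fixed-size arraylets."""

    def __init__(self, length, inline_n=2):
        self.length = length
        # The first inline_n elements are stored directly in the spine.
        self.inline = [0] * min(inline_n, length)
        rest = length - len(self.inline)
        n_arraylets = -(-rest // ARRAYLET_SIZE)  # ceiling division
        # None stands for an all-zero arraylet that has not been allocated.
        self.spine = [None] * n_arraylets

    def _check(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)

    def get(self, i):
        self._check(i)
        if i < len(self.inline):
            return self.inline[i]
        a, off = divmod(i - len(self.inline), ARRAYLET_SIZE)
        arraylet = self.spine[a]
        return 0 if arraylet is None else arraylet[off]

    def set(self, i, value):
        self._check(i)
        if i < len(self.inline):
            self.inline[i] = value
            return
        a, off = divmod(i - len(self.inline), ARRAYLET_SIZE)
        if self.spine[a] is None:
            if value == 0:
                return  # storing zero into a zero arraylet needs no memory
            self.spine[a] = [0] * ARRAYLET_SIZE  # allocate on first nonzero write
        self.spine[a][off] = value

    def allocated_arraylets(self):
        return sum(a is not None for a in self.spine)
```

In this model a sparse array only pays for the arraylets that actually hold nonzero data, which is the space benefit the abstract attributes to discontiguous layouts.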
Garbage collectors are exact or conservative. An exact collector identifies all references precisely and may move referents and update references, whereas a conservative collector treats one or more of stack, register, and heap references as ambiguous. Ambiguous references constrain collectors in two ways. (1) Since they may be pointers, the collectors must retain referents. (2) Since they may be values, the collectors cannot modify them, pinning their referents. We explore conservative collectors for managed languages, with ambiguous stacks and registers. We show that for Java benchmarks they retain and pin remarkably few heap objects: <0.01% are falsely retained and 0.03% are pinned. The larger effect is collector design. Prior conservative collectors (1) use mark-sweep and unnecessarily forgo moving all objects, or (2) use mostly copying and pin entire pages. Compared to generational collection, overheads are substantial: 12% and 45% respectively. We introduce high performance conservative Immix and reference counting (RC). Immix is a mark-region collector with fine line-grain pinning and opportunistic copying of unambiguous referents. Deferred RC simply needs an object map to deliver the first conservative RC. We implement six exact collectors and their conservative counterparts. Conservative Immix and RC come within 2 to 3% of their exact counterparts. In particular, conservative RC Immix is slightly faster than a well-tuned exact generational collector. These findings show that for managed languages, conservative collection is compatible with high performance.
doi:10.1145/2714064.2660198 fatcat:qqr7iyokqfhkzpgyyaswar45ja
Memory safety defends against inadvertent and malicious misuse of memory that may compromise program correctness and security. A critical element of memory safety is zero initialization. The direct cost of zero initialization is surprisingly high: up to 12.7%, with average costs ranging from 2.7 to 4.5% on a high performance virtual machine on IA32 architectures. Zero initialization also incurs indirect costs due to its memory bandwidth demands and cache displacement effects. Existing virtual machines either: a) minimize direct costs by zeroing in large blocks, or b) minimize indirect costs by zeroing in the allocation sequence, which reduces cache displacement and bandwidth. This paper evaluates the two widely used zero initialization designs, showing that they make different tradeoffs to achieve very similar performance. Our analysis inspires three better designs: (1) bulk zeroing with cache-bypassing (non-temporal) instructions to reduce the direct and indirect zeroing costs simultaneously, (2) concurrent non-temporal bulk zeroing that exploits parallel hardware to move work off the application's critical path, and (3) adaptive zeroing, which dynamically chooses between (1) and (2) based on available hardware parallelism. The new software strategies offer speedups sometimes greater than the direct overhead, improving total performance by 3% on average. Our findings invite additional optimizations and microarchitectural support.
doi:10.1145/2076021.2048092 fatcat:tai7u7rcczfyxa5yqleee2w7py
Programs sometimes crash due to unusable values, for example, when Java and C# programs dereference null pointers and when C and C++ programs use undefined values to affect program behavior. A stack trace produced on such a crash identifies the effect of the unusable value, not its cause, and is often not much help to the programmer. This paper presents efficient origin tracking of unusable values; it shows how to record where these values come into existence, correctly propagate them, and report them if they cause an error. The key idea is value piggybacking: when the original program stores an unusable value, value piggybacking instead stores origin information in the spare bits of the unusable value. Modest compiler support alters the program to propagate these modified values through operations such as assignments and comparisons. We evaluate two implementations: the first tracks null pointer origins in a JVM, and the second tracks undefined value origins in a memory-checking tool built with Valgrind. These implementations show that origin tracking via value piggybacking is fast and often useful, and in the Java case, has low enough overhead for use in a production environment.
doi:10.1145/1297105.1297057 fatcat:5txotam6ifhjli6cahcnbis3oe
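The value piggybacking idea in this abstract can be sketched for the null pointer case. Everything below is an illustrative assumption rather than the paper's encoding: references are modeled as 32-bit integers, `ORIGIN_BITS` is a hypothetical split, and `make_null`, `is_null`, `origin_of`, `refs_equal`, and `dereference` are made-up helper names. Any reference whose high bits are zero counts as null, and its low bits carry an identifier for the program location that created it.

```python
ORIGIN_BITS = 27  # hypothetical: low bits of a 32-bit word hold the origin id
ORIGIN_MASK = (1 << ORIGIN_BITS) - 1


class NullDereference(Exception):
    pass


def make_null(origin_id):
    """Create a null whose spare bits remember where it came into existence."""
    if not 0 <= origin_id <= ORIGIN_MASK:
        raise ValueError("origin id does not fit in the spare bits")
    return origin_id  # high bits are zero, so the value is still treated as null


def is_null(ref):
    return (ref >> ORIGIN_BITS) == 0


def origin_of(ref):
    return ref & ORIGIN_MASK


def refs_equal(a, b):
    """Comparison must treat all nulls as equal, whatever their origins."""
    return (is_null(a) and is_null(b)) or a == b


def dereference(ref, heap):
    """Report the origin of a null instead of only the failing access."""
    if is_null(ref):
        raise NullDereference(f"null created at origin {origin_of(ref)}")
    return heap[ref]
```

Plain assignment copies the integer and so propagates the origin for free; only null tests and comparisons need the adjusted semantics shown in `is_null` and `refs_equal`.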
Developers and architects spend a lot of time trying to understand and eliminate performance problems. Unfortunately, the root causes of many problems occur at a fine granularity that existing continuous profiling and direct measurement approaches cannot observe. This paper presents the design and implementation of SHIM, a continuous profiler that samples at resolutions as fine as 15 cycles, three to five orders of magnitude finer than current continuous profilers. SHIM's fine-grain measurements reveal new behaviors, such as variations in instructions per cycle (IPC) within the execution of a single function. A SHIM observer thread executes and samples autonomously on unutilized hardware. To sample, it reads hardware performance counters and memory locations that store software state. SHIM improves its accuracy by automatically detecting and discarding samples affected by measurement skew. We measure SHIM's observer effects and show how to analyze them. When on a separate core, SHIM can continuously observe one software signal with a 2% overhead at a ~1200 cycle resolution. At an overhead of 61%, SHIM samples one software signal on the same core with SMT at a ~15 cycle resolution. Modest hardware changes could significantly reduce overheads and add greater analytical capability to SHIM. We vary prefetching and DVFS policies in case studies that show the diagnostic power of fine-grain IPC and memory bandwidth results. By repurposing existing hardware, we deliver a practical tool for fine-grain performance microscopy for developers and architects.
doi:10.1145/2872887.2750401 fatcat:ssxduqhuhfdvhml4a3od74ltce
Dynamic software updating (DSU) systems eliminate costly downtime by dynamically fixing bugs and adding features to executing programs. Given a static code patch, most DSU systems construct runtime code changes automatically. However, a dynamic update must also specify how to change the running program's execution state, e.g., the stack and heap, to make it compatible with the new code. Constructing such state transformations correctly and automatically remains an open problem. This paper presents a solution called Targeted Object Synthesis (TOS). TOS first executes the same tests on the old and new program versions separately, observing the program heap state at a few corresponding points. Given two corresponding heap states, TOS matches objects in the two versions using key fields that uniquely identify objects and correlate old and new-version objects. Given example object pairs, TOS then synthesizes the simplest-possible function that transforms an old-version object to its new-version counterpart. We show that TOS is effective on updates to four open-source server programs for which it generates non-trivial transformation functions that use conditionals, operate on collections, and fix memory leaks. These transformations help programmers understand their changes and apply dynamic software updates.
doi:10.1145/2398857.2384636 fatcat:rmn5ca3edbcsfhldlgkyhwl4jq
and Nosofsky (1993), final transfer block; 4 = McKinley and Nosofsky (1993), first transfer block. ... McKinley and Nosofsky (1993) conducted a similar analysis. They too observed a wide variety of generalization profiles. ...
doi:10.1037/0033-295x.101.1.53 pmid:8121960 fatcat:whavc3jj3vgd5pfpstyon7jalm
Universal picking (UP), or reliable robot grasping of a diverse range of novel objects from heaps, is a grand challenge for e-commerce order fulfillment, manufacturing, inspection, and home service robots. Optimizing the rate, reliability, and range of UP is difficult due to inherent uncertainty in sensing, control, and contact physics. This paper explores "ambidextrous" robot grasping, where two or more heterogeneous grippers are used. We present Dexterity Network (Dex-Net) 4.0, a substantial extension to previous versions of Dex-Net that learns policies for a given set of grippers by training on synthetic datasets using domain randomization with analytic models of physics and geometry. We train policies for a parallel-jaw and a vacuum-based suction cup gripper on 5 million synthetic depth images, grasps, and rewards generated from heaps of three-dimensional objects. On a physical robot with two grippers, the Dex-Net 4.0 policy consistently clears bins of up to 25 novel objects with reliability greater than 95% at a rate of more than 300 mean picks per hour.
doi:10.1126/scirobotics.aau4984 pmid:33137754 fatcat:4gu6slfd5bfddbx2ivxi6mvica
As improvements in processor speed continue to outpace improvements in cache and memory speed, poor locality increasingly degrades performance. Because copying garbage collectors move objects, they have an opportunity to improve locality. However, no static copying order is guaranteed to match program traversal orders. This paper introduces online object reordering (OOR) which includes a new dynamic, online class analysis for Java that detects program traversal patterns and exploits them in a copying collector. OOR uses runtime method sampling that drives just-in-time (JIT) compilation. For each hot (frequently executed) method, OOR analysis identifies the hot field accesses. At garbage collection time, the OOR collector then copies referents of hot fields together with their parent. Enhancements include static analysis to exclude accesses in cold basic blocks, heuristics that decay heat to respond to phase changes, and a separate space for hot objects. The overhead of OOR is on average negligible and always less than 2% on Java benchmarks in Jikes RVM with MMTk. We compare program performance of OOR to static class-oblivious copying orders (e.g., breadth and depth first). Performance variation due to static orders is often low, but can be up to 25%. In contrast, OOR matches or improves upon the best static order since its history-based copying tunes memory layout to program traversal.
doi:10.1145/1035292.1028983 fatcat:6czk6zohybgqhlwj7l55dphkdu
We are sure our readers will all be delighted and profited by reading the letter which Justice Stephen J. ... fellow countrymen. With personal esteem and sincere best wishes for your contentment and happiness during the period of rest which you have so well earned, I am, dear sir, very truly yours, WILLIAM McKINLEY ...
fatcat:442ul5dcavbvla6bumoofgzpkq