Generational stack collection and profile-driven pretenuring

Perry Cheng, Robert Harper, Peter Lee
1998 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation - PLDI '98  
1 Introduction

This paper presents two techniques for improving garbage collection performance: generational stack collection and profile-driven pretenuring. The first is applicable to stack-based implementations of functional languages, while the second is useful for any generational collector. We have implemented both techniques in a generational collector used by the TIL compiler (Tarditi, Morrisett, Cheng, Stone, Harper, and Lee 1996), and have observed decreases in garbage collection times of as much as 70% and 30%, respectively.

Functional languages encourage the use of recursion, which can lead to a long chain of activation records. When a collection occurs, these activation records must be scanned for roots. We show that scanning many activation records can take so long as to become the dominant cost of garbage collection. However, most deep stacks unwind very infrequently, so most of the root information obtained from the stack remains unchanged across successive garbage collections. Generational stack collection greatly reduces the stack scan cost by reusing information from previous scans.

Generational techniques have been successful in reducing the cost of garbage collection (Ungar 1984). Various complex heap arrangements and tenuring policies have been proposed to increase the effectiveness of generational techniques by reducing the cost and frequency of scanning and copying. In contrast, we show that by using profile information to make lifetime predictions, pretenuring can avoid copying data altogether. In essence, this technique uses a refinement of the generational hypothesis (most data die young) with a locality principle concerning the age of data: most allocation sites produce data that immediately dies, while a few allocation sites consistently produce data that survives many collections.

Garbage collection is a technique for automatic memory management whereby the programmer is freed from explicit deallocation of heap storage (McCarthy 1960; Knuth 1969; Wilson 1994). Copying garbage collectors reclaim space in two steps: scanning the stack for roots and then copying data reachable from these roots into an unused area of memory. The area vacated by the live data is known to contain only garbage and may be reclaimed. A simple kind of copying garbage collector is the semispace collector (Fenichel and Yochelson 1969) using Cheney's algorithm (Cheney 1970).
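The two-step copying scheme described above can be illustrated with a minimal sketch of a Cheney-style semispace collection. This is not the TIL collector: the heap model (objects as Python lists of fields, a hypothetical `Ptr` wrapper for pointers, and a dictionary standing in for in-place forwarding pointers) is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """Hypothetical pointer: an index into a space (a list of objects)."""
    addr: int


def cheney_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Objects are lists of fields; a field is either a Ptr or plain data.
    Returns (to_space, new_roots). Whatever stays behind is garbage.
    """
    to_space = []
    forward = {}  # from-space address -> to-space address (assumed stand-in
                  # for forwarding pointers stored in the old objects)

    def copy(field):
        if not isinstance(field, Ptr):
            return field  # non-pointer data is kept as-is
        if field.addr not in forward:
            forward[field.addr] = len(to_space)
            to_space.append(list(from_space[field.addr]))  # shallow copy
        return Ptr(forward[field.addr])

    # Step 1: process the roots.
    new_roots = [copy(r) for r in roots]

    # Step 2: scan to-space breadth-first; the region between `scan` and the
    # end of to_space acts as the work queue.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        for i, f in enumerate(obj):
            obj[i] = copy(f)
        scan += 1
    return to_space, new_roots


# Object 0 and object 2 form a cycle reachable from the root;
# objects 1 and 3 are unreachable and are not copied.
heap = [[1, Ptr(2)], ["garbage"], [Ptr(0), 7], [Ptr(1)]]
to_space, new_roots = cheney_collect(heap, [Ptr(0)])
```

In this sketch the cost of step 1 is proportional to the number of roots, which is the part of the work that grows with stack depth in the setting the paper studies.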
Unfortunately, semispace collectors cannot usually attain efficient memory usage and good performance (Ungar 1984). Using the observation that most objects die quickly (Ungar 1984), generational collectors can arrange heap areas and schedule collections to improve performance.

Generational collection successfully reduces the cost of copying data. However, for programs with deep call chains, the cost of scanning the stack for roots can be high. In our study, we observe that root processing can take up to 70% of the total garbage collection cost. Since most deep stacks are not frequently unwound (Table 2), most of the old stack frames are unchanged across successive collections. If we can determine which stack frames are unchanged, then the cost of root scanning can be reduced by reusing the information from the previous collection. This technique, called generational stack collection, is like generational garbage collection in that old stack frames are "tenured" to reduce processing frequency.

Generational techniques work by dividing the heap into different regions called generations. Objects that survive initial minor collections of the nursery (the first generation) are more likely to survive many more collections. These objects are promoted into areas that are less frequently collected. The advantage is that if the collections of the older areas are sufficiently delayed, then a large fraction of these objects will have died, making the collection worthwhile. However, long-lived objects are typically copied several times before they are tenured. Multiple generations can make the tenuring prediction more accurate but could cause even more copying of the data that survives.

An alternative approach to using runtime per-object predictions is to classify objects based on their allocation site and use profile results to predict lifetimes. This technique can yield information concerning the predicted lifetime of objects before the final execution.
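The idea of classifying objects by allocation site can be sketched as follows. The profile format (pairs of site name and survival flag), the function names, and the 0.8 survival threshold are all hypothetical choices for illustration; the excerpt does not state the cutoff or data layout the authors actually use.

```python
from collections import defaultdict


def choose_pretenured_sites(profile, threshold=0.8):
    """Pick allocation sites whose objects usually survive.

    `profile` is an iterable of (site, survived) records from a profiling
    run (assumed format). A site is selected when the fraction of its
    objects that survived is at least `threshold` (an assumed cutoff).
    """
    allocated = defaultdict(int)
    survived = defaultdict(int)
    for site, did_survive in profile:
        allocated[site] += 1
        if did_survive:
            survived[site] += 1
    return {s for s in allocated if survived[s] / allocated[s] >= threshold}


def allocate(site, obj, pretenured, nursery, tenured):
    """Place `obj` directly in the tenured area if its site is pretenured,
    so it is never copied out of the nursery; otherwise use the nursery."""
    target = tenured if site in pretenured else nursery
    target.append(obj)
    return obj


# Site "A" always produces survivors; site "B" mostly produces garbage.
profile = [("A", True), ("A", True), ("B", False), ("B", False), ("B", True)]
sites = choose_pretenured_sites(profile)

nursery, tenured = [], []
allocate("A", "long-lived", sites, nursery, tenured)
allocate("B", "short-lived", sites, nursery, tenured)
```

The selection happens once, from a profiling run, so the final execution pays no per-object prediction cost at allocation time beyond a site lookup.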
Its success relies on the information returned by heap profiling. If, as we later show, an allocation site is a good predictor ...
doi:10.1145/277650.277718 dblp:conf/pldi/ChengHL98 fatcat:4kjsovyrdfecdp3htqc33gw2c4