Flexible Decoupled Transactional Memory Support

Arrvindh Shriraman, Sandhya Dwarkadas, Michael L. Scott
2008, 2008 International Symposium on Computer Architecture (IEEE)
A high-concurrency transactional memory (TM) implementation needs to track concurrent accesses, buffer speculative updates, and manage conflicts. We propose that the requisite hardware mechanisms be decoupled from one another. Decoupling (a) simplifies hardware development, by allowing mechanisms to be developed independently; (b) enables software to manage these mechanisms and control policy (e.g., conflict management strategy and laziness of conflict detection); and (c) makes it easier to use the hardware for purposes other than TM. We present a system, FlexTM (FLEXible Transactional Memory), that employs three decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-thread conflict summary tables, which identify the threads with which conflicts have occurred; and a lazy versioning mechanism, which maintains the speculative updates in the local cache and employs a thread-private buffer (in virtual memory) only in the rare event of an overflow. The conflict summary tables allow lazy conflict management to occur locally, with no global arbitration (they also support eager management). All three mechanisms are kept software-accessible, to enable virtualization and to support transactions of arbitrary length. In experiments with a prototype on the Simics/GEMS testbed, FlexTM provides a 5× speedup over high-quality software TM, with no loss in policy flexibility. Our analysis highlights the importance of lazy conflict detection, which maximizes concurrency and helps to ensure forward progress. Eager detection provides better overall system utilization in a mixed-programming environment. We also present a preliminary case study in which FlexTM components aid in the development of a tool to detect memory-related bugs.
[…] arbitrating between conflicting transactions and deciding which should abort. Pessimistic (eager) systems perform both conflict detection and conflict management as soon as possible. Optimistic (lazy) systems delay conflict management until commit time (though they may detect conflicts earlier). TM systems must also perform version management. They either buffer new values in private locations (a redo log) and make them visible at commit time, or buffer old values (an undo log) and restore them on aborts. In the taxonomy of Moore et al. [27], undo logs are considered an orthogonal form of eagerness (they put updates in the "right" location optimistically); redo logs are considered lazy.
The mechanisms required for conflict detection, conflict management, and version management can be implemented in hardware (HTM) [1, 14, 16, 27, 28], software (STM) [11, 12, 15, 23, 29], or some hybrid of the two (HyTM) [10, 18, 26, 35]. Full hardware systems are typically inflexible in policy, with fixed choices for eagerness of conflict management, strategies for conflict arbitration and back-off, and eagerness of versioning. Software-only systems are typically slow by comparison, at least in the common case.
Several systems [6, 35, 40] have advocated decoupling of the hardware components required for TM, giving each a well-defined API that allows them to be implemented and invoked independently. Hill et al. [17] argue that decoupling makes it easier to refine an architecture incrementally. Shriraman et al. [34, 35] argue that decoupling helps to separate policy from mechanism, thereby enabling flexibility in the choice of policy. Both groups suggest that decoupling may allow TM components to be used for other, nontransactional purposes [17], [35, TR version].
Several papers have found performance pathologies with certain policy choices (eagerness of conflict management; arbitration and back-off strategy) in certain applications [4, 32, 35, 36]. RTM promotes policy flexibility by decoupling version management from conflict detection and management. Specifically, it separates data from metadata and performs conflict detection only on the latter. While RTM hardware provides a single mechanism for both conflict detection and management, software can choose (by controlling the timing of metadata inspection and updates) when conflicts are detected. Unfortunately, metadata management imposes significant software costs [35].
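The redo-log/undo-log distinction above can be illustrated with a minimal single-threaded Python sketch. It assumes memory is modeled as a plain dict and shows version management only: there is no concurrency and no conflict detection. The class names `RedoLogTx` and `UndoLogTx` are hypothetical illustrations, not part of any TM system discussed here.

```python
_MISSING = object()  # sentinel: the address did not exist before the write


class RedoLogTx:
    """Lazy versioning: new values are buffered privately until commit."""

    def __init__(self, memory):
        self.memory = memory
        self.redo = {}  # address -> speculative new value, invisible to others

    def read(self, addr):
        # A transaction must see its own speculative writes first.
        return self.redo.get(addr, self.memory.get(addr))

    def write(self, addr, value):
        self.redo[addr] = value

    def commit(self):
        self.memory.update(self.redo)  # publish all new values at commit time
        self.redo.clear()

    def abort(self):
        self.redo.clear()  # memory was never touched, so nothing to restore


class UndoLogTx:
    """Eager versioning: writes go in place; old values are saved for abort."""

    def __init__(self, memory):
        self.memory = memory
        self.undo = {}  # address -> value before this transaction's first write

    def read(self, addr):
        return self.memory.get(addr)

    def write(self, addr, value):
        if addr not in self.undo:  # log only the oldest value per address
            self.undo[addr] = self.memory.get(addr, _MISSING)
        self.memory[addr] = value

    def commit(self):
        self.undo.clear()  # new values are already in place

    def abort(self):
        for addr, old in self.undo.items():
            if old is _MISSING:
                del self.memory[addr]
            else:
                self.memory[addr] = old
        self.undo.clear()


if __name__ == "__main__":
    mem = {"x": 1}
    tx = RedoLogTx(mem)
    tx.write("x", 2)
    print(mem["x"], tx.read("x"))  # 1 2: the new value is still private
```

The trade-off shows up directly in the method bodies:

- The redo log makes abort trivial, but commit must copy values out.
- The undo log makes commit trivial, but abort must restore the old values.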
In this paper, we propose more fully decoupled hardware, allowing us to maintain the separation between version management and conflict management without the need for software-managed metadata. Specifically, our FlexTM (FLEXible Transactional Memory) system introduces conflict summary tables (CSTs) to concisely capture conflicts between transactions. It also uses Bloom filter signatures (as in Bulk [6] and LogTM-SE [40]) to track and summarize a transaction's read and write sets. In addition, it adapts the versioning system of RTM (programmable data isolation, PDI), extending it to directory-based coherence and adding a hardware-filled overflow mechanism. Though FlexTM relies on read and write signatures to maintain CSTs, the signatures are first-class objects and can be used for other purposes as well. The CSTs, for their part, can be polled by software or configured to trigger a user-level handler when conflicts occur; this allows us to separate conflict detection from conflict management. In other words, while the hardware always detects conflicts immediately, software chooses when to notice them and what to do about them. FlexTM enables lazy conflict management without commit tokens [14], broadcast of write sets [6, 14], or ticket-based serialization [7]. It is, to our knowledge, the first hardware TM to implement lazy commits and aborts as entirely local operations, even with parallel commits in multiple threads. As in RTM, PDI buffers speculative writes in local (private) caches, allowing those caches to grow incoherent under software control. Rather than fall back to software-only TM in the event of overflow, however, FlexTM moves evicted speculative lines to a thread-private overflow table (OT) in virtual memory. Both signatures and CSTs are independent of the versioning system. Signatures, CSTs, and OTs are fully visible to software, and can be read and (at the OS level) written under software control.
This allows us to virtualize these structures, extending transactions through context switches and paging. As in LogTM-SE, summary signatures capture the read and write sets of swapped-[…]
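The following single-threaded Python sketch shows how per-thread Bloom-filter read and write signatures can feed conflict summary tables, and how a lazy commit can then be decided from the committer's own CST alone. It is a software model under stated assumptions, not FlexTM's hardware. The following details are illustrative assumptions rather than details given in this excerpt:

- the class names;
- the 256-bit, 4-hash signature geometry;
- the BLAKE2b-derived bit positions;
- the split of each CST into R-W, W-R, and W-W sets;
- the commit policy, which aborts every thread whose reads or writes overlap the committer's writes.

Real hardware would test signatures when responding to coherence requests rather than by looping over threads.

```python
import hashlib


class Signature:
    """Bloom-filter summary of cache-line addresses: no false negatives,
    but membership tests may report false positives."""

    def __init__(self, bits=256, hashes=4):
        assert hashes <= 8  # a 16-byte digest yields eight 2-byte hash values
        self.bits, self.hashes, self.word = bits, hashes, 0

    def _positions(self, addr):
        digest = hashlib.blake2b(addr.to_bytes(8, "little"), digest_size=16).digest()
        return [int.from_bytes(digest[2 * i:2 * i + 2], "little") % self.bits
                for i in range(self.hashes)]

    def insert(self, addr):
        for p in self._positions(addr):
            self.word |= 1 << p

    def member(self, addr):
        return all((self.word >> p) & 1 for p in self._positions(addr))

    def clear(self):
        self.word = 0


class Thread:
    def __init__(self, tid):
        self.tid = tid
        self.rsig, self.wsig = Signature(), Signature()
        # Assumed CST layout: "R-W" = threads that wrote what this thread read,
        # "W-R" = threads that read what it wrote, "W-W" = both wrote.
        self.cst = {"R-W": set(), "W-R": set(), "W-W": set()}


class System:
    def __init__(self, n):
        self.threads = [Thread(i) for i in range(n)]

    def access(self, tid, addr, is_write):
        """Record tid's access; other threads test their signatures (as if
        answering a coherence request) and both sides update their CSTs."""
        me = self.threads[tid]
        for other in self.threads:
            if other.tid == tid:
                continue
            if is_write and other.rsig.member(addr):
                other.cst["R-W"].add(tid)
                me.cst["W-R"].add(other.tid)
            if is_write and other.wsig.member(addr):
                other.cst["W-W"].add(tid)
                me.cst["W-W"].add(other.tid)
            if not is_write and other.wsig.member(addr):
                other.cst["W-R"].add(tid)
                me.cst["R-W"].add(other.tid)
        (me.wsig if is_write else me.rsig).insert(addr)

    def _reset(self, t):
        """Clear a finished (committed or aborted) thread's state."""
        t.rsig.clear()
        t.wsig.clear()
        for s in t.cst.values():
            s.clear()
        for other in self.threads:  # drop stale references to t elsewhere
            for s in other.cst.values():
                s.discard(t.tid)

    def commit(self, tid):
        """Hypothetical lazy-commit policy: abort every thread that read or
        wrote data the committer wrote, consulting only the committer's CST."""
        me = self.threads[tid]
        victims = me.cst["W-R"] | me.cst["W-W"]
        for v in victims:
            self._reset(self.threads[v])
        self._reset(me)
        return victims
```

Consider thread 0 reading a line that thread 1 later writes. The write only records the conflict, in thread 0's R-W set and thread 1's W-R set; nothing is aborted yet. When thread 1 commits, its W-R set names thread 0 as the transaction to abort, with no global arbiter involved. Because signatures are Bloom filters, a test can flag spurious conflicts (false positives) but can never miss a real one.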
doi:10.1109/isca.2008.17 · dblp:conf/isca/ShriramanDS08 · fatcat:kwyihhdwpbentfbnd4anoqzaei