
Hardware transactional memory for GPU architectures

Wilson W. L. Fung, Inderpreet Singh, Andrew Brownsword, Tor M. Aamodt
2011 Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11  
In this paper, we propose to solve these problems by extending GPUs to support transactional memory (TM).  ...  While threads within a CUDA block/OpenCL workgroup can communicate efficiently through an intra-core scratchpad memory, threads in different blocks can only communicate via global memory accesses.  ...  Transaction Rollback: Many proposed HTMs checkpoint the architectural state of the hardware thread at the start of a transaction for restoration upon rollback.  ... 
doi:10.1145/2155620.2155655 dblp:conf/micro/FungSBA11 fatcat:dbp27jwhabddfgeih6rqc7fube

Models of Communication for Multicore Processors

Martin Schoeberl, Rasmus Bo Sorensen, Jens Sparso
2015 2015 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops  
Different forms of on-chip communication are supported by different hardware mechanisms, e.g., shared caches with cache coherency protocols, core-to-core networks-on-chip, and shared scratchpad memories.  ...  In this paper we explore the different hardware mechanisms for on-chip communication and how they support or favor different models of communication.  ...  ACKNOWLEDGMENT: The work presented in this paper was partially funded by the Danish Council for Independent Research | Technology and Production Sciences under the project RTEMP, contract no. 12-127600  ... 
doi:10.1109/isorcw.2015.57 dblp:conf/isorc/SchoeberlSS15 fatcat:2xfammjbbnb5vkdm6zfd5wxefy

TMbox: A Flexible and Reconfigurable 16-Core Hybrid Transactional Memory System

Nehir Sonmez, Oriol Arcas, Otto Pflucker, Osman S. Unsal, Adrián Cristal, Ibrahim Hur, Satnam Singh, Mateo Valero
2011 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines  
In this paper we present the design and implementation of TMbox: An MPSoC built to explore tradeoffs in multicore design space and to evaluate parallel programming proposals such as Transactional Memory  ...  For this paper we evaluate a 16-core Hybrid Transactional Memory implementation based on the TinySTM-ASF proposal on a Virtex-5 FPGA and we accelerate three benchmarks written to investigate TM.  ...  the Ministry of Science and Technology of Spain and the European Union (FEDER funds) under contract TIN2007-60625 and TIN2008-02055-E, by the European Network of Excellence on High-Performance Embedded Architecture  ... 
doi:10.1109/fccm.2011.44 dblp:conf/fccm/SonmezAPUCHSV11 fatcat:umxbd7mwknbnjboga3uypbanma

High-performance ethernet-based communications for future multi-core processors

Michael Schlansker, Norman P. Jouppi, Nagabhushan Chitlur, Erwin Oertli, Paul M. Stillwell, Linda Rankin, Dennis Bradford, Richard J. Carter, Jayaram Mudigonda, Nathan Binkert
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
The hardware and software architecture is designed with a number of key assumptions. Hardware must be simple for close integration in future multi-core processors.  ...  We envision a JNIC architecture that is suitable for most in-data-center communication needs.  ...  Information is pushed onto the transmit command queue using coherent memory transactions.  ... 
doi:10.1145/1362622.1362672 dblp:conf/sc/SchlanskerCOSRBCMBJ07 fatcat:b6nqg7loczb6bifp5cptihxjgy

FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures

Tayo Oguntebi, Sungpack Hong, Jared Casper, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun
2010 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines  
FARM's coherent FPGA includes a cache and participates in coherence activities with the processors.  ...  We present the Flexible Architecture Research Machine (FARM), a hardware prototyping system based on an FPGA coherently connected to a multiprocessor system.  ...  Many of these architectures, such as the heterogeneous ones, are fundamentally different from existing hardware and difficult to accurately model using traditional simulators.  ... 
doi:10.1109/fccm.2010.41 dblp:conf/fccm/OguntebiHCBKO10 fatcat:g2h7kypl2zbgbl6gh2x4355qp4

Shared Memory in the Many-Core Age [chapter]

Stefan Nürnberger, Gabor Drescher, Randolf Rotta, Jörg Nolte, Wolfgang Schröder-Preikschat
2014 Lecture Notes in Computer Science  
These elementary operations will help in exploring and evaluating new memory models and consistency protocols.  ...  With the evolution toward fast networks of many-core processors, the design assumptions at the basis of software-level distributed shared memory (DSM) systems change considerably.  ...  With the transition to many-core architectures, the hardware evolved considerably and became more diverse and heterogeneous.  ... 
doi:10.1007/978-3-319-14313-2_30 fatcat:ah6h3rubebh4zj52w4wfnhaz2m

From Plasma to BeeFarm: Design Experience of an FPGA-Based Multicore Prototype [chapter]

Nehir Sonmez, Oriol Arcas, Gokhan Sayilar, Osman S. Unsal, Adrián Cristal, Ibrahim Hur, Satnam Singh, Mateo Valero
2011 Lecture Notes in Computer Science  
last few years both in the FPGA and the computer architecture communities.  ...  Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research.  ...  One of the most attractive proposals for shared-memory CMPs has been the use of atomic instructions in Transactional Memory (TM), a new programming paradigm for deadlock-free execution of parallel code  ... 
doi:10.1007/978-3-642-19475-7_37 fatcat:eno4vzv2jrdqpjw6ytoqv56cdm

An efficient synchronization technique for multiprocessor systems on-chip

Matteo Monchiero, Gianluca Palermo, Cristina Silvano, Oreste Villa
2006 SIGARCH Computer Architecture News  
We suggest the architecture of the memory controller optimized to minimize synchronization overhead.  ...  variables and directory-based coherency protocol.  ...  In [14] the authors present an evaluation of several cache coherency schemes. They explore three scenarios, in the context of a shared memory MPSoC based on bus interconnect.  ... 
doi:10.1145/1147349.1147357 fatcat:glxyh2x3qbbodplpgu5icukz6q

The New Hardware Development Trend and the Challenges in Data Management and Analysis

Wei Pan, Zhanhuai Li, Yansong Zhang, Chuliang Weng
2018 Data Science and Engineering  
In this paper, we first introduce the development trend of the new hardware in computation, storage, and network dimensions.  ...  Recent hardware trends in these areas deeply affect data management and analysis applications.  ...  Many mature query processing techniques may fail in platforms with many-core processors.  ... 
doi:10.1007/s41019-018-0072-6 fatcat:kksuldstdvdohpkleq7suyq2ry

Using a configurable processor generator for computer architecture prototyping

Alex Solomatnikov, Amin Firoozshahian, Ofer Shacham, Zain Asgar, Megan Wachs, Wajahat Qadeer, Stephen Richardson, Mark Horowitz
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
Building hardware prototypes for computer architecture research is challenging.  ...  to successfully tape out an 8-core CMP chip with only a small group of designers.  ...  ACKNOWLEDGMENTS: This work would not have been possible without great support and cooperation from many people at Tensilica: Chris Rowen, Dror Maydan, Bill Huffman, Nenad Nedeljkovic, David Heine, Govind  ... 
doi:10.1145/1669112.1669159 dblp:conf/micro/SolomatnikovFSAWQRH09 fatcat:64ix2vqwsjaxhl5abuqvrpqlr4

Exploring memory consistency for massively-threaded throughput-oriented processors

Blake A. Hechtman, Daniel J. Sorin
2013 SIGARCH Computer Architecture News  
MTTOPs differ from CPUs in many significant ways, including their ability to tolerate latency, their memory system organization, and the characteristics of the software they run.  ...  We compare implementations of various hardware consistency models for MTTOPs in terms of performance, energy efficiency, hardware complexity, and programmability.  ...  Other recent work has explored how to extend the scalability of hardware cache coherence to hundreds and even thousands of cores.  ... 
doi:10.1145/2508148.2485940 fatcat:jrbhg7prcrgx5kajihgfd7snxi

Exploring memory consistency for massively-threaded throughput-oriented processors

Blake A. Hechtman, Daniel J. Sorin
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
MTTOPs differ from CPUs in many significant ways, including their ability to tolerate latency, their memory system organization, and the characteristics of the software they run.  ...  We compare implementations of various hardware consistency models for MTTOPs in terms of performance, energy efficiency, hardware complexity, and programmability.  ...  Other recent work has explored how to extend the scalability of hardware cache coherence to hundreds and even thousands of cores.  ... 
doi:10.1145/2485922.2485940 dblp:conf/isca/HechtmanS13 fatcat:tuakbaweffg4hmekzomyqi2imm

Distributed Computing Column 58

Jennifer L. Welch
2015 ACM SIGACT News  
Concluding Thoughts: While the discussion above spans much of the history of transactional memory, and mentions many open questions, the coverage has of necessity been spotty, and the choice  ...  Most of all, my thanks and admiration to Maurice Herlihy for his seminal contributions, not only to transactional memory, but to nonblocking algorithms, topological analysis, and so many other aspects  ...  Barrelfish is an example of a multikernel in which each core runs a separate OS kernel, even when the cores operate in a single cache-coherent machine.  ... 
doi:10.1145/2789149.2789163 fatcat:ueub5knoofar3jee6plc763yni

Efficient Synchronization for Embedded On-Chip Multiprocessors

M. Monchiero, G. Palermo, C. Silvano, O. Villa
2006 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
Two different architectures have been explored to prove that the proposed approach is effective independently of the caches and coherence schemes adopted.  ...  By using this mechanism, we propose a spin lock implementation requiring a constant number of network transactions and memory accesses per lock acquisition.  ...  Finally, many thanks to E. Coffey for checking the final version of this paper.  ... 
doi:10.1109/tvlsi.2006.884147 fatcat:zb3s5gzdarf5tattdiioc6zxzy

Soft-error mitigation by means of decoupled transactional memory threads

Daniel Sánchez, Juan M. Cebrián, José M. García, Juan L. Aragón
2014 Distributed Computing  
Based on a Hardware Transactional Memory architecture, LBRA executes redundant threads which communicate through a pair-shared virtual memory log allocated in cache.  ...  To avoid the performance penalty inherent to this architecture, we propose to decouple their execution in different cores, solving the inter-core communication by means of a log buffer empowered by a simple  ...  With LBRA, we explore the use of a HTM (Hardware Transactional Memory) system to build a fault tolerant architecture.  ... 
doi:10.1007/s00446-014-0215-6 fatcat:vdvalpdcwrbt3ofopss4nncl4q