18,994 Hits in 4.7 sec

Software Directed Issue Queue Power Reduction

T.M. Jones, M.F.P. O'Boyle, J. Abella, A. Gonzalez
11th International Symposium on High-Performance Computer Architecture  
In this paper we present a novel software assisted approach to power reduction where the processor dynamically resizes the issue queue based on compiler analysis.  ...  Using a simplistic scheme we achieve 47% dynamic and 31% static power savings in the issue queue with only a 2.2% performance loss.  ...  This paper proposes an entirely different approach: software directed issue queue control.  ... 
doi:10.1109/hpca.2005.32 dblp:conf/hpca/JonesOAG05 fatcat:fwbwm5fxrfdjxpe6utexchboy4

Low-power, low-complexity instruction issue using compiler assistance

Madhavi G. Valluri, Lizy K. John, Kathryn S. McKinley
2005 Proceedings of the 19th annual international conference on Supercomputing - ICS '05  
At the micro-architecture-level, we propose a novel issue queue that exploits the varying dynamic scheduling requirement of basic blocks to lower the power dissipation and complexity of the dynamic issue  ...  This paper develops a cooperative hardware/software technique to reduce complexity and energy consumption of the issue logic.  ...  This paper presents a cooperative hardware/software technique to mitigate the power and complexity bottlenecks in the issue logic.  ... 
doi:10.1145/1088149.1088177 dblp:conf/ics/ValluriJM05 fatcat:p7so5hj7fbglldm3br67prfxhi

Design and implementation Raspberry Pi-based omni-wheel mobile robot

Kirill Krinkin, Elena Stotskaya, Yury Stotskiy
2015 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)  
This paper describes hardware design and control software for small size omni-directional wheels robot implemented for indoor testing SLAM algorithms.  ...  Nowadays simultaneous localization and mapping (SLAM) algorithms are being tested at least in two phases: software simulation and real hardware platform testing.  ...  For rotation case, all motors are powered to turn in the same direction.  ... 
doi:10.1109/ainl-ismw-fruct.2015.7382967 fatcat:kobwoaf4trh6hbk4dfwf4mumqu

Explicit Communication and Synchronization in SARC

Manolis Katevenis, Vassilis Papaefstathiou, Stamatis Kavadias, Dionisios Pnevmatikatos, Federico Silla, Dimitrios Nikolopoulos
2010 IEEE Micro  
Command buffers are used to issue remote (write or read) DMA operations; counters and queues are used for synchronization, including RDMA completion detection, notifications, and waiting for events.  ...  Within this scratchpad space, software can allocate special areas (as many as it wishes) that behave as command buffers, counters, or queues, with event response capabilities.  ...  The latter effect offers a reduction in the total NoC power that ranges from 15% to 30% when compared with plain caches; in contrast prefetching results in increased NoC power consumption.  ... 
doi:10.1109/mm.2010.77 fatcat:jzsphc2sqrgpfh6rxdgvuswv5y

A single-chip multiprocessor for smart terminals

M. Edahiro, S. Matsushita, M. Yamashina, N. Nishi
2000 IEEE Micro  
Intelligence should be another key issue in the new millennium.  ...  To attain these features, low-power and high-performance microprocessors are indispensable. Obviously, lower power enhances mobility by enabling longer battery lives.  ...  Also, on-chip power switches are highly effective for standby power reduction. processing element (PE). Both pipelines also have 32-bit or 16-bit × 2 SIMD ALUs.  ... 
doi:10.1109/mm.2000.865862 fatcat:qexgecaucna2hgb3wkapvp2k24

OpenMP on Networks of Workstations

Honghui Lu, Y.C. Hu, W. Zwaenepoel
1998 Proceedings of the IEEE/ACM SC98 Conference  
applications (ASCI Sweep3d, NAS 3D-FFT, SPLASH-2 Water, QSORT, and TSP) exhibiting various styles of parallelization, including pipelined execution, data parallelism, coarse-grained parallelism, and task queues  ...  The measurements show little difference between OpenMP and hand-coded software DSM, but both are still lagging behind MPI.  ...  The reduction directive identifies reduction variables. According to the standard, reduction variables must be scalar, but we extend the standard to include arrays.  ... 
doi:10.1109/sc.1998.10001 dblp:conf/sc/LuHZ98 fatcat:27myqxi6end4hprynj5sltezha

Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors

Victor Garcia, Alejandro Rico, Carlos Villavieja, Paul Carpenter, Nacho Navarro, Alex Ramirez
2016 International journal of parallel programming  
Among software controlled prefetching we find a wide variety of schemes, including runtime-directed prefetching and more specifically runtime-directed block prefetching.  ...  As a result, we also achieve a reduction of up to 18% and 3% on average in energy-to-solution.  ...  Runtime-directed prefetching however brings only data known to be needed, and the additional hardware required to support our software block prefetcher has an almost negligible cost in area and power.  ... 
doi:10.1007/s10766-016-0431-8 fatcat:cat3hjvdqjb73oxdhf5dwirvh4

Flexible architectural support for fine-grain scheduling

Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
2010 SIGPLAN notices  
We propose asynchronous direct messages (ADM), a simple architectural extension that provides direct exchange of asynchronous, short messages between threads in the CMP without going through the memory  ...  This paper presents a combined hardware-software approach to build fine-grain schedulers that retain the flexibility of software schedulers while being as fast and scalable as hardware ones.  ...  For instance, hashjoin benefits from directed hierarchical stealing, cg requires fast reductions, and other applications need more complex queuing policies (e.g. a priority queue).  ... 
doi:10.1145/1735971.1736055 fatcat:tfdimqph4ja75dkfmlrsrmvcna

Flexible architectural support for fine-grain scheduling

Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
2010 SIGARCH Computer Architecture News  
We propose asynchronous direct messages (ADM), a simple architectural extension that provides direct exchange of asynchronous, short messages between threads in the CMP without going through the memory  ...  This paper presents a combined hardware-software approach to build fine-grain schedulers that retain the flexibility of software schedulers while being as fast and scalable as hardware ones.  ...  For instance, hashjoin benefits from directed hierarchical stealing, cg requires fast reductions, and other applications need more complex queuing policies (e.g. a priority queue).  ... 
doi:10.1145/1735970.1736055 fatcat:yh7f7bisnnbr5apzkj5vfyjjvy

Flexible architectural support for fine-grain scheduling

Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
2010 Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems - ASPLOS '10  
We propose asynchronous direct messages (ADM), a simple architectural extension that provides direct exchange of asynchronous, short messages between threads in the CMP without going through the memory  ...  This paper presents a combined hardware-software approach to build fine-grain schedulers that retain the flexibility of software schedulers while being as fast and scalable as hardware ones.  ...  For instance, hashjoin benefits from directed hierarchical stealing, cg requires fast reductions, and other applications need more complex queuing policies (e.g. a priority queue).  ... 
doi:10.1145/1736020.1736055 dblp:conf/asplos/SanchezYK10 fatcat:6bjfeqtuabecbhz644sejgaojy

NUMA-aware graph mining techniques for performance and energy efficiency

Michael Frasca, Kamesh Madduri, Padma Raghavan
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
We investigate dynamic methods to improve the power and performance profiles of large irregular applications on modern multi-core systems.  ...  Memory issues already dominate high-performance software design [8] , but the manycore trend complicates this aspect.  ...  Power and Energy Consumption We now consider the potential energy savings of our dynamic optimizations. There are two possible sources of energy reduction, i.e., static power and dynamic power.  ... 
doi:10.1109/sc.2012.81 dblp:conf/sc/FrascaMR12 fatcat:6gowu7zfkrao5aolkyt5y2pbam

Montecito: A Dual-Core, Dual-Thread Itanium Processor

C. McNairy, R. Bhatia
2005 IEEE Micro  
Figure 6 shows how Foxton technology can provide a near cubic reduction in power when needed.  ...  The software can then indicate when the current thread does not need core resources. • Low-power mode.  ...  In future endeavors, McNairy plans to focus on performance; reliability, availability, and serviceability; and system interface issues in Itanium processor design.  ... 
doi:10.1109/mm.2005.34 fatcat:xnjsaln7ejhbnhpwct3nfmvkha

Compiler Directed Issue Queue Energy Reduction [chapter]

Timothy M. Jones, Michael F. P. O'Boyle, Jaume Abella, Antonio González
2011 Lecture Notes in Computer Science  
This paper presents a novel approach to energy reduction that uses compiler analysis communicated to the hardware, allowing the processor to dynamically resize the issue queue, fitting it to the available  ...  A simplistic scheme achieves 31% dynamic and 33% static energy savings in the issue queue with a 7.2% performance loss.  ...  This paper proposes an entirely different approach: software directed issue queue control.  ... 
doi:10.1007/978-3-642-24568-8_3 fatcat:de2ci5dsyfbmfe3ajio2nlbqxq

Architectural Support for Task Dependence Management with Flexible Software Scheduling

Emilio Castillo, Lluc Alvarez, Miquel Moreto, Marc Casas, Enrique Vallejo, Jose Luis Bosque, Ramon Beivide, Mateo Valero
2018 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)  
Compared to a runtime system fully implemented in hardware, TDM achieves an average speedup of 4.2% with 7.3x less area requirements and significant EDP reductions.  ...  In addition, five different software schedulers are evaluated with TDM, illustrating its flexibility and performance gains.  ...  DMU Area and Power Overhead The components of the DMU have a negligible effect on the power consumption, less than 0.01% of the total power.  ... 
doi:10.1109/hpca.2018.00033 dblp:conf/hpca/CastilloAMCVBBV18 fatcat:reoxvu4tqfhlvd6qcq6af66yd4

Exploiting dynamic transaction queue size in scalable memory systems

Mario Donato Marino, Tien-Hsiung Weng, Kuan-Ching Li
2017 Soft Computing - A Fusion of Foundations, Methodologies and Applications  
namely as shallower transaction queues, which provides an opportunity to power saving.  ...  reduction compared to systems with 1-2ntries.  ...  Related Work Initial evaluation of transaction queue reduction in terms of power and performance impact was presented in [5] .  ... 
doi:10.1007/s00500-016-2470-x fatcat:cthoj7tnd5gmlofmqod56xaehu