Evaluating the use of register queues in software pipelined loops

G.S. Tyson, M. Smelyanskiy, E.S. Davidson
2001 IEEE transactions on computers  
Using RQs, the compiler can allocate physical registers to store live values in the software pipelined loop while minimizing the pressure placed on architected registers.  ...  RQs combine the major aspects of existing rotating register file and register connection techniques to generate efficient software pipeline schedules.  ...  The authors also thank Bob Rau and Alexandre Eichenberger for providing the loop kernels used in this study.  ... 
doi:10.1109/12.946998 fatcat:rsl6b7hforg5zbnxpkrsk2k34u

Evaluating the use of register queues in software pipelined loops

G.S. Tyson, M. Smelyanskiy, E.S. Davidson
2001 IEEE transactions on computers  
Using RQs, the compiler can allocate physical registers to store live values in the software pipelined loop while minimizing the pressure placed on architected registers.  ...  RQs combine the major aspects of existing rotating register file and register connection techniques to generate efficient software pipeline schedules.  ...  The authors also thank Bob Rau and Alexandre Eichenberger for providing the loop kernels used in this study.  ... 
doi:10.1109/tc.2001.947006 fatcat:2db3qaphs5fmhfzo66flhxqyyi

Distributed modulo scheduling

M.M. Fernandes, J. Llosa, N. Topham
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler.  ...  However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a large number of functional units.  ...  We have shown in [5] that loop variant lifetimes produced by a modulo scheduled loop can be allocated to a queue register file, resulting in some advantages over a conventional RF.  ... 
doi:10.1109/hpca.1999.744349 dblp:conf/hpca/FernandesLT99 fatcat:n7ughiyq4vewbks52777hlhwaa

GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems

Fumihiko INO, Shinta NAKAGAWA, Kenichi HAGIHARA
2013 IEICE transactions on information and systems  
We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.  ...  In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs.  ...  Acknowledgments This study was partly supported by the JSPS KAKENHI Grant Number 23300007, 23700057, and the JST CREST program "An Evolutionary Approach to Construction of a Software Development Environment  ... 
doi:10.1587/transinf.e96.d.2604 fatcat:zm6ddxh6xnckjkpqq7hm2dvbna

Mobile Ecosystem Driven Dynamic Pipeline Adaptation for Low Power [chapter]

Garo Bournoutian, Alex Orailoglu
2015 Lecture Notes in Computer Science  
Instructions are dispatched to one of the appropriate issue queues, and all issue queues are then scanned in parallel to identify instructions ready for execution.  ...  The proposed architecture will monitor the run-time execution behavior in order to enable only those pipeline resources that are currently needed, allowing the system to rapidly respond to changing resource  ...  A software-assisted approach to dynamically resizing the issue queue was presented in [9]. Compile-time analysis provides information on the required number of issue queue entries.  ... 
doi:10.1007/978-3-319-16086-3_7 fatcat:52le4ek2tbbgbas2oaisjmr7c4

Can dataflow subsume von Neumann computing?

R. S. Nikhil
1989 Proceedings of the 16th annual international symposium on Computer architecture - ISCA '89  
We compare our approach to existing MIMD machines and to other dataflow machines.  ...  Starting with a simple, "RISC-like" instruction set, we show how to change the underlying processor organization to make it multithreaded.  ...  Funding for this work is provided in part by the Advanced Research Projects Agency of the Department of Defense under the Office of Naval Research contract N00014-84-K-0099.  ... 
doi:10.1145/74925.74955 dblp:conf/isca/Nikhil89 fatcat:3jk76zy7drdazacxaibfh3wwge

Can dataflow subsume von Neumann computing?

R. S. Nikhil
1989 SIGARCH Computer Architecture News  
We compare our approach to existing MIMD machines and to other dataflow machines.  ...  Starting with a simple, "RISC-like" instruction set, we show how to change the underlying processor organization to make it multithreaded.  ...  Funding for this work is provided in part by the Advanced Research Projects Agency of the Department of Defense under the Office of Naval Research contract N00014-84-K-0099.  ... 
doi:10.1145/74926.74955 fatcat:647mqk7sozfavikzz4ybdc2hui

Active messages

Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, Klaus Erik Schauser
1998 25 years of the international symposia on Computer architecture (selected papers) - ISCA '98  
In Sections 2 and 3 we show two programming models where the programmer and compiler, respectively, have control over communication pipelining.  ...  Active Messages, show that it is intrinsic to both architectures, allows cost effective use of the hardware, and offers tremendous flexibility.  ...  and a mini nCUBE/2 configuration to reboot at leisure.  ... 
doi:10.1145/285930.286002 dblp:conf/isca/EickenCGS98 fatcat:b7w4zsqhk5gurl3qyrilyyb5ue

Innovative Devops for Artificial Intelligence

R. Ciucu, F.C. Adochiei, Ioana-Raluca Adochiei, F. Argatu, G.C. Seriţan, B. Enache, S. Grigorescu, Violeta Vasilica Argatu
2019 The Scientific Bulletin of Electrical Engineering Faculty  
Our architecture uses key-based values to store specific graphs and datasets into memory for fast deployment and model training, furthermore leveraging the need for manual data reduction in the drafting  ...  In this article, we cover high performance computing concepts such as swarming, GPU resource management for model implementation in production environments with emphasis on standardized development to  ...  Building the service as a request handler ensures that the pipeline between each resource can be indirectly queried and the response time can be measured to ensure the lifetime of the container.  ... 
doi:10.1515/sbeef-2019-0011 fatcat:fhqujuhhhzbgpaolgpp2zznftm

Communication-aware allocation and scheduling framework for stream-oriented multi-processor systems-on-chip

M. Ruggiero, A. Guerri, D. Bertozzi, F. Poletti, M. Milano
2006 Proceedings of the Design Automation & Test in Europe Conference  
The optimizer implements an efficient and exact approach to allocation and scheduling based on problem decomposition.  ...  This paper proposes a complete allocation and scheduling framework, where an MPSoC virtual platform is used to accurately derive input parameters, validate abstract models of system components and assess  ...  Pipelining is a well known workload allocation policy in the signal processing domain. An overview of algorithms for scheduling pipelined task graphs is presented in [6] .  ... 
doi:10.1109/date.2006.243950 dblp:conf/date/RuggieroGBPM06 fatcat:q46dsgmdkvgazigzyk74zcohki

Hardware-modulated parallelism in chip multiprocessors

Julia Chen, Philo Juang, Kevin Ko, Gilberto Contreras, David Penry, Ram Rangan, Adam Stoler, Li-Shiuan Peh, Margaret Martonosi
2005 SIGARCH Computer Architecture News  
In particular, our simulations motivated the need for hardware support, showing that the large thread management overheads of current run-time software systems can lead to up to 6.5X slowdown.  ...  The software layer is encouraged to expose large amounts of multi-granular, heterogeneous parallelism.  ...  This work is supported in part by the MARCO Gigascale Systems Research Center, NSF grants CNS-0410937 and CNS-0305617.  ... 
doi:10.1145/1105734.1105742 fatcat:d5iferqj5fghrmndyw6b4ill6y

Optimus

Amir Hormati, Manjunath Kudlur, Scott Mahlke, David Bacon, Rodric Rabbah
2008 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems - CASES '08  
Optimus compiles programs written in a high level streaming language to either software or hardware implementations.  ...  Optimus thus allows software developers who lack deep hardware design expertise to transparently leverage the advantages of hardware customization without crossing the semantic gap between high level languages  ...  architectures can offer enormous flexibility and adaptability in the face of rapidly changing software standards and customer needs.  ... 
doi:10.1145/1450095.1450105 dblp:conf/cases/HormatiKMBR08 fatcat:okpqldj4cbc5xdea4ck5de5ck4

An architectural framework for providing reliability and security support

N. Nakka, Z. Kalbarczyk, R.K. Iyer, J. Xu
2004 International Conference on Dependable Systems and Networks, 2004  
The detection mechanisms described here in detail are: (1) the Memory Layout Randomization (MLR) module, which randomizes the memory layout of a process in order to foil attackers who assume a fixed system  ...  layout, (2) the Data Dependency Tracking (DDT) module, which tracks the dependencies among threads of a process and maintains checkpoints of shared memory pages in order to rollback the threads when an  ...  Acknowledgements This work was supported in part by Gigascale Systems Research Center (GSRC) and in part by NSF grant ACI-0121658 ITR/AP. We thank F. Baker and T.  ... 
doi:10.1109/dsn.2004.1311929 dblp:conf/dsn/NakkaKIX04 fatcat:prj26iaqcba53ohgjrjxeuw2o4

A software WiMAX medium access control layer using massively multithreaded processors

M. Chetlur, U. Devi, P. Dutta, P. Gupta, L. Chen, Z. Zhu, S. Kalyanaraman, Y. Lin
2010 IBM Journal of Research and Development  
The implementation consists of separate threads in the data and control planes, and thread coordination through concurrent data structures to enable multithreading in both the uplink and downlink data  ...  This paper presents a multithreaded software implementation of the Worldwide Interoperability for Microwave Access (WiMAX) medium access control (MAC) layer and its performance results on massively multithreaded  ...  In every frame, the scheduler determines the allocation to each service flow based on its QoS class and its backlog in the SDU queue.  ... 
doi:10.1147/jrd.2009.2037681 fatcat:4xt7ncmdyngr3ehzndhqhpewwq

A Fast and Accurate Technique for Mapping Parallel Applications on Stream-Oriented MPSoC Platforms with Communication Awareness

Martino Ruggiero, Alessio Guerri, Davide Bertozzi, Michela Milano, Luca Benini
2007 International journal of parallel programming  
This paper proposes a complete allocation and scheduling framework, and deploys an MPSoC virtual platform to validate the accuracy of modelling assumptions.  ...  The two solvers interact by means of an iterative procedure which has been proven to converge to the optimal solution.  ...  Fig. 1. Message-oriented distributed memory architecture. Fig. 2. (a) Bus allocation in a unary model; (b) Bus allocation in a coarse-grain additive model.  ... 
doi:10.1007/s10766-007-0032-7 fatcat:ihh4hsjbvvehtn3l5kqeq52p4a