Determining the idle time of a tiling: new results

F. Desprez, J. Dongarra, F. Rastello, Y. Robert
Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques  
We build upon recent results by Högstedt, Carter, and Ferrante [12], who aim at determining the cumulated idle time spent by all processors while executing the partitioned (tiled) computation domain.  ...  More precisely, we provide an accurate solution for all values of the rise parameter that relates the shape of the iteration space to that of the tiles, and for all possible distributions of the tiles  ...  One important motivation for determining the idle time of a tiling in [12] was, in fact, to demonstrate that such an idle time can have a significant impact on real performance for a large application  ... 
doi:10.1109/pact.1997.644026 dblp:conf/IEEEpact/DesprezDRR97 fatcat:lrq3detrgvaxxetdmshhynska4

An optimal scheduling scheme for tiling in distributed systems

Konstantinos Kyriakopoulos, Anthony T. Chronopoulos, Lionel Ni
2007 2007 IEEE International Conference on Cluster Computing  
The problem lies in the processor idle time which occurs during the beginning and final stages of the execution.  ...  We have tested the proposed scheme on a dedicated and homogeneous cluster of workstations and we verified that it significantly improves execution times over scheduling using traditional tiling.  ...  ACKNOWLEDGEMENTS This work was supported in part by the National Science Foundation under grant CCR-0312323.  ... 
doi:10.1109/clustr.2007.4629240 dblp:conf/cluster/KyriakopoulosCN07 fatcat:c5oqsqciqzeh7k2knutxp4r4l4

An efficient tile-based ECO router with routing graph reduction and enhanced global routing flow

Jin-Yih Li, Yih-Lang Li
2005 Proceedings of the 2005 international symposium on physical design - ISPD '05  
The ECO router with the new design flow can perform up to 20 times faster than the original tile-based router, at the cost of only a very small decline in routing quality.  ...  Tile-based routers work with fewer nodes of the routing graph than grid and connection-based routers; however, the number of nodes of the tile-based routing graph has grown to over a thousand million  ...  The connectivity states of the internal edges of GC(i,j) are to be determined and the routing direction through GC(i,j) is from left to right; only the idle-path heaps of GC(i,j+1) and GC(i,j-1) need  ... 
doi:10.1145/1055137.1055142 dblp:conf/ispd/LiL05 fatcat:3txa464aijhhzc3lgmr6smvpiq

Tiling on systems with communication/computation overlap

Pierre-Yves Calland, Jack Dongarra, Yves Robert
1999 Concurrency Practice and Experience  
In the framework of fully permutable loops, tiling is a compiler technique (also known as 'loop blocking') that has been extensively studied as a source-to-source program transformation.  ...  Little work has been devoted to the mapping and scheduling of the tiles on to physical parallel processors.  ...  ACKNOWLEDGEMENTS We are deeply indebted to Sanjay Rajopadhye for his useful comments on a first version of this paper. We also thank the referees for suggesting several improvements.  ... 
doi:10.1002/(sici)1096-9128(199903)11:3<139::aid-cpe370>3.0.co;2-x fatcat:epwtosvv5bh4jdkys2vdwnk3di

A configurable architecture to limit wakeup current in dynamically-controlled power-gated FPGAs

Assem A.M. Bsoul, Steven J.E. Wilton
2012 Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays - FPGA '12  
85% in a region of size 4x4 tiles.  ...  Our results show that the area overhead of the proposed inrush current limiting architecture is less than 2% for a power gating region of size 3x3 or 4x4 tiles, and the leakage power saved is more than  ...  The value of I max tile is fixed, and is determined at fabrication time. As a result, the delay elements and the sleep transistors (STs) do not need to be configurable.  ... 
doi:10.1145/2145694.2145737 dblp:conf/fpga/BsoulW12 fatcat:gr56blad7jaktbcgfzndxndf4m

An evaluation of code and data optimizations in the context of disk power reduction

Mahmut Kandemir, Seung Woo Son, Guangyu Chen
2005 Proceedings of the 2005 international symposium on Low power electronics and design - ISLPED '05  
The reason is that these optimizations do not take into account how disk-resident array data are laid out on the disk system, and consequently, fail to increase idle periods of disks, which is the primary  ...  The experiments also show that the benefits coming from our approach increase with the increased number of disks; i.e., it scales very well.  ...  For instance, after the transformation, at a given time, a column-block of array X2 is active.  ... 
doi:10.1145/1077603.1077655 dblp:conf/islped/KandemirSC05 fatcat:uj4keao76jdzpkvqe2jwixu5le

An evaluation of code and data optimizations in the context of disk power reduction

M. Kandemir, Seung Woo Son, Guangyu Chen
2005 ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.  
The reason is that these optimizations do not take into account how disk-resident array data are laid out on the disk system, and consequently, fail to increase idle periods of disks, which is the primary  ...  The experiments also show that the benefits coming from our approach increase with the increased number of disks; i.e., it scales very well.  ...  For instance, after the transformation, at a given time, a column-block of array X2 is active.  ... 
doi:10.1109/lpe.2005.195516 fatcat:ieh2h73okneydircafngfrqxsa

Tiling and Scheduling of Three-level Perfectly Nested Loops with Dependencies on Heterogeneous Systems

Ebrahim Zarei Zefreh, Shahriar Lotfi, Leyli Mohammad Khanli, Jaber Karimpour
2016 Scalable Computing : Practice and Experience  
The 3D tiling reduces the parallel execution time by a factor of 1.2× to 2× over the 2D tiling, while parallelizing the 3D heat equation as a benchmark.  ...  Also, we propose a tiling genetic algorithm that uses the proposed model to find the near-optimal tile size, minimizing the parallel execution time of nested loops with dependencies.  ...  The authors would like to thank the editor and the reviewers for their helpful and constructive suggestions, which considerably improved the quality of the paper.  ... 
doi:10.12694/scpe.v17i4.1205 fatcat:tvbrsukttvct5fky2g7tsz3j5e

Time-minimal tiling when rise is larger than zero

Jingling Xue, Wentong Cai
2002 Parallel Computing  
This paper presents a solution to the open problem of finding the optimal tile size to minimise the execution time of a parallelogram-shaped iteration space on a distributed memory machine when the rise  ...  of the tiled iteration space is larger than zero.  ...  Acknowledgements The authors are grateful to the referees for their constructive and helpful comments, which have greatly improved the presentation of the paper.  ... 
doi:10.1016/s0167-8191(02)00098-4 fatcat:my6lc3m4uvblbovwsiheyv37xq

Oversubscribed Command Queues in GPUs

Sooraj Puthoor, Xulong Tang, Joseph Gross, Bradford M. Beckmann
2018 Proceedings of the 11th Workshop on General Purpose GPUs - GPGPU-11  
Although increasing the number of command queues is good for kernel concurrency, the GPU hardware can only monitor a fixed number of queues at any given time.  ...  Therefore, if the number of command queues exceeds the hardware's monitoring capability, the queues become oversubscribed and the hardware has to service some of these queues sequentially.  ...  ACKNOWLEDGMENT AMD, the AMD Arrow logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc.  ... 
doi:10.1145/3180270.3180271 dblp:conf/ppopp/PuthoorTGB18 fatcat:vjv4pqtwt5ahdda2cqbktxee7a

Improving the efficiency of run time reconfigurable devices by configuration locking

Yang Qu, Juha-Pekka Soininen, Jari Nurmi
2008 Proceedings of the conference on Design, automation and test in Europe - DATE '08  
The idea is to lock, at run time, a number of the most frequently used tasks in the configuration memory so that they cannot be evicted by other tasks.  ...  Run-time reconfigurable logic is a very attractive alternative in the design of SoCs. However, configuration overhead can greatly decrease system performance.  ...  When a task is ready to run, the run-time scheduler checks if any locked tile or idle tile currently holds the configuration data of this task.  ... 
doi:10.1145/1403375.1403439 fatcat:rggoda2ltjdz5lcr3w36ligczi

suCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework

Weijian Zheng, Fengguang Song, Lan Lin, Zizhong Chen
2016 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)  
determine an optimal number of factorization domains.  ...  The software design includes a mixed usage of physical and logical data layouts, a simplified method of dynamic-root binary-tree reduction, a dynamic dataflow implementation, and an analytical model to  ...  This is because they are designed to send and receive small messages of tiles to minimize a process's CPU idle time by increasing the degree of parallelism.  ... 
doi:10.1109/icpads.2016.0144 dblp:conf/icpads/ZhengSLC16 fatcat:vsju6w3vgzf75maz5uzuinh57m

Using Dynamic Voltage Scaling to Reduce the Configuration Energy of Run Time Reconfigurable Devices

Yang Qu, Juha-Pekka Soininen, Jari Nurmi
2007 2007 Design, Automation & Test in Europe Conference & Exhibition  
The basic idea is to use configuration prefetching and parallelism to create excessive system idle time and apply DVS on the configuration process when such idle time can be utilized.  ...  A genetic algorithm is developed to solve the task scheduling and voltage assignment problem. With real applications, the results show that up to 19.3% of configuration energy can be reduced.  ...  We would also like to thank Juanjo Noguera for providing the implementation results of the image sharpening applications.  ... 
doi:10.1109/date.2007.364582 dblp:conf/date/QuSN07 fatcat:6geyux5acnd4nmhdvwe73v6sw4

Performance Analysis and Optimization of a Hybrid Seismic Imaging Application

Sri Raj Paul, Mauricio Araya-Polo, John Mellor-Crummey, Detlef Hohl
2016 Procedia Computer Science  
Here we describe our experiences of using performance analysis tools to gain insight into an MPI+OpenMP code developed by Shell that performs Reverse Time Migration on a cluster to produce models of the  ...  These tools provided us with insights into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores.  ...  In our investigation of the imbalance, we determined that the decomposition of work into tiles is as illustrated in Figure 3 :A.  ... 
doi:10.1016/j.procs.2016.05.293 fatcat:acdibgur7fgidg7gox5ylrxwm4

Integration of process planning and production management [chapter]

Gideon Halevi, H. J. J. Kals
1997 Computer Applications in Production and Engineering  
The method performs Material Requirements Planning (finite capacity) and Capacity Planning in one run, keeping the product network (Bill of Material) at all times.  ...  However, the selection of the routing is deferred to the moment of need.  ...  When the idle time is found, the name of the item is inserted in the row.  ... 
doi:10.1007/978-0-387-35291-6_41 fatcat:ys5db2norfap3i65egtpwpiiri