1,060 Hits in 6.6 sec

Instruction Scheduling with Release Times and Deadlines on ILP Processors

Hui Wu, J. Jaffar, Jingling Xue
2006 12th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'06)  
An optimising compiler for ILP processors needs to find a feasible schedule for a set of time-constrained instructions.  ...  In this paper, we present a fast algorithm for scheduling instructions with precedence-latency constraints, individual integer release times and deadlines on an ILP processor with multiple functional units  ...  In this paper, we propose a fast algorithm for scheduling instructions with individual release times and deadlines on an ILP processor with multiple functional units of different types.  ... 
doi:10.1109/rtcsa.2006.39 dblp:conf/rtcsa/WuJX06 fatcat:aze7ninpefbu5h5e2pgwxvqmxm

Instruction-level parallel processing: History, overview, and perspective

B. Ramakrishna Rau, Joseph A. Fisher
1993 Journal of Supercomputing  
instruction-level parallelism, VLIW processors, superscalar processors, pipelining, multiple operation issue, speculative execution, scheduling, register allocation Instruction-level Parallelism (ILP)  ...  By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences.  ...  algorithms for ILP machines with that much parallelism.  ... 
doi:10.1007/bf01205181 fatcat:v7uhz4km5ndxzhr7baybks2bn4

Instruction-Level Parallel Processing: History, Overview, and Perspective [chapter]

B. Ramakrishna Rau, Joseph A. Fisher
1993 Instruction-Level Parallelism  
Instruction-level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel.  ...  By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences.  ...  algorithms for ILP machines with that much parallelism.  ... 
doi:10.1007/978-1-4615-3200-2_3 fatcat:eg7nutqurffxfj2y62g5lfc57m

Design Space Exploration for a Custom VLIW Architecture

M. K. Jain, Veena Ramnani
2013 International Journal of Computer Applications  
Design Space Exploration is the process of exploring various processor architectures in order to search for a processor architecture that satisfies different conflicting criteria such as chip area,  ...  The objective of this research is to develop a retargetable compiler that can generate efficient code in terms of code size, cycle count and retargetability efforts for a VLIW processor.  ...  Very Long Instruction Word (VLIW) is one such approach to designing processors with high levels of ILP by executing long instructions composed of multiple operations.  ... 
doi:10.5120/9951-4598 fatcat:n672wx3e55bi3k2xt4jmsrl5yy

Optimizing Instruction-set Extensible Processors under Data Bandwidth Constraints

Kubilay Atasu, Robert G. Dimond, Oskar Mencer, Wayne Luk, Can Ozturan, Gunhan Dundar
2007 2007 Design, Automation & Test in Europe Conference & Exhibition  
We present a methodology for generating optimized architectures for data bandwidth constrained extensible processors.  ...  For an embedded processor with only two register read ports and one register write port, we obtain up to 4.3× speed-up with extensions incurring only a 35% area overhead.  ...  We provide an ILP model which replaces the input/output abstraction of the previous approaches [5, 6, 7, 10, 15] with the actual data bandwidth constraints and data transfer costs.  ... 
doi:10.1109/date.2007.364657 dblp:conf/date/AtasuDMLOD07 fatcat:imjtfr2chzbbllzerlkzjvrmku

The resource-constrained modulo scheduling problem: an experimental study

Maria Ayala, Abir Benabid, Christian Artigues, Claire Hanen
2012 Computational optimization and applications  
In this paper, we focus on the resource-constrained modulo scheduling problem, a general periodic scheduling problem, abstracted from the problem solved by compilers when optimizing inner loops at instruction  ...  level for VLIW parallel processors.  ...  Acknowledgement The authors wish to warmly thank Benoit Dupont de Dinechin for making the ST200 instances available and also for this helpful advices on modulo scheduling.  ... 
doi:10.1007/s10589-012-9499-2 fatcat:qdm65soz5zbbzbnd2pxmqhg5ly

Thread-Sensitive Modulo Scheduling for Multicore Processors

Lin Gao, Quan Hoang Nguyen, Lian Li, Jingling Xue, Tin-Fook Ngai
2008 2008 37th International Conference on Parallel Processing  
This paper describes a generalisation of modulo scheduling to parallelise loops for SpMT processors that exploits simultaneously both instruction-level parallelism and thread-level parallelism while preserving  ...  Our generalisation is simple, drops easily into traditional modulo scheduling algorithms such as Swing in GCC 4.1.1 and produces good speedups for SPECfp2000 benchmarks, particularly in terms of its ability  ...  TMS We describe a motivating example, a cost model for estimating the execution time of a modulo scheduled loop on SpMT processors, and finally our TMS algorithm.  ... 
doi:10.1109/icpp.2008.46 dblp:conf/icpp/GaoNLXN08 fatcat:migdv24xajb3zks2iphp7yp65u

CHIPS: Custom Hardware Instruction Processor Synthesis

Kubilay Atasu, Can Ozturan, Günhan Dundar, Oskar Mencer, Wayne Luk
2008 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
Our ILP-based approach scales well: benchmarks with basic blocks consisting of more than 1000 instructions can be optimally solved, most of the time within a few seconds.  ...  Index Terms: Application-specific instruction-set processors (ASIPs), custom instructions, customizable processors, extensible processors, integer linear programming (ILP), optimization algorithms.  ...  Clark for his help in using Trimaran.  ... 
doi:10.1109/tcad.2008.915536 fatcat:7asnj4qhmfhnlmm6h2qxle2tcm

Provably good task assignment on heterogeneous multiprocessor platforms for a restricted case but with a stronger adversary

Gurulingesh Raravi, Björn Andersson, Konstantinos Bletsas
2011 ACM SIGBED Review  
with no migrations if given processors twice as fast.  ...  The Cell processor is a single chip comprising one main processor (Power4) and eight so-called synergistic processors (optimized for executing SIMD instructions) [2].  ... 
doi:10.1145/2038617.2038621 fatcat:6s4py3b565eurfudcuox3egowa

FISH: Fast Instruction SyntHesis for Custom Processors

Kubilay Atasu, Wayne Luk, Oskar Mencer, Can Ozturan, Günhan Dundar
2012 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
FISH is based on novel methods for automatically adapting the instruction set to match an application in a high-level language such as C or C++.  ...  This paper presents Fast Instruction SyntHesis (FISH), a system that supports automatic generation of custom instruction processors from high-level application descriptions to enable fast design space  ...  Bolliger, A.-M. Cromack, and C. Hagleitner, IBM Research-Zurich, and the anonymous reviewers for their valuable comments.  ... 
doi:10.1109/tvlsi.2010.2090543 fatcat:75tk67zzkvgdjdcvk43stpzryi

An integer linear programming approach for identifying instruction-set extensions

Kubilay Atasu, Günhan Dündar, Can Özturan
2005 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '05  
An algorithm that iteratively generates and solves a set of ILP problems in order to generate a set of templates is proposed.  ...  A selection algorithm that ranks the generated templates based on isomorphism testing and potential evaluation is described.  ...  In [10], a simulated annealing based algorithm is employed to generate clusters based on schedule time and resource usage of dataflow graph nodes. The work of Choi et al.  ... 
doi:10.1145/1084834.1084880 dblp:conf/codes/AtasuDO05 fatcat:5oqno6jpcbcj3oo77qke2vp5oa

Page 895 of IEEE Transactions on Computers Vol. 52, Issue 7 [page]

2003 IEEE Transactions on Computers  
Pnueli, “A Fast Algorithm for Scheduling Time-Constrained Instructions on Processor with ILP,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques 1998 B.M. Smith and I.P.  ...  Chow, “A Near-Optimal Instruction Scheduler for a Tightly Constrained, Variable Instruction Set Embedded Processor,” Proc. Int'l Conf.  ... 

A Register File Architecture and Compilation Scheme for Clustered ILP Processors [chapter]

Krishnan Kailas, Manoj Franklin, Kemal Ebcioğlu
2002 Lecture Notes in Computer Science  
We present an efficient code generation algorithm to schedule sendb operations on-the-fly.  ...  We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster  ...  Generic Clustered ILP Processor Model. Fig. 3: A 2-cluster processor with CRB-based partitioned register file. Algorithm 1: Algorithm for scheduling sendb OPs, Schedule-Sendb(Op, RegId, ClusterID  ... 
doi:10.1007/3-540-45706-2_68 fatcat:cw37lzv3svfotfwme3cpyawupy

Efficient and Scalable Compiler-Directed Energy Optimization for Realtime Applications

Po-Kuan Huang, Soheil Ghiasi
2007 2007 Design, Automation & Test in Europe Conference & Exhibition  
We present a compilation technique that targets realtime applications running on embedded processors with combined dynamic voltage scaling (DVS) and adaptive body biasing (ABB) capabilities.  ...  More importantly, our algorithm runs very fast and comes reasonably close to the theoretical limit of energy optimization using DVS+ABB.  ...  Hsu and Kremer [2003] introduce an algorithm that identifies the program regions with time slack for the processor, and implement it as a source-to-source transformation.  ... 
doi:10.1109/date.2007.364386 dblp:conf/date/HuangG07 fatcat:beiosanpbves5e7lnt4vwqid3u

Efficient and scalable compiler-directed energy optimization for realtime applications

Po-Kuan Huang, Soheil Ghiasi
2007 ACM Transactions on Design Automation of Electronic Systems  
We present a compilation technique that targets realtime applications running on embedded processors with combined dynamic voltage scaling (DVS) and adaptive body biasing (ABB) capabilities.  ...  More importantly, our algorithm runs very fast and comes reasonably close to the theoretical limit of energy optimization using DVS+ABB.  ...  Hsu and Kremer [2003] introduce an algorithm that identifies the program regions with time slack for the processor, and implement it as a source-to-source transformation.  ... 
doi:10.1145/1255456.1255464 fatcat:mgc4yblqu5hunkzwafudfwr3ne