Weld: A Multithreading Technique Towards Latency-tolerant VLIW Processors [chapter]

Emre Özer, Thomas M. Conte, Saurabh Sharma
2001 Lecture Notes in Computer Science  
This paper presents a new architecture model, named Weld, for VLIW processors.  ...  Weld integrates multithreading support into a VLIW processor to hide run-time latency effects that cannot be determined by the compiler.  ...  The machine model used for the experiments is a 6-wide VLIW processor with 2 universal and 4 ALU/BR units, 128 integer and 128 floating-point registers.  ... 
doi:10.1007/3-540-45307-5_17 fatcat:xym3aae3lrfh5kl7jgpxdpnvrm

Energy-aware opcode design

Balaji V. Iyer, Jason A. Poovey, Thomas M. Conte
2008 2008 IEEE International Conference on Computer Design  
It is also shown that this heuristic can be used to achieve similar results on different issue-width processors.  ...  Embedded processors are required to achieve high performance while running on batteries.  ...  Instruction scheduling is done on the RTL instructions. For this work, an aggressive scheduler using Treegion scheduling [6] is used to maximize the compiler's scheduling ability.  ... 
doi:10.1109/iccd.2008.4751918 dblp:conf/iccd/IyerPC08 fatcat:qhdqdvtn2rhxjk4s3c5c5rp3ke

CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Vasileios Porpodas, Marcelo Cintra
2013 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)  
Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design, benefit significantly from clustering.  ...  In this work we propose CAeSaR, a novel instruction scheduling algorithm that improves code generation for such architectures.  ...  VLIW processors are good candidates for clustering, as they are wide-issue by design.  ... 
doi:10.1109/cases.2013.6662513 dblp:conf/cases/PorpodasC13 fatcat:yjeplrjeczhcbdwzrjjt3jjx6m

Exploring the diversity of multimedia systems

Johnson Kin, Chunho Lee, W.H. Mangione-Smith, M. Potkonjak
2001 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
This combination enables us to build a unique framework for system-level synthesis and to gain valuable insights about design and use of application-specific programmable processors for modern applications  ...  constraint, and the number of branch units and issue width.  ...  Key ILP compiler technologies, such as trace scheduling [4], superblock scheduling [5], treegion-scheduling [6], hyperblock scheduling [8], and software pipelining [9] are in the process of migrating  ... 
doi:10.1109/92.929581 fatcat:iibo77aeazat3kuc7t32ghp5cm

Reaching fast code faster

Won So, Alexander G. Dean
2006 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems - CASES '06  
We use resource modeling, consider register pressure and compensate for compiler optimizations. This enables different scenarios to be compared and ranked.  ...  When integrating software threads together to boost performance on a processor with instruction-level parallel processing support, it is rarely clear which code regions should be aligned and integrated  ...  INTRODUCTION Many modern microprocessors and digital signal processors (DSPs) are capable of issuing multiple instructions per cycle.  ... 
doi:10.1145/1176760.1176764 dblp:conf/cases/SoD06 fatcat:j7xkib5u25ehvjdwx2jwme5uty

Merging Head and Tail Duplication for Convergent Hyperblock Formation

Bertrand A. Maher, Aaron Smith, Doug Burger, Kathryn S. McKinley
2006 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)  
This algorithm offers a solution to hyperblock phase ordering problems and can be configured to implement a wide range of policies.  ...  Simulation results for an EDGE architecture show that convergent hyperblock formation improves code quality over discrete-phase approaches with heuristics for VLIW and EDGE.  ...  The TRIPS microarchitecture is a 16-wide processor with 128 architectural registers.  ... 
doi:10.1109/micro.2006.34 dblp:conf/micro/MaherSBM06 fatcat:sqwzvkxnf5fango7zbiotad2ra

Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures

Anup Gangwar, M. Balakrishnan, Anshul Kumar
2007 ACM Transactions on Design Automation of Electronic Systems  
In this paper we first build a case for clustered VLIW processors with more than four clusters by showing that the available ILP in most of the media applications for a 16 ALU and 8 LD/ST VLIW processor  ...  These architectures are termed as clustered VLIW processors.  ...  The authors would also like to thank Rohit Khandekar, Department of Computer Science and Engineering, IIT Delhi for the many useful discussions and suggestions.  ... 
doi:10.1145/1217088.1217089 fatcat:uedpg3yjrjathbx7neuxqywtum

Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures

Anup Gangwar, M. Balakrishnan, Anshul Kumar
2007 ACM Transactions on Design Automation of Electronic Systems  
In this paper we first build a case for clustered VLIW processors with more than four clusters by showing that the available ILP in most of the media applications for a 16 ALU and 8 LD/ST VLIW processor  ...  These architectures are termed as clustered VLIW processors.  ...  The authors would also like to thank Rohit Khandekar, Department of Computer Science and Engineering, IIT Delhi for the many useful discussions and suggestions.  ... 
doi:10.1145/1188275.1188276 fatcat:43t3ubhg6jbozjupjqa4dnl3ne

An Overview of the Open Research Compiler [chapter]

Chengyong Wu, Ruiqi Lian, Junchao Zhang, Roy Ju, Sun Chan, Lixia Liu, Xiaobing Feng, Zhaoqing Zhang
2005 Lecture Notes in Computer Science  
Since its first release in 2002, it has been widely used in academia and industry worldwide as a compiler and architecture research infrastructure and as code base for further development.  ...  ORC), jointly developed by Intel Microprocessor Technology Labs and the Institute of Computing Technology at Chinese Academy of Sciences, has become the leading open source compiler on the Itanium™ Processor  ...  We also would like to thank the colleagues in the Programming Systems Lab at Intel Microprocessor Technology Labs for their valuable inputs to this project.  ... 
doi:10.1007/11532378_3 fatcat:3wzr2ibhvjb4vgt6fp5pk3e5aa

Code size efficiency in global scheduling for ILP processors

Huiyang Zhou, T.M. Conte
Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures  
In global scheduling for ILP processors, region-enlarging optimizations, especially tail duplication, are commonly used.  ...  size efficiency for any program.  ...  Treegion-based global scheduling aims for high performance for wide issue VLIW / EPIC processors although it can be applied to superscalar processors as well.  ... 
doi:10.1109/intera.2002.995845 dblp:conf/IEEEinteract/ZhouC02 fatcat:igrr7i6zhbd25dvwhh2tlu5qaa

Manycore simulation for peta-scale system design: Motivation, tools, challenges and prospects

Javad Zarrin, Rui L. Aguiar, João Paulo Barraca
2017 Simulation Modelling Practice and Theory  
However, current simulation tools are very slow, often specific-purpose-oriented, suffer from various issues and are rarely able to simulate thousands of cores.  ...  Abstract The architecture design of peta-scale computing systems is complex and presents lots of difficulties to designs, as current tools lack support for relevant features of future scenarios.  ...  These issues include segmentation of simulation workloads, dynamic scheduling, communications between simulator instances, time management and synchronization.  ... 
doi:10.1016/j.simpat.2016.12.014 fatcat:j2acoyv235awfjkz6w7krvzh44

Compiling for fine-grain concurrency: planning and performing software thread integration

A.G. Dean
23rd IEEE Real-Time Systems Symposium, 2002. RTSS 2002.  
We also wish to acknowledge the independent reviewers for their feedback on an earlier version of this paper, as well as Tom Conte for his insight into an early version of this research.  ...  References Acknowledgments We would like to thank Hewlett-Packard Laboratories Cambridge for the use of Dynamo.  ...  Treegion-based global scheduling aims for high performance for wide issue VLIW / EPIC processors although it can be applied to superscalar processors as well.  ... 
doi:10.1109/real.2002.1181566 dblp:conf/rtss/Dean02 fatcat:fqa2phhkincf7giit3ed7kzlbe

Compiler-based frame formation for static optimization

Feng Shi, S. Almukhaizim, Pey-Chang Lin, Y. Makris
IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings.  
Acknowledgements The authors would like to thank Daniel Friendly (Yale University) for his contributions to this work.  ...  Previous approaches include superblock formation [13], predicated execution using hyperblocks [17], VLIW treegion scheduling [12], block-structured ISA [11] and frame scheduling [6].  ...  Moreover, the scalability in performance of block-structured ISAs and treegions is highly questionable as they must generate treegions and enlarged atomic blocks that match the processor width.  ... 
doi:10.1109/iccd.2004.1347963 dblp:conf/iccd/ShiALM04 fatcat:6i7bhesrefcgflbkcmoi32zj6i

Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures

E. Ozer, S. Banerjia, T.M. Conte
Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture  
Instruction scheduling for a clustered machine requires assignment and scheduling of operations to the clusters.  ...  In this paper, a new scheduling algorithm named unified-assign-and-schedule (UAS) is proposed for clustered, statically-scheduled architectures.  ...  Introduction Many high performance processors are designed with wide issue widths to exploit high levels of instruction level parallelism (ILP).  ... 
doi:10.1109/micro.1998.742792 dblp:conf/micro/OzerBC98 fatcat:s7a2sm4bw5byrbkqvvml7zzuv4

An eight-issue tree-VLIW processor for dynamic binary translation

K. Ebcioglu, J. Fritts, S. Kosonocky, M. Gschwind, E. Altman, K. Kailas, T. Bright
Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273)  
Presented is an 8-issue tree-VLIW processor designed for efficient support of dynamic binary translation.  ...  Performance simulations show that the simplicity of a VLIW architecture allows a wide-issue processor to operate at high frequencies.  ...  Most operations can be executed in any issue slot, except for memory and extender operations, which may only be scheduled in four issue slots.  ... 
doi:10.1109/iccd.1998.727094 dblp:conf/iccd/EbciogluFKGAKB98 fatcat:pqscocshgjhslns26ceewwwztq