Dynamic binary translation and optimization

K. Ebcioglu, E. Altman, M. Gschwind, S. Sathaye
2001 IEEE transactions on computers  
This design approach offers the simplicity and high performance of statically scheduled architectures, achieves compatibility with an established architecture, and makes use of dynamic adaptation.  ...  Early VLIW architectures were targeted at highly regular code, typically scientific numeric code which spent the major execution time in a few loops which were highly parallelizable [3], [4], [5].  ...  Adaptive Scheduling Principles To obtain the best possible performance, group formation and scheduling are adaptive and a function of execution frequency and execution behavior.  ... 
doi:10.1109/12.931892 fatcat:3fxsl64mzrc6lgwpopagc2lmpu

Dynamic parallelization of recursive code

Charlotte Herzeel, Pascal Costanza
2010 Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '10  
In this paper, we show that recursive programs can be effectively parallelized when arguments to procedures are evaluated concurrently and branches of conditional statements are speculatively executed  ...  While most approaches to automatic parallelization focus on compilation approaches for parallelizing loop iterations, we advocate the need for new virtual machines that can parallelize the execution of  ...  Gabriel for his support and advice to improve this paper.  ... 
doi:10.1145/1869459.1869491 dblp:conf/oopsla/HerzeelC10 fatcat:idxipwjpcfdddfh6dvn3bilcxe

Speculative execution on multi-GPU systems

Gregory Diamos, Sudhakar Yalamanchili
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
In this paper, we extend Harmony to target systems with multiple accelerators using control speculation to expose parallelism. We refer to this technique as Kernel Level Speculation (KLS).  ...  In previous work, we presented the Harmony execution model for computing on heterogeneous systems with several CPUs and accelerators.  ...  ACKNOWLEDGEMENT The authors gratefully acknowledge the generous support of this work by LogicBlox Inc. and NVIDIA Corp. both through research grants, fellowships, and technical interactions, and equipment  ... 
doi:10.1109/ipdps.2010.5470427 dblp:conf/ipps/DiamosY10 fatcat:ytpb4o2jkvcedc3cwtkt2huqdi

Visual Analytics for Topic Model Optimization based on User-Steerable Speculative Execution

Mennatallah El-Assady, Fabian Sperrle, Oliver Deussen, Daniel Keim, Christopher Collins
2018 IEEE Transactions on Visualization and Computer Graphics  
This paper presents an explainable, mixed-initiative topic modeling framework that integrates speculative execution into the algorithmic decision-making process.  ...  Abstract: To effectively assess the potential consequences of human interventions in model-driven analytics systems, we establish the concept of speculative execution as a visual analytics paradigm for  ...  Interaction Design All visualization components are highly interactive to support the users' exploration and analysis.  ... 
doi:10.1109/tvcg.2018.2864769 pmid:30235133 fatcat:ew2krtab6jgrxbkz76oqmqmxtq

A survey of new research directions in microprocessors

J. Šilc, T. Ungerer, B. Robic
2000 Microprocessors and microsystems  
Multiscalar and trace processors define several processing cores that speculatively execute different parts of a sequential program in parallel.  ...  Since a dynamic trace of instructions may contain multiple taken branches, there is no need to fetch from multiple targets, as would be necessary when predicting multiple branches and fetching 16 or 32  ...  Speculation on the instruction flow uses a two-level adaptive branch predictor with local and global branch history, combined with a trace cache to execute more than one taken branch per cycle, which is  ... 
doi:10.1016/s0141-9331(00)00072-7 fatcat:55y6n4wzijaeppl3l5qp6x2koa

RPU

Sven Woop, Jörg Schmittler, Philipp Slusallek
2005 ACM SIGGRAPH 2005 Papers on - SIGGRAPH '05  
Although running at only 66 MHz, the prototype FPGA implementation already renders images at up to 20 frames per second, which in many cases beats the performance of highly optimized software running on  ...  The entire scene representation including k-D trees, shader code, and any shader parameters is downloaded from the host via DMA.  ...  Conditional returns are supported to further reduce the number of branches.  ... 
doi:10.1145/1186822.1073211 fatcat:mymf4espwvgqblqxw5ukg6vtye

RPU

Sven Woop, Jörg Schmittler, Philipp Slusallek
2005 ACM Transactions on Graphics  
Although running at only 66 MHz, the prototype FPGA implementation already renders images at up to 20 frames per second, which in many cases beats the performance of highly optimized software running on  ...  The entire scene representation including k-D trees, shader code, and any shader parameters is downloaded from the host via DMA.  ...  Conditional returns are supported to further reduce the number of branches.  ... 
doi:10.1145/1073204.1073211 fatcat:az6x6usc7nfuxcfn25cxp2jroi

Post-pass binary adaptation for software-based speculative precomputation

Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen
2002 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02  
SSP does not require expensive hardware support; instead it relies on the compiler to adapt binaries to perform prefetching on otherwise idle hardware thread contexts at run time.  ...  The execution of the new binary spawns the speculative prefetch threads, which are executed concurrently with the main thread.  ...  Hand Adaptation Wang et al. performed hand adaptation on three memory-intensive benchmarks for speculative precomputation [31].  ... 
doi:10.1145/512529.512544 dblp:conf/pldi/LiaoWWSHL02 fatcat:hidw6rtajfd5vffdg2havbqi4e

Post-pass binary adaptation for software-based speculative precomputation

Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen
2002 SIGPLAN notices  
SSP does not require expensive hardware support; instead it relies on the compiler to adapt binaries to perform prefetching on otherwise idle hardware thread contexts at run time.  ...  The execution of the new binary spawns the speculative prefetch threads, which are executed concurrently with the main thread.  ...  Hand Adaptation Wang et al. performed hand adaptation on three memory-intensive benchmarks for speculative precomputation [31].  ... 
doi:10.1145/543552.512544 fatcat:5mi2fvfb3bcinfnxcbu3gglhue

Post-pass binary adaptation for software-based speculative precomputation

Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen
2002 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02  
SSP does not require expensive hardware support; instead it relies on the compiler to adapt binaries to perform prefetching on otherwise idle hardware thread contexts at run time.  ...  The execution of the new binary spawns the speculative prefetch threads, which are executed concurrently with the main thread.  ...  Hand Adaptation Wang et al. performed hand adaptation on three memory-intensive benchmarks for speculative precomputation [31].  ... 
doi:10.1145/512541.512544 fatcat:nshqi5hd4rh2jbrrzpodutdw5y

Difficult-path branch prediction using subordinate microthreads

Robert S. Chappell, Francis Tseng, Adi Yoaz, Yale N. Patt
2002 SIGARCH Computer Architecture News  
We propose to dynamically construct microthreads that can speculatively and accurately pre-compute branch outcomes along frequently mispredicted paths.  ...  Simultaneous Subordinate Microthreading (SSMT) provides a means to improve branch prediction accuracy: SSMT machines run multiple, concurrent microthreads in support of the primary thread.  ...  The processor executed these speculative slices as helper threads to generate branch predictions and prefetches.  ... 
doi:10.1145/545214.545250 fatcat:z7s5yiikszct7kxyxe2v4at2em

Virtual simple architecture (VISA)

Aravindh Anantaraman, Kiran Seth, Kaustubh Patil, Eric Rotenberg, Frank Mueller
2003 Proceedings of the 30th annual international symposium on Computer architecture - ISCA '03  
Contemporary worst-case timing analysis tools can safely and tightly bound execution time on in-order single-issue pipelines with caches and static branch prediction.  ...  Worst-case execution times (WCET) of tasks are needed for safe planning.  ...  Acknowledgments This research was supported in part by NSF grants CCR-0207785 and CCR-0208581, NSF CAREER grant CCR-0092832, and generous funding and equipment donations from Intel.  ... 
doi:10.1145/859618.859659 fatcat:hbgvx4pnzvh3fcxl62k67uofqm

Virtual simple architecture (VISA)

Aravindh Anantaraman, Kiran Seth, Kaustubh Patil, Eric Rotenberg, Frank Mueller
2003 SIGARCH Computer Architecture News  
Contemporary worst-case timing analysis tools can safely and tightly bound execution time on in-order single-issue pipelines with caches and static branch prediction.  ...  Worst-case execution times (WCET) of tasks are needed for safe planning.  ...  Acknowledgments This research was supported in part by NSF grants CCR-0207785 and CCR-0208581, NSF CAREER grant CCR-0092832, and generous funding and equipment donations from Intel.  ... 
doi:10.1145/871656.859659 fatcat:3mv3d4ddnnguzn33wkonbku2na

Virtual simple architecture (VISA)

Aravindh Anantaraman, Kiran Seth, Kaustubh Patil, Eric Rotenberg, Frank Mueller
2003 Proceedings of the 30th annual international symposium on Computer architecture - ISCA '03  
Contemporary worst-case timing analysis tools can safely and tightly bound execution time on in-order single-issue pipelines with caches and static branch prediction.  ...  Worst-case execution times (WCET) of tasks are needed for safe planning.  ...  Acknowledgments This research was supported in part by NSF grants CCR-0207785 and CCR-0208581, NSF CAREER grant CCR-0092832, and generous funding and equipment donations from Intel.  ... 
doi:10.1145/859658.859659 fatcat:uj4noou2hfhfzhkmkvz2lkk224

Advances and future challenges in binary translation and optimization

E.R. Altman, K. Ebcioglu, M. Gschwind, S. Sathaye
2001 Proceedings of the IEEE  
If a variety of customers are to be supported, a variety of architectures (x86, PowerPC, Sparc, etc.) must be present in the farm.  ...  The tree form of DAISY groups comes at the cost of some code explosion via tail duplication.  ...  For example, if PowerPC code has a diamond as in Fig. 16(a), DAISY generates a tree group as output via tail duplication of block as shown in Fig. 16(b).  ... 
doi:10.1109/5.964447 fatcat:y7ghv7jhufbynpy3yhhfjghdti