
iCFP: Tolerating all-level cache misses in in-order processors

Andrew Hilton, Santosh Nagarakatte, Amir Roth
2009 2009 IEEE 15th International Symposium on High Performance Computer Architecture  
In this paper, we go a step further and introduce iCFP (in-order Continual Flow Pipeline), an adaptation of the CFP concept to an in-order processor.  ...  Cycle-level simulations show that iCFP out-performs Runahead, Multipass, and SLTP, another non-blocking in-order pipeline design.  ...  Tolerating all-level cache misses.  ... 
doi:10.1109/hpca.2009.4798281 dblp:conf/hpca/HiltonNR09 fatcat:wh3xh44vgvhl3o624e5rxbqqzq

iCFP: Tolerating All-Level Cache Misses in In-Order Processors

Andrew Hilton, Santosh Nagarakatte, Amir Roth
2010 IEEE Micro  
In this paper, we go a step further and introduce iCFP (in-order Continual Flow Pipeline), an adaptation of the CFP concept to an in-order processor.  ...  Cycle-level simulations show that iCFP out-performs Runahead, Multipass, and SLTP, another non-blocking in-order pipeline design.  ...  Tolerating all-level cache misses.  ... 
doi:10.1109/mm.2010.20 fatcat:nlv5v7gapnbwdnivb4qjorjwjy

Static typing for a faulty lambda calculus

David Walker, Lester Mackey, Jay Ligatti, George A. Reis, David I. August
2006 Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming - ICFP '06  
In order to detect and recover from these faults, λzap programs replicate intermediate computations and use majority voting, thereby modeling software-based fault tolerance techniques studied extensively  ...  This type system guarantees that well-typed programs can tolerate any single data fault.  ...  Hence, our software replication is intended to protect data in the processor pipeline, rather than data in cache and main memory.  ... 
doi:10.1145/1159803.1159809 dblp:conf/icfp/WalkerMLRA06 fatcat:7gpg2gm6lfcgbg7f5r7oykp7gu

Atomic heap transactions and fine-grain interrupts

Olin Shivers, James W. Clark, Roland McGrath
1999 Proceedings of the fourth ACM SIGPLAN international conference on Functional programming - ICFP '99  
We have implemented this technique in a version of SML/NJ, and, because of its applicability to thread-based systems, are currently implementing it in the scheduler of our raw-hardware SML-based kernel,  ...  Because the heap is in an inconsistent state during these operations, they must be performed atomically.  ...  It also requires going to memory, and almost certainly missing cache on a Harvard architecture, since the instructions are not likely to be in the D-cache.  ... 
doi:10.1145/317636.317783 dblp:conf/icfp/ShiversCM99 fatcat:ylqsx2jfubcufk4y3gdjlfvafa

OUTRIDER

Neal Clayton Crago, Sanjay Jeram Patel
2011 SIGARCH Computer Architecture News  
The key insight is that by decoupling the instruction streams, the processor pipeline can tolerate memory latency in a way similar to out-of-order designs while relying on a low-complexity in-order micro-architecture  ...  We present Outrider, an architecture for throughput-oriented processors that provides memory latency tolerance to improve performance on highly threaded workloads.  ...  In-order processors stall when a primary data cache miss occurs and a dependent operation is waiting to be issued.  ... 
doi:10.1145/2024723.2000079 fatcat:2ny5ydqgmffkvglkm2b2v6fxka

OUTRIDER

Neal Clayton Crago, Sanjay Jeram Patel
2011 Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11  
The key insight is that by decoupling the instruction streams, the processor pipeline can tolerate memory latency in a way similar to out-of-order designs while relying on a low-complexity in-order micro-architecture  ...  We present Outrider, an architecture for throughput-oriented processors that provides memory latency tolerance to improve performance on highly threaded workloads.  ...  In-order processors stall when a primary data cache miss occurs and a dependent operation is waiting to be issued.  ... 
doi:10.1145/2000064.2000079 dblp:conf/isca/CragoP11 fatcat:w56fto3w4vgoxamabvgcrkb2z4

Whip: higher-order contracts for modern services

Lucas Waye, Stephen Chong, Christos Dimoulas
2017 Proceedings of the ACM on Programming Languages  
Even though these applications do expose interfaces that are higher-order in spirit, the simplicity of the network protocols forces them to rely on brittle low-level encodings.  ...  Whip (i) provides programmers with a higher-order contract language tailored to the needs of modern services; and (ii) monitors services at run time to detect services that do not live up to their advertised  ...  When a cache miss occurs, the requested data is fetched to memory if found. In full generality, garbage collection of adapter state is as hard as determining distributed object lifetimes.  ... 
doi:10.1145/3110280 dblp:journals/pacmpl/WayeCD17 fatcat:4lg3iabfzjagxlc5g2qdsgxsky

CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution [article]

Milad Mohammadi, Tor M. Aamodt, William J. Dally
2016 arXiv   pre-print
We introduce the Coarse-Grain Out-of-Order (CG-OoO) general purpose processor designed to achieve close to In-Order processor energy while maintaining Out-of-Order (OoO) performance.  ...  Through the energy efficiency techniques applied to the compiler and processor pipeline stages, CG-OoO closes 64% of the average energy gap between the In-Order and Out-of-Order baseline processors at  ...  cache misses) [5] .  ... 
arXiv:1606.01607v1 fatcat:rzqeu325szbpzezg4oinpq7szu

BOLT: Energy-efficient Out-of-Order Latency-Tolerant execution

Andrew Hilton, Amir Roth
2010 HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture  
LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return.  ...  LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores.  ...  LT attacks one of the primary sources of low performance in conventional out-of-order cores, long-latency LLC (last-level cache) misses.  ... 
doi:10.1109/hpca.2010.5416634 dblp:conf/hpca/HiltonR10 fatcat:eegugksit5alzh2wx5tqg3dqcy

Decoupling loads for nano-instruction set computers

Ziqiang Huang, Andrew D. Hilton, Benjamin C. Lee
2016 SIGARCH Computer Architecture News  
We propose an ISA extension that decouples the data access and register write operations in a load instruction. We describe system and hardware support for decoupled loads.  ...  Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these sponsors.  ...  By aggressively scheduling instructions out of program order during execution, OoO processors exploit instruction level parallelism to a greater degree than in-order (IO) ones.  ... 
doi:10.1145/3007787.3001181 fatcat:do5aug5q6fgmbb2jiel5ppp5xu

Decoupling Loads for Nano-Instruction Set Computers

Ziqiang Huang, Andrew D. Hilton, Benjamin C. Lee
2016 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)  
We propose an ISA extension that decouples the data access and register write operations in a load instruction. We describe system and hardware support for decoupled loads.  ...  Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these sponsors.  ...  By aggressively scheduling instructions out of program order during execution, OoO processors exploit instruction level parallelism to a greater degree than in-order (IO) ones.  ... 
doi:10.1109/isca.2016.43 dblp:conf/isca/HuangHL16 fatcat:jkpksvosfbct5o6e2jrlqyqyva

Efficiently scaling out-of-order cores for simultaneous multithreading

Faissal M. Sleiman, Thomas F. Wenisch
2016 SIGARCH Computer Architecture News  
However, because thread interleaving spreads dependent instructions, nearly half of instructions dynamically issue in program order after all false dependences have resolved.  ...  We develop a technique to efficiently scale in-flight instructions through a hybrid out-of-order/in-order microarchitecture, which can dispatch instructions to efficient in-order scheduling mechanisms, using  ...  One such design is the in-order Continual Flow Pipeline (iCFP) [12] , which targets long-latency operations like cache misses that block in-order cores.  ... 
doi:10.1145/3007787.3001183 fatcat:fjk5dc6r4vhrtlmiong63okoay

Efficiently Scaling Out-of-Order Cores for Simultaneous Multithreading

Faissal M. Sleiman, Thomas F. Wenisch
2016 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)  
However, because thread interleaving spreads dependent instructions, nearly half of instructions dynamically issue in program order after all false dependences have resolved.  ...  We develop a technique to efficiently scale in-flight instructions through a hybrid out-of-order/in-order microarchitecture, which can dispatch instructions to efficient in-order scheduling mechanisms, using  ...  One such design is the in-order Continual Flow Pipeline (iCFP) [12] , which targets long-latency operations like cache misses that block in-order cores.  ... 
doi:10.1109/isca.2016.45 dblp:conf/isca/SleimanW16 fatcat:pmsedbqswvepjkjzq7lia332dq

ML grid programming with ConCert

Tom Murphy VII
2006 Proceedings of the 2006 workshop on ML - ML '06  
In order to deal with the particular contours of fault-tolerant distributed programming, we design our network substrate with failure recovery and a simple, local scheduling policy in mind.  ...  In order to convince users to donate their unused cycles, they must do so at negligible risk. Donors should be protected from both malicious Grid programmers and imperfect ones. Failure.  ...  If B is communicating with several processes, we need to replay all the messages sent by all of those processes, in the correct order. This technique is called message logging [18] .  ... 
doi:10.1145/1159876.1159879 dblp:conf/ml/VII06 fatcat:3rbckqfn4va65b5xzxax5gm764

Atomic heap transactions and fine-grain interrupts

Olin Shivers, James W. Clark, Roland McGrath
1999 SIGPLAN notices  
We have implemented this technique in a version of SML/NJ, and, because of its applicability to thread-based systems, are currently implementing it in the scheduler of our raw-hardware SML-based kernel,  ...  Because the heap is in an inconsistent state during these operations, they must be performed atomically.  ...  It also requires going to memory, and almost certainly missing cache on a Harvard architecture, since the instructions are not likely to be in the D-cache.  ... 
doi:10.1145/317765.317783 fatcat:bqhiope2wzdrnff4ysr2gbl7hi