DataScalar architectures

Doug Burger, Stefanos Kaxiras, James R. Goodman
1997 SIGARCH Computer Architecture News  
DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, which are each tightly coupled with an associated memory. The program data set (and/or text) is distributed across these memories. In this execution model, each processor broadcasts operands it loads from its local memory to all other units. In this paper, we describe the benefits, costs, and problems associated with the DataScalar model. We also present simulation results of one possible implementation of a DataScalar system. In our simulated implementation, six unmodified SPEC95 binaries ran from 7% slower to 50% faster on two nodes, and from 9% to 100% faster on four nodes, than on a system with a comparable, more traditional memory system. Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail. We conclude with a discussion of how DataScalar systems may accommodate traditional parallel processing, thus improving performance over a much wider range of applications than is currently possible with either model. 1. Except when the data are cached, in which case the cache line is updated, and no write-through or write-back is required.
doi:10.1145/384286.264215 fatcat:5fuljppb7ncajjaclnc2an6rue

Efficient synchronization

Alain Kägi, Doug Burger, James R. Goodman
1997 SIGARCH Computer Architecture News  
QOLB. Goodman, Vernon, and Woest proposed the Queue-On-Lock-Bit primitive (QOLB, originally called QOSB) [13], which was the first proposal for a distributed, queue-based locking scheme.  ...  Our seven main locking schemes (and their corresponding abbreviations) are thus as follows: TEST&SET (TS), TEST&TEST&SET (TTS), MCS locks, LH locks, M locks, reactive synchronization (R), and QOLB.  ... 
doi:10.1145/384286.264166 fatcat:46wccg2dvrg55c3sem5iuewugu

Transactional lock-free execution of lock-based programs

Ravi Rajwar, James R. Goodman
2002 SIGPLAN notices  
Therefore P1's rd_X for B has TS1 appended and P2's rd_X for A has TS2 appended.  ...  The arc labelling "1: rd_X: A" means a read for exclusive ownership (rd_X) request for block A was issued at time t1.  ... 
doi:10.1145/605432.605399 fatcat:k7j2qiu2m5butopikgfmozuq5i

DataScalar architectures

Doug Burger, Stefanos Kaxiras, James R. Goodman
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, which are each tightly coupled with an associated memory. The program data set (and/or text) is distributed across these memories. In this execution model, each processor broadcasts operands it loads from its local memory to all other units. In this paper, we describe the benefits, costs, and problems associated with the DataScalar model. We also present simulation results of one possible implementation of a DataScalar system. In our simulated implementation, six unmodified SPEC95 binaries ran from 7% slower to 50% faster on two nodes, and from 9% to 100% faster on four nodes, than on a system with a comparable, more traditional memory system. Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail. We conclude with a discussion of how DataScalar systems may accommodate traditional parallel processing, thus improving performance over a much wider range of applications than is currently possible with either model. 1. Except when the data are cached, in which case the cache line is updated, and no write-through or write-back is required.
doi:10.1145/264107.264215 dblp:conf/isca/BurgerKG97 fatcat:uqpa6bqnoneopjamcnstnmp3fi

Efficient synchronization

Alain Kägi, Doug Burger, James R. Goodman
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
QOLB. Goodman, Vernon, and Woest proposed the Queue-On-Lock-Bit primitive (QOLB, originally called QOSB) [13], which was the first proposal for a distributed, queue-based locking scheme.  ...  Our seven main locking schemes (and their corresponding abbreviations) are thus as follows: TEST&SET (TS), TEST&TEST&SET (TTS), MCS locks, LH locks, M locks, reactive synchronization (R), and QOLB.  ... 
doi:10.1145/264107.264166 dblp:conf/isca/KagiBG97 fatcat:pd5v3g4mwfgqbkposlmdkjreaa

Transactional lock-free execution of lock-based programs

Ravi Rajwar, James R. Goodman
2002 SIGARCH Computer Architecture News  
Therefore P1's rd_X for B has TS1 appended and P2's rd_X for A has TS2 appended.  ...  The arc labelling "1: rd_X: A" means a read for exclusive ownership (rd_X) request for block A was issued at time t1.  ... 
doi:10.1145/635506.605399 fatcat:xuxd4t7vxzfvbf7pgt3phmghbi

Memory bandwidth limitations of future microprocessors

Doug Burger, James R. Goodman, Alain Kägi
1996 SIGARCH Computer Architecture News  
$R_i = D_i / D_{i-1}$, $E_{\mathrm{pin}} = B_{\mathrm{pin}} / \prod_{i=1}^{k} R_i$. Traffic inefficiency: To evaluate what percentage of the possible  ...  Goodman recognized the importance of a simple memory hierarchy for reducing memory bandwidth, particularly in a multiprocessor environment [18].  ... 
doi:10.1145/232974.232983 fatcat:nrjsgzqj4ncklptsbodde67mty

Using cache memory to reduce processor-memory traffic

James R. Goodman
1983 SIGARCH Computer Architecture News  
The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.
doi:10.1145/1067651.801647 fatcat:nbgoz7jphncj7ouiemgufa74iu

Performance of the SCI ring

Steven L. Scott, James R. Goodman, Mary K. Vernon
1992 SIGARCH Computer Architecture News  
Acknowledgements: The authors gratefully acknowledge the assistance of Dave James, Ross Johnson and Alain Kagi in clarifying several important concepts in the SCI standard. Thanks also to Rich Maclin and  ...  node i (8) per injected packet at (9) Utilization of node i's output link by passing packets  ...  (10) Mean length of a passing packet at node  ...  (11) Residual life of passing packet at node i  ...  (12) Calculations inside iteration  ... 
doi:10.1145/146628.140404 fatcat:shvizfyutbcfnf7mtyuitpjnui

Inferential Queueing and Speculative Push

Ravi Rajwar, Alain Kägi, James R Goodman
2004 International journal of parallel programming  
Kaxiras and Goodman (37) proposed speculative pre-send as an approach for data forwarding. Ranganathan et al.  ...  (29) Goodman, et al. (6) made collocation more attractive by establishing the ability to defer access to the lock by an acquiring processor until the lock had been released.  ... 
doi:10.1023/b:ijpp.0000029274.45582.a8 fatcat:vktr4w3qc5fixmop2qws5hrtpu

Transactional lock-free execution of lock-based programs

Ravi Rajwar, James R. Goodman
2002 ACM SIGOPS Operating Systems Review  
Therefore P1's rd_X for B has TS1 appended and P2's rd_X for A has TS2 appended.  ...  The arc labelling "1: rd_X: A" means a read for exclusive ownership (rd_X) request for block A was issued at time t1.  ... 
doi:10.1145/635508.605399 fatcat:tqajaw76mndoviogkdlctdeqpi

DataScalar: A memory-centric approach to computing

Stefanos Kaxiras, Doug Burger, James R. Goodman
1999 Journal of systems architecture  
Commodity microprocessors contain more on-chip memory with each successive generation, and will contain tens of megabytes within the decade. We describe a novel architecture that runs an unmodified uniprocessor program across multiple nodes, each of which contains a processor tightly integrated with a sizable memory. The execution of instructions is replicated, while the access of operands is distributed across the nodes. Each node accesses operands in its fast local memory and broadcasts them to the other nodes. This architecture exploits out-of-order execution and the fact that each chip has integrated processor and memory, to run memory-intensive, hard-to-parallelize programs more efficiently. In this paper, we describe an implementation with specific solutions to the unique problems that this architecture poses. Finally, we conclude by comparing simulation results of our implementation to more traditional equivalent systems. In our simulated implementation, five unmodified SPEC95 binaries ran, in most cases, considerably faster than in systems with more traditional memory systems.
Journal of Systems Architecture 45 (1999) 1001-1022, Elsevier Science B.V.
• The program's data set is too large for the on-chip main memory, with the result that the processor experiences thrashing.
Programs in the second category may benefit from conventional multiprocessing, using multiple nodes to achieve either shared-memory or message-based parallel computing. If the program can be easily parallelized, the collection of IRAM nodes can be used as a conventional DSM multiprocessor [20]. That leaves the problem of programs that are too large to fit within a single node, but are not easily parallelized. For such applications, we describe a novel approach, first proposed in [3], that retains the uniprocessor programming model and that we believe can achieve higher performance than a conventional uniprocessor. A program's data set is spread across these nodes. All processors run the same program, broadcasting operands they own to the other processors when needed, and performing any tasks that can be accomplished entirely on-chip without off-chip communication.
The rest of this paper unfolds as follows: In Section 2, we describe the DataScalar proposal, enumerating three major advantages it has over traditional architectures, and describing how each advantage improves performance. We also discuss implementation issues associated with these types of systems. In Section 3, we present a performance evaluation of DataScalar. In Section 4, we discuss result communication, an extension to the basic architecture to enhance its performance. Finally, in Section 5, we list other research efforts related to processor/memory integration, present future directions, and conclude.
doi:10.1016/s1383-7621(98)00048-4 fatcat:zjhccit4w5g4pdkibko5tsiqbi

Transactional conflict decoupling and value prediction

Fuad Tabba, Andrew W. Hay, James R. Goodman
2011 Proceedings of the international conference on Supercomputing - ICS '11  
Table 1: Simulated machine configuration
Benchmark   Tx Length   R/W Set   Contention
genome      medium      medium    low
intruder    short       medium    high
kmeans      short       small     low
labyrinth   long        large     ...
doi:10.1145/1995896.1995904 dblp:conf/ics/TabbaHG11 fatcat:ni4zxttubbaszp4m5kxevmguhy

Basal forebrain volume selectively and reliably predicts the cortical spread of Alzheimer's degeneration [article]

Sara Fernandez-Cabello, Martin Kronbichler, Koene R. A. Van Dijk, James A. Goodman, R. Nathan Spreng, Taylor Schmitz
2019 bioRxiv   pre-print
This evidence generalized across the independent samples (N1: r=0.20, p=0.03; N2: r=0.37, p<0.001).  ...  r CI: -0.1424 -0.13).  ...  and ADNI-GO/2 (Fig. 3b ; r=0.37, t308=5.15, p<0.001).  ... 
doi:10.1101/676544 fatcat:svssj5rzqvai3ldz6mkb5ye7ny

Viscous Effects on Biowaste Resistojet Nozzle Performance

JAMES M. KALLIS, MILTON GOODMAN, CARL R. HALBACH
1972 Journal of Spacecraft and Rockets  
R. and Greco, R. V., “Design and Operational Characteristics of an Integrated Biowaste Resistojet System,” AIAA Paper 71-686, Salt Lake City, Utah, 1971. 2 Gaubatz, W. A., James, N. E., and Page, R.  ...  SPACECRAFT Viscous Effects on Biowaste Resistojet Nozzle Performance JAMES M. KALLIS* AND MILTON GOODMAN† McDonnell Douglas Corporation, Huntington Beach, Calif. Carl R.  ... 
doi:10.2514/3.61814 fatcat:iooyywotyrh23hb6qxtm7gbbka