18,748 Hits in 3.8 sec

Scalable vector processors for embedded systems

C.E. Kozyrakis, D.A. Patterson
2003 IEEE Micro  
processors for desktop systems adjusted to accommodate embedded designs, we can revise vector architectures for supercomputers to serve in embedded applications.  ...  Prototype vector processor We implemented a prototype vector processor to validate the VIRAM architecture's potential for embedded media processing and as a fast platform for software experimentation.  ...  His research interests include parallel architectures and compilation techniques for systems ranging from supercomputers to deeply embedded devices.  ... 
doi:10.1109/mm.2003.1261385 fatcat:arrxeb4uk5ek3ohjheugjmxyji

Reducing Synchronization Overheads in CG-Type Parallel Iterative Solvers by Embedding Point-to-Point Communications into Reduction Operations

Cevdet Aykanat
2014 Zenodo  
Experiments on two different supercomputers up to 2048 processors show that the proposed latency-avoiding method exhibits superior scalability, especially with increasing number of processors.  ...  Parallel iterative solvers are widely used in solving large sparse linear systems of equations on large-scale parallel architectures.  ...  (send/recv) to log for a system with processors.  ... 
doi:10.5281/zenodo.825434 fatcat:tlle3zymnjddfh75q27oq35oxa

Performance and power consumption evaluation of concurrent queue implementations in embedded systems

Lazaros Papadopoulos, Ivan Walulya, Paul Renaud-Goud, Philippas Tsigas, Dimitrios Soudris, Brendan Barry
2014 Computer Science - Research and Development  
Evaluation of Message Passing Synchronization Algorithms in Embedded Systems SHAVE (Streaming Hybrid Architecture Vector Engine  ...  in Embedded Systems Evaluation of Message Passing Synchronization Algorithms in Embedded Systems • Introduction - Synchronization  ... 
doi:10.1007/s00450-014-0261-0 fatcat:bfgg5pmeunbobpaukclwdy234q

A Unified Approach for the Synthesis of Scalable and Testable Embedded Architectures [chapter]

Prashanth B. Bhat, Chouki Aktouf, Viktor K. Prasanna, Sandeep Gupta, Melvin A. Breuer
1998 Fault-Tolerant Parallel and Distributed Systems  
This paper presents a new synthesis approach for reliable high performance embedded systems. It considers requirements of both scalability and testability in an integrated manner.  ...  It then describes the architecture of typical high performance embedded systems and a suitable testing methodology for these systems.  ...  an embedded system environment.  ... 
doi:10.1007/978-1-4615-5449-3_12 fatcat:rawiqhhlojfo3b2r66snd3fqbm

TeraOPS hardware: A new massively-parallel MIMD computing fabric IC

Anthony Mark Jones, Mike Butts
2006 2006 IEEE Hot Chips 18 Symposium (HCS)  
Traditional architectures are reaching limits in performance, scalability and ease of development. Single CPUs and DSPs are reaching limits of extending performance. Ordinary multi-core processors won't  ...  Communication-Centric Design for Timing Scalability: globally asynchronous, locally synchronous (GALS). Massive Parallelism for Power Scalability: MIMD architecture, power scales linearly with  ...  Chains of Ambric registers form Ambric channels: fully encapsulated, fully scalable for control and data between objects. Ambric processors are interconnected by Ambric channels. Ambric registers permit inputs  ... 
doi:10.1109/hotchips.2006.7477853 fatcat:gmlq7ucibjfsdpgeevacv55hbq

Portable, Flexible, and Scalable Soft Vector Processors

P. Yiannacouras, J. G. Steffan, J. Rose
2012 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
Such a soft vector processor can execute these kernels much faster than a single core, hence reducing the need for hardware implementations.  ...  To this end we propose extending soft processors with vector extensions to exploit the abundant data parallelism found in many embedded kernels.  ...  [29], [41] first demonstrated the potential for vector processing as a simple-to-use and scalable accelerator for soft processors.  ... 
doi:10.1109/tvlsi.2011.2160463 fatcat:rxxooti4jfbjphgailmmcsj6fq

A Case for Soft Vector Processors in FPGAs

Jason Yu, Guy Lemieux
2007 2007 International Conference on Field-Programmable Technology  
Performance estimates of the soft vector processor using three embedded benchmark kernels show speedup of up to 16.6× over an idealized Nios II processor while using 10.9× the area.  ...  This paper proposes a soft vector processor for the Stratix III FPGA that can be scaled to different levels of performance and resource utilization.  ...  Acknowledgment The authors would like to thank Blair Fort for providing the UTIIe processor, and Christopher Eagleston for his help with analyzing the benchmarks.  ... 
doi:10.1109/fpt.2007.4439281 dblp:conf/fpt/YuL07 fatcat:kdn6fxnyanhddp77vzipucbbbi

Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance

Matthias Korch, Thomas Rauber
2010 The international journal of high performance computing applications  
Such systems may arise, for example, from the semidiscretization of partial differential equations (PDEs).  ...  We focus on the exploitation of a special structure of commonly appearing ODE systems, referred to as 'limited access distance', to improve scalability and memory usage.  ...  Acknowledgments We thank the Jülich Supercomputing Centre and the Leibniz Supercomputing Centre Munich for providing access to their supercomputer systems.  ... 
doi:10.1177/1094342010384418 fatcat:jmrunra2yfgtvhpw4jxwzu4nkq

A Novel Method for Scaling Iterative Solvers: Avoiding Latency Overhead of Parallel Sparse-Matrix Vector Multiplies

R. Oguz Selvitopi, Muhammet Mustafa Ozdal, Cevdet Aykanat
2015 IEEE Transactions on Parallel and Distributed Systems  
Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, up to 2048 processors show that the proposed parallelization method exhibits superior scalable performance compared to the conventional  ...  The computational rearrangement provides an alternative method for forming input vector of SpMxV and allows P2P and collective communications to be performed in a single phase.  ...  ACKNOWLEDGMENTS We acknowledge PRACE for awarding us access to resources Hermit (Cray XE6) based in Germany at High Performance Computing Center Stuttgart (HLRS) and Juqueen (Blue Gene/Q) based in Germany  ... 
doi:10.1109/tpds.2014.2311804 fatcat:oelniwaqebaine37re4c5gulfa

A low-power wireless camera system

A. Chandrakasan, A. Dancy, J. Goodman, T. Simon
1999 Proceedings Twelfth International Conference on VLSI Design. (Cat. No.PR00013)  
Embedded power supply systems are also used to minimize energy dissipation under varying temperature, process parameters and computational workload.  ...  This paper describes the system design of a low-power wireless camera. A system-level approach is used to reduce energy dissipation and maximize battery lifetime.  ...  Figure 6 shows a die photo of the scalable encryption processor with embedded power supply.  ... 
doi:10.1109/icvd.1999.745120 dblp:conf/vlsid/ChandrakasanDGS99 fatcat:n3vjtgtr6ffkzgfx6d5ou2ti7y

Parallelized benchmark-driven performance evaluation of SMPs and tiled multi-core architectures for embedded systems

Arslan Munir, Ann Gordon-Ross, Sanjay Ranka
2012 2012 IEEE 31st International Performance Computing and Communications Conference (IPCCC)  
With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multi-core to exploit this high transistor density for high performance.  ...  suitable for applications with floating point computations and a large amount of communication between processor cores.  ...  Specifically, results indicated that the TILEPro64 exhibited better scalability and attained better performance per watt than the SMPs for applications involving integer operations and for the applications  ... 
doi:10.1109/pccc.2012.6407785 dblp:conf/ipccc/MunirGR12 fatcat:ej24xs7gvrhdbffctjdy3ktmcq

A low-power accelerator for the SPHINX 3 speech recognition system

Binu Mathew, Al Davis, Zhen Fang
2003 Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03  
2.4 GHz Pentium 4 system.  ...  However after normalizing for process, the special-purpose approach has twice the throughput, and consumes 104 times less energy than the general-purpose processor.  ...  We would also like to thank Mike Parker of the University of Utah for valuable consultation and for modifying our Pentium 4 system for power measurements.  ... 
doi:10.1145/951736.951739 fatcat:r5yuzunhpfghbnnizrkfcszese

A low-power accelerator for the SPHINX 3 speech recognition system

Binu Mathew, Al Davis, Zhen Fang
2003 Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03  
2.4 GHz Pentium 4 system.  ...  However after normalizing for process, the special-purpose approach has twice the throughput, and consumes 104 times less energy than the general-purpose processor.  ...  We would also like to thank Mike Parker of the University of Utah for valuable consultation and for modifying our Pentium 4 system for power measurements.  ... 
doi:10.1145/951710.951739 dblp:conf/cases/MathewDF03 fatcat:vc4fgoubnva43lcn7xoo7l35uu

EMVS: Embedded Multi Vector-core System

Tassadaq Hussain, Amna Haider, Adrian Cristal, Eduard Ayguadé
2018 Journal of systems architecture  
The existing embedded HPC systems suffer from issues like programmability, scalability, and portability.  ...  In this work, we proposed an embedded multi vector-core system (EMVS) which executes the embedded application by managing the multiple vectorized tasks and their memory operations.  ...  Embedded Multi Vector-core Processor System The Embedded Multi Vector-core Processor System (EMVS) is shown in Figure 1.  ... 
doi:10.1016/j.sysarc.2018.04.002 fatcat:44lbldctjnekjpfwrnmicmkhc4

Distributed Synchronization for Message-Passing Based Embedded Multiprocessors

Hao XIAO, Ning WU, Fen GE, Guanyu ZHU, Lei ZHOU
2015 IEICE transactions on information and systems  
Experimental results show the proposed synchronization achieves ultra-low latency and almost ideal scalability when the number of processors increases.  ...  By using state-of-the-art Application-Specific Instruction-set Processor (ASIP) technology, we embed the synchronization functionality into a baseline processor, making the proposed mechanism feature ultra-low  ...  However, these centralized solutions fundamentally introduce a bottleneck when the system scales to more cores.  ... 
doi:10.1587/transinf.2014rcl0001 fatcat:ei4f3s44qbfcflbf5z2x4zs7da