ASIP architecture exploration for efficient IPSec encryption

Hanno Scharwaechter, David Kammler, Andreas Wieferink, Manuel Hohenauer, Kingshuk Karuri, Jianjiang Ceng, Rainer Leupers, Gerd Ascheid, Heinrich Meyr
2007 ACM Transactions on Embedded Computing Systems  
Efficient ASIP design requires an iterative architecture exploration loop: gradual refinement of the processor architecture starting from an initial template.  ...  This paper describes an architecture exploration loop for an ASIP coprocessor which implements common encryption functionality used in symmetric block cipher algorithms for IPsec.  ...  Figure captions: Tool-based processor architecture exploration loop; Fig. 3. Blowfish; Fig. 6. Parallel S-Box access in the execution stage.  ... 
doi:10.1145/1234675.1234679 fatcat:webkzsdkrvho7k3xqivnhskywu

ASIP Architecture Exploration for Efficient Ipsec Encryption: A Case Study [chapter]

Hanno Scharwaechter, David Kammler, Andreas Wieferink, Manuel Hohenauer, Kingshuk Karuri, Jianjiang Ceng, Rainer Leupers, Gerd Ascheid, Heinrich Meyr
2004 Lecture Notes in Computer Science  
Efficient ASIP design requires an iterative architecture exploration loop: gradual refinement of the processor architecture starting from an initial template.  ...  This paper describes an architecture exploration loop for an ASIP coprocessor which implements common encryption functionality used in symmetric block cipher algorithms for IPsec.  ...  Figure captions: Tool-based processor architecture exploration loop; Fig. 3. Blowfish; Fig. 6. Parallel S-Box access in the execution stage.  ... 
doi:10.1007/978-3-540-30113-4_4 fatcat:e477r42nlzbpjbl3u6y7p3txn4

Power Aware Framework for Dense Matrix Operations in Multimedia Processors

N. Zafar Azeemi
2005 2005 Pakistan Section Multitopic Conference  
The approach is illustrated using functional unit usage within a VLIW architecture for low power, which improves energy dissipation by up to 34% and CPU performance by up to 87% for an IDCT example.  ...  In this paper we analyze the use of Decision Tree Grafting, Blocking and Loop Unfolding to improve the performance of dense matrix computations on high performance multimedia processors.  ...  The approach is illustrated using functional unit usage within a VLIW architecture and identifies a new operation rebinding technique for low power.  ... 
doi:10.1109/inmic.2005.334414 fatcat:pzmkdxwzx5a5flh4kczmsxv5za

Exploring the potential of heterogeneous von neumann/dataflow execution models

Tony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam
2015 Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15  
Lipasti, “Revolver: Processor architecture for power efficient loop execution,” in HPCA, 2014.  ...  lar workloads, completely obviating the need for short-vector  ...  executed program regions, a combination of power-efficient hardware structures, and a set of compiler techniques.  ...  bines known dataflow-architecture techniques for high en-  ... 
doi:10.1145/2749469.2750380 dblp:conf/isca/NowatzkiGS15 fatcat:hql7xymzgjch3jv4dk5mvbesji

Exploring the potential of heterogeneous von neumann/dataflow execution models

Tony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam
2015 SIGARCH Computer Architecture News  
General purpose processors (GPPs), from small in-order designs to many-issue out-of-order, incur large power overheads which must be addressed for future technology generations.  ...  Interestingly, well known explicit-dataflow architectures eliminate these overheads by directly executing the data-dependence graph and eschewing instruction-precise recoverability.  ...  Support for this research was provided by NSF under the grant CNS-1228782 and by a Google US/Canada PhD Fellowship.  ... 
doi:10.1145/2872887.2750380 fatcat:f7i5ox5p6vgq5eqd65isiyhe2a

Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI [article]

Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini
2019 arXiv   pre-print
and outlines directions to maintain high energy efficiency even for small matrix sizes where the vector architecture achieves suboptimal utilization of the available FPUs.  ...  An analysis on several vectorizable linear algebra computation kernels for a range of different matrix and vector sizes gives insight into performance limitations and bottlenecks for vector processors  ...  ACKNOWLEDGMENTS We would like to thank Frank Gürkaynak and Francesco Conti for the helpful discussions and insights.  ... 
arXiv:1906.00478v3 fatcat:h7zn4tkpqjf6xd35iuacpkre2a

Techniques for low energy software

Huzefa Mehta, Robert Michael Owens, Mary Jane Irwin, Rita Chen, Debashree Ghosh
1997 Proceedings of the 1997 international symposium on Low power electronics and design - ISLPED '97  
In addition, several compiler techniques, such as loop unrolling, software pipelining, and recursion elimination, and the effects of different algorithms on power and energy consumption are studied.  ...  This evaluation methodology is useful for computer architects to evaluate energy improvements of their hardware, compiler writers to evaluate energy of the compiled code and program writers to evaluate  ...  [15] build the instruction level power models after the design has been completed using actual current measurements of the processor chip as it executes instruction patterns. Landman et al.  ... 
doi:10.1145/263272.263286 dblp:conf/islped/MehtaOICG97 fatcat:eoggc6xvprb6ta7k2rvov5x7bu

Evaluación de parámetros de optimización GCC

Rodrigo D. Escobar, Alekya R. Angula, Mark Corsi
2012 Ingenierías USBMed  
In the meantime, compilers will need to become even more efficient at utilizing the underlying system architecture through self-optimization.  ...  Furthermore, sometimes such code may be less efficient than code that has been compiled for generic hardware.  ... 
doi:10.21500/20275846.272 fatcat:z4v2ut2s3jfixkeugcl5ogapbu

Performance Estimation of a LEON 3FT Processor Based Design for Spacecraft Applications

Shruthi N, Prashant Kulshreshtha, Dinakaran E, Dr. Girish V Attimarad, Mr. Subramanya Udupa
2014 IOSR Journal of Electronics and Communication Engineering  
A set of selected benchmark programs has been executed on the superior processor mainly to track the execution times.  ...  The content of this paper is intended to highlight the performance of the 32-bit LEON 3FT processor in terms of execution speed in comparison with the currently used 16-bit processor.  ...  The logics may also be called within multiple looping constructs for the purpose of testing complex looping times. Figure labels: PC running GRMON; Protoboard containing LEON 3FT processor; RS232 cable; UART interface.  ... 
doi:10.9790/2834-09354854 fatcat:zycynh5kpzc6ljpjujkoe7nw64

Task Scheduling Frameworks for Heterogeneous Computing Toward Exascale

Suhelah Sandokji, Fathy Eassa
2018 International Journal of Advanced Computer Science and Applications  
The race for Exascale Computing has naturally led computer architecture to transition from the multicore era into the heterogeneous era.  ...  They investigate the important role of optimization and tackle intelligently scheduled tasks on the combination of CPU/GPU architecture CPUs and GPUs cores in achieving the peace of performance and power  ...  In [77] the researchers study the impact of power variation of scheduling multi programming concurrently. They present an efficient algorithm for power capping.  ... 
doi:10.14569/ijacsa.2018.091029 fatcat:xqr3zoybwjbq5etrsb3msdjrxq

Performance efficiency of context-flow system-on-chip platform

R. Beidas, Jianwen Zhu
2003 ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486)  
We demonstrate the performance efficiency of this architecture over bus based and packet-switch based networks by two case studies using a multi-processor architecture simulator.  ...  Recent efforts in adapting computer networks into system-on-chip (SOC), or network-on-chip, present a setback to the traditional computer systems for the lack of effective programming model, while not  ...  For this purpose, the memory space and register files were replicated, one per PE, and the main execution loop of the simulator was modified to execute one instruction from each PE code at each simulation  ... 
doi:10.1109/iccad.2003.159711 fatcat:fksjka24dzed7db225wv3mw33y

Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads [article]

Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini
2020 arXiv   pre-print
more flexible than a contemporary vector processor lane, achieving a 2× energy-efficiency improvement.  ...  With increasing integration density, the quest for energy efficiency becomes the number one design concern.  ...  similar compute per area efficiency with around 6% for all execution units [8].  ... 
arXiv:2002.10143v1 fatcat:jrugjgr4yzdyro4tka3czt6x64

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

Artur Podobas, Kentaro Sano, Satoshi Matsuoka
2020 IEEE Access  
recent research has shown performance- or power-benefits for multiple applications [10]-[14].  ...  These limitations have been recognized for decades (e.g., [15]-[17]), and have driven forth a different branch of reconfigurable architecture: the Coarse-Grained Reconfigurable Architecture (CGRAs).  ...  and power-efficiency.  ... 
doi:10.1109/access.2020.3012084 fatcat:xx6k4lxbjbc4tjebbymp42w634

Customized architectures for faster route finding in GPS-based navigation systems

Jason Loew, Dmitry Ponomarev, Patrick H. Madden
2010 2010 IEEE 8th Symposium on Application Specific Processors (SASP)  
In this paper, we present a practical approach to extract small-scale parallelism by shifting priority queue operations to a secondary tightly-coupled processor.  ...  We obtain a substantial speedup on real-world graphs (in particular, road maps), allowing the development of navigation systems that are more responsive, and also lower in total power consumption.  ...  In [3], a novel loop acceleration architecture and the dynamic algorithm for mapping loops onto the loop accelerators are presented and analyzed.  ... 
doi:10.1109/sasp.2010.5521148 dblp:conf/sasp/LoewPM10 fatcat:tnsiaalz25bezbh77ekn2wkyta

3D tomography back-projection parallelization on FPGAs using opencl

Maxime Martelli, Nicolas Gag, Alain Merigot, Cyrille Enderli
2017 2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)  
For this purpose, we start with evaluating different custom OpenCL implementations of the backprojection algorithm.  ...  This paper deals with the evaluation of FPGAs resurgence for hardware acceleration applied to computed tomography on the back-projection operator used in iterative reconstruction algorithms.  ...  A key difficulty for single work-item implementations is loop handling, because the Altera Offline Compiler default behaviour is to have each loop iteration executed sequentially, thus drastically reducing  ... 
doi:10.1109/dasip.2017.8122119 dblp:conf/dasip/MartelliGME17 fatcat:ujzjjcughzckplnqpknwmm6e7u