BuildMaster: Efficient ASIP architecture exploration through compilation and simulation result caching

Roel Jordans, Erkan Diken, Lech Jozwiak, Henk Corporaal
2014 17th International Symposium on Design and Diagnostics of Electronic Circuits & Systems  
Both the compilation and the simulation cache can greatly help to shorten the exploration time and make it possible to use more realistic data for the evaluation of selected designs.  ...  This framework supports the design space exploration of application specific VLIW processors and offers automated caching of intermediate compilation and simulation results.  ...  Cache hit-rates The hit-rates of both caches may give us a better insight into the actual benefits of the caching. Figure 6 shows the hit-rates observed in our experiments for both caches.  ... 
doi:10.1109/ddecs.2014.6868768 dblp:conf/ddecs/JordansDJC14 fatcat:ea7f3gvagzbw3gu2ifutoeyopm

Static analysis for fast and accurate design space exploration of caches

Yun Liang, Tulika Mitra
2008 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis - CODES/ISSS '08  
Simulation, in particular trace-driven simulation, is widely used to estimate cache hit rates.  ...  In this paper, we propose a novel static analysis technique for rapid and accurate design space exploration of instruction caches.  ...  Let us use B to represent the set of the basic blocks of the program and R hit to represent the cache hit rate of the program.  ... 
doi:10.1145/1450135.1450159 dblp:conf/codes/LiangM08 fatcat:jynm27g6ubcnrbdopvl5y6r6p4

A power estimation technique for cycle-accurate higher-abstraction SystemC-based CPU models

Efstathios Sotiriou-Xanthopoulos, G. Shalina Percy Delicia, Peter Figuli, Kostas Siozios, George Economakos, Jurgen Becker
2015 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)  
Due to the ever-increasing complexity of embedded system design and the need for rapid system evaluations in early design stages, the use of simulation models known as Virtual Platforms (VPs) has been  ...  By using a set of benchmarks on a power-annotated SystemC/TLM model of Xilinx Microblaze soft-processor, it is shown that the proposed approach can achieve accurate power estimation in comparison to the  ...  [14] , as such approaches use as input average values for the number of cycles per instruction, the cache miss rate, the memory access rate, etc., which are not always representative: For example, the  ... 
doi:10.1109/samos.2015.7363661 dblp:conf/samos/Sotiriou-Xanthopoulos15 fatcat:fftruaylxjemnfvx7uyvvzqg3u

Machine Learning Enabled Scalable Performance Prediction of Scientific Codes [article]

Gopinath Chennupati and Nandakishore Santhi and Phill Romero and Stephan Eidenbenz
2020 arXiv   pre-print
We analyze the application of multi-variate regression models that accurately predict the reuse profiles and the basic block counts.  ...  PPT-AMMP uses machine learning and regression techniques to build the prediction models based on small instances of the input code, then integrates into a higher-order discrete-event simulation model of  ...  These profiles are further used in estimating the hit-rates. Figure 9 shows the conditional hit-rates of JACOBI for all the three different cache hierarchies.  ... 
arXiv:2010.04212v2 fatcat:53bor5hw5zgxpp5feymwg5imsy

High-level power analysis for multi-core chips

Noel Eisley, Vassos Soteriou, Li-Shiuan Peh
2006 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems - CASES '06  
Rapid, early-stage power estimation of these multi-core chips is crucial in assisting compilers in determining the most efficient thread partitioning and placement.  ...  Our tool has been validated against the cycle-accurate BTL simulator of the MIT Raw CMP, showing an average speedup of 7X while achieving relative accuracy of 9.1%.  ...  if so, in which level of the cache it hits.  ... 
doi:10.1145/1176760.1176807 dblp:conf/cases/EisleySP06 fatcat:k6utslxgvrb6noiqhj5o6o4s7e

A Low-power Shared Cache Design with Modified PID Controller for Efficient Multicore Embedded Systems

Huatao Zhao, Jiongyao Ye, Takahiro Watanabe
2019 Journal of Information Processing  
However, we observe that a large portion of cache banks are wasted, meaning that those banks are rarely used but consume a great deal of energy during their entire lifetime.  ...  Nowadays, on-chip cache scales are oversized in multicore embedded systems, and those caches even consume half of the total energy debit.  ...  Finally, we use cache co-scheduler to coordinate the allocation of cache banks in multi-thread parallel, and then one control loop ends here.  ... 
doi:10.2197/ipsjjip.27.149 fatcat:chpw6vkxh5a7pinvmhyyxwu45y

A Survey of Embedded Software Profiling Methodologies

Rajendra Patel
2011 International Journal of Embedded Systems and Applications  
It is achieved by profiling the application with a variety of aspects like performance, memory usage, cache hit versus cache miss, energy consumption, etc.  ...  Out of these, performance estimation is more important than others.  ...  The cache is used to store frequency counts of different critical loops and is indexed into using sbb instruction addresses. Here sbb represents any short backward jump instruction.  ... 
doi:10.5121/ijesa.2011.1203 fatcat:vwf5plrtdzbdrdx25nz4gdsgce

From software to accelerators with LegUp high-level synthesis

Andrew Canis, Jongsok Choi, Blair Fort, Ruolong Lian, Qijing Huang, Nazanin Calagar, Marcel Gort, Jia Jun Qin, Mark Aldham, Tomasz Czajkowski, Stephen Brown, Jason Anderson
2013 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)  
This paper presents an overview of the LegUp design methodology and system architecture, and discusses ongoing work on profiling, hardware/software partitioning, hardware accelerator quality improvements  ...  Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators.  ...  The financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and Altera Corporation is gratefully acknowledged.  ... 
doi:10.1109/cases.2013.6662524 dblp:conf/cases/CanisCFLHCGQACBA13 fatcat:mkl646vbefa43irr2i725vmh6u

0 Instruction Set Architecture [chapter]

2003 Digital Design and Computer Organization  
Cache hit-rates: The hit-rates of both caches may give us a better insight into the actual benefits of the caching.  ...  shows the hit-rates observed in our experiments for both caches.  ...  Observed cache hit-rates for two different architecture exploration strategies  ...  Detailed experimental results showing caching induced exploration  ...  automation of important intermediate steps such as the construction of a new processor design from a high-level description, the evaluation of a candidate design by means of simulation or  ... 
doi:10.1201/b12403-15 fatcat:mygaz2meibgljew5tzvmuw6x5i

Custom wide counterflow pipelines for high-performance embedded applications

B.R. Childers, J.W. Davidson
2004 IEEE transactions on computers  
Using an analytic cost model, we show that custom WCFPs do not unduly increase the cost of the original counterflow pipeline architecture, yet they retain the simplicity of the CFP.  ...  Application-specific instruction set processor (ASIP) design is a promising technique to meet the performance and cost goals of high-performance systems.  ...  cache hit rates of up to 99 percent.  ... 
doi:10.1109/tc.2004.1261825 fatcat:pswfmrtejjglxcwtnfoanf4m7q

Exploiting statistical information for implementation of instruction scratchpad memory in embedded system

A. Janapsatya, A. Ignjatovic, S. Parameswaran
2006 IEEE Transactions on Very Large Scale Integration (vlsi) Systems  
Comprising of a scratchpad memory instead of an instruction cache, the target system dynamically (at runtime) copies into the scratchpad code segments that are determined to be beneficial (in terms of  ...  For a set of realistic benchmarks, experimental results indicate the method uses 41.9% lower energy (on average) and improves performance by 40.0% (on average) when compared to a traditional cache system  ...  Estimation of memory access time is possible due to known SPM access time, cache access time, DRAM access time, hit rates of cache, and number of times the SPM contents are changed.  ... 
doi:10.1109/tvlsi.2006.878470 fatcat:nturpqdonfbodd4ehtzgnbx4oa

Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed

Andreas Sandberg, Nikos Nikoleris, Trevor E. Carlson, Erik Hagersten, Stefanos Kaxiras, David Black-Schaffer
2015 2015 IEEE International Symposium on Workload Characterization  
Using virtualized fast-forwarding, we demonstrate a parallel sampling simulator that can be used to accurately estimate the IPC of standard workloads with an average error of 2.2% while still reaching  ...  This leads to two problems: First, due to the slow simulation rate, simulation studies are usually limited to the first few billion instructions, which corresponds to less than 10% the execution time of  ...  This may underestimate the performance of the simulated cache as some of the misses might have been hits had the cache been fully warmed.  ... 
doi:10.1109/iiswc.2015.29 dblp:conf/iiswc/SandbergNCHKB15 fatcat:nvfwvc37ubhk7ewvp4d6bfadoa

Instruction Trace Compression for Rapid Instruction Cache Simulation

Andhi Janapsatya, Aleksandar Ignjatovic, Sri Parameswaran, Joerg Henkel
2007 2007 Design, Automation & Test in Europe Conference & Exhibition  
Modern Application Specific Instruction Set Processors (ASIPs) have customizable caches, where the size, associativity and line size  ...  Simulation of cache performance using large program trace files is a time consuming process.  ...  Although compression allows the reduction of the program trace file  ...  only 'partial decompression' is necessary. Our experimental results  ...  case cache miss rates.  ... 
doi:10.1109/date.2007.364389 dblp:conf/date/JanapsatyaIPH07 fatcat:ilg7vb2tgzfxhil6tj7fawi4jm

Missing the memory wall

Ashley Saulsbury, Fong Pong, Andreas Nowatzyk
1996 SIGARCH Computer Architecture News  
A comparable "high-end" machine of the same era is the Sparc-Station 10/61 (SS-10/61), containing a super-scalar SuperSparc CPU with two cache levels; separate 20KB instruction and 16KB data caches at  ...  Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems.  ...  acknowledge the valuable help, feedback and inspiration they received from Gunes Aybay, Clement Fang, Howard Davidson, Mark Hill, Sally McKee, William Radke, Eugen Schenfeld, Sanjay Vishin, the engineers of  ... 
doi:10.1145/232974.232984 fatcat:w5c3hi3725dpdpc76725f5pyqq

Missing the memory wall

Ashley Saulsbury, Fong Pong, Andreas Nowatzyk
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
A comparable "high-end" machine of the same era is the Sparc-Station 10/61 (SS-10/61), containing a super-scalar SuperSparc CPU with two cache levels; separate 20KB instruction and 16KB data caches at  ...  Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems.  ...  acknowledge the valuable help, feedback and inspiration they received from Gunes Aybay, Clement Fang, Howard Davidson, Mark Hill, Sally McKee, William Radke, Eugen Schenfeld, Sanjay Vishin, the engineers of  ... 
doi:10.1145/232973.232984 dblp:conf/isca/SaulsburyPN96 fatcat:ut72ah2zxzh73onrac3vems5aq