BuildMaster: Efficient ASIP architecture exploration through compilation and simulation result caching
2014
17th International Symposium on Design and Diagnostics of Electronic Circuits & Systems
Both the compilation and the simulation cache can greatly help to shorten the exploration time and make it possible to use more realistic data for the evaluation of selected designs. ...
This framework supports the design space exploration of application specific VLIW processors and offers automated caching of intermediate compilation and simulation results. ...
Cache hit-rates: The hit-rates of both caches may give us a better insight into the actual benefits of the caching. Figure 6 shows the hit-rates observed in our experiments for both caches. ...
doi:10.1109/ddecs.2014.6868768
dblp:conf/ddecs/JordansDJC14
fatcat:ea7f3gvagzbw3gu2ifutoeyopm
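As a rough illustration of the result caching the abstract describes, the sketch below memoizes expensive compile and simulate steps on disk, keyed by a hash of everything that determines their outcome (design description, application sources, tool flags). The class name and file layout are assumptions for illustration, not BuildMaster's actual implementation.

    import hashlib, json, os, pickle

    class ResultCache:
        """On-disk memoization of compilation or simulation results,
        keyed by a hash of all inputs that determine the result."""

        def __init__(self, root="result-cache"):
            self.root = root
            os.makedirs(root, exist_ok=True)

        def _key(self, **inputs):
            blob = json.dumps(inputs, sort_keys=True).encode()
            return hashlib.sha256(blob).hexdigest()

        def lookup(self, **inputs):
            path = os.path.join(self.root, self._key(**inputs))
            if os.path.exists(path):               # cache hit: reuse stored result
                with open(path, "rb") as f:
                    return pickle.load(f)
            return None                            # cache miss: caller must recompute

        def store(self, result, **inputs):
            with open(os.path.join(self.root, self._key(**inputs)), "wb") as f:
                pickle.dump(result, f)

A design-space exploration loop would call lookup() before invoking the compiler or simulator and store() afterwards, so revisited design points are evaluated at cache-lookup cost.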
Static analysis for fast and accurate design space exploration of caches
2008
Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis - CODES/ISSS '08
Simulation, in particular trace-driven simulation, is widely used to estimate cache hit rates. ...
In this paper, we propose a novel static analysis technique for rapid and accurate design space exploration of instruction caches. ...
Let us use B to represent the set of the basic blocks of the program and R_hit to represent the cache hit rate of the program. ...
doi:10.1145/1450135.1450159
dblp:conf/codes/LiangM08
fatcat:jynm27g6ubcnrbdopvl5y6r6p4
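One plausible way the per-block analysis aggregates into the program-level hit rate, using the snippet's notation (the per-block counts f_b, n_b and h_b are illustrative assumptions, not the paper's exact formulation):

    R_{hit} = \frac{\sum_{b \in B} f_b \, h_b}{\sum_{b \in B} f_b \, n_b}

where, for each basic block b in B, f_b is its execution count, n_b the instruction-cache accesses per execution of b, and h_b the accesses per execution that the static analysis classifies as hits.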
A power estimation technique for cycle-accurate higher-abstraction SystemC-based CPU models
2015
2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Due to the ever-increasing complexity of embedded system design and the need for rapid system evaluations in early design stages, the use of simulation models known as Virtual Platforms (VPs) has been ...
By using a set of benchmarks on a power-annotated SystemC/TLM model of Xilinx Microblaze soft-processor, it is shown that the proposed approach can achieve accurate power estimation in comparison to the ...
[14], as such approaches use as input average values for the number of cycles per instruction, the cache miss rate, the memory access rate, etc., which are not always representative: for example, the ...
doi:10.1109/samos.2015.7363661
dblp:conf/samos/Sotiriou-Xanthopoulos15
fatcat:fftruaylxjemnfvx7uyvvzqg3u
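For context, an average-value analytic model of the kind the excerpt criticizes might take the following shape (a hedged sketch; the symbols are assumptions, not the paper's model):

    P \approx \frac{f_{clk}}{CPI_{avg}} \left( E_{instr} + r_{mem} \left( E_{cache} + m_{cache} \, E_{miss} \right) \right)

with f_clk the clock frequency, CPI_avg the average cycles per instruction, r_mem the memory accesses per instruction, m_cache the cache miss rate, and E_* per-event energies. Collapsing r_mem and m_cache into single averages is precisely what makes such models unrepresentative for workloads with distinct phases.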
Machine Learning Enabled Scalable Performance Prediction of Scientific Codes
[article]
2020
arXiv pre-print
We analyze the application of multi-variate regression models that accurately predict the reuse profiles and the basic block counts. ...
PPT-AMMP uses machine learning and regression techniques to build the prediction models based on small instances of the input code, then integrates into a higher-order discrete-event simulation model of ...
These profiles are further used in estimating the hit-rates. Figure 9 shows the conditional hit-rates of JACOBI for all the three different cache hierarchies. ...
arXiv:2010.04212v2
fatcat:53bor5hw5zgxpp5feymwg5imsy
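The step from a (predicted) reuse profile to a hit-rate estimate can be sketched with the standard stack-distance argument: in a fully associative LRU cache of C lines, an access hits exactly when its reuse distance is below C. The function below illustrates that relation only; it is not PPT-AMMP's model, and the profile shown is invented.

    def hit_rate_from_reuse_profile(reuse_hist, cache_lines):
        """Estimate the hit rate of a fully associative LRU cache from a
        reuse-distance histogram.  reuse_hist maps a reuse distance (the
        number of distinct lines touched since the previous access to the
        same line; None for first-ever accesses) to its access count.
        An access hits iff its reuse distance is below the cache capacity."""
        hits = sum(n for d, n in reuse_hist.items()
                   if d is not None and 0 <= d < cache_lines)
        total = sum(reuse_hist.values())
        return hits / total if total else 0.0

    # Illustrative profile: most accesses reuse their line quickly.
    profile = {0: 600, 4: 250, 64: 100, None: 50}    # None marks cold accesses
    print(hit_rate_from_reuse_profile(profile, cache_lines=32))   # prints 0.85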
High-level power analysis for multi-core chips
2006
Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems - CASES '06
Rapid, early-stage power estimation of these multi-core chips is crucial in assisting compilers in determining the most efficient thread partitioning and placement. ...
Our tool has been validated against the cycle-accurate BTL simulator of the MIT Raw CMP, showing an average speedup of 7X while achieving relative accuracy of 9.1%. ...
if so, in which level of the cache it hits. ...
doi:10.1145/1176760.1176807
dblp:conf/cases/EisleySP06
fatcat:k6utslxgvrb6noiqhj5o6o4s7e
A Low-power Shared Cache Design with Modified PID Controller for Efficient Multicore Embedded Systems
2019
Journal of Information Processing
However, we observe that a large portion of cache banks are wasted, meaning that those banks are rarely used but consume a great deal of energy during their entire lifetime. ...
Nowadays, on-chip cache scales are oversized in multicore embedded systems, and those caches even consume half of the total energy budget. ...
Finally, we use a cache co-scheduler to coordinate the allocation of cache banks across parallel threads, at which point one control loop ends. ...
doi:10.2197/ipsjjip.27.149
fatcat:chpw6vkxh5a7pinvmhyyxwu45y
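A control loop of the general shape the excerpt describes could look like the sketch below, where a PID term turns the gap between the observed and the target miss rate into a change in the number of powered-on banks. Gains, target and granularity are invented for illustration; the paper's modified PID controller and cache co-scheduler are not reproduced here.

    class PIDBankController:
        """PID-style loop that sizes the number of powered-on shared-cache
        banks from the observed miss rate (illustrative gains and target)."""

        def __init__(self, target_miss_rate, total_banks,
                     kp=40.0, ki=5.0, kd=10.0):
            self.target = target_miss_rate
            self.total = total_banks
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, measured_miss_rate, active_banks):
            # Positive error = missing more than tolerated -> enable more banks.
            error = measured_miss_rate - self.target
            self.integral += error
            derivative = error - self.prev_error
            self.prev_error = error
            delta = self.kp * error + self.ki * self.integral + self.kd * derivative
            return max(1, min(self.total, active_banks + round(delta)))

    # One control step: the miss rate is above target, so banks are enabled.
    ctrl = PIDBankController(target_miss_rate=0.02, total_banks=16)
    print(ctrl.update(measured_miss_rate=0.05, active_banks=4))    # prints 6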
A Survey of Embedded Software Profiling Methodologies
2011
International Journal of Embedded Systems and Applications
It is achieved by profiling the application across a variety of aspects, such as performance, memory usage, cache hits versus cache misses, energy consumption, etc. ...
Of these, performance estimation is the most important. ...
The cache is used to store frequency counts of different critical loops and is indexed into using sbb instruction addresses. Here sbb is representing any short backward jump instruction. ...
doi:10.5121/ijesa.2011.1203
fatcat:vwf5plrtdzbdrdx25nz4gdsgce
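The profiling structure the excerpt describes, a small table of loop frequency counts indexed by the address of a short backward branch (sbb), can be modeled roughly as below; the entry count and index function are assumptions.

    class LoopCounterCache:
        """Small direct-mapped table of loop frequency counts, indexed by
        the address of a short backward branch (sbb).  On a conflict the
        previous entry is evicted and its count restarts."""

        def __init__(self, entries=64):
            self.entries = entries
            self.tags = [None] * entries
            self.counts = [0] * entries

        def record(self, sbb_addr):
            idx = (sbb_addr >> 2) % self.entries    # word-aligned index bits
            if self.tags[idx] != sbb_addr:          # another loop maps here
                self.tags[idx] = sbb_addr
                self.counts[idx] = 0
            self.counts[idx] += 1

        def hot_loops(self, threshold):
            return [(tag, count) for tag, count in zip(self.tags, self.counts)
                    if tag is not None and count >= threshold]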
From software to accelerators with LegUp high-level synthesis
2013
2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)
This paper presents an overview of the LegUp design methodology and system architecture, and discusses ongoing work on profiling, hardware/software partitioning, hardware accelerator quality improvements ...
Embedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. ...
The financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and Altera Corporation is gratefully acknowledged. ...
doi:10.1109/cases.2013.6662524
dblp:conf/cases/CanisCFLHCGQACBA13
fatcat:mkl646vbefa43irr2i725vmh6u
Instruction Set Architecture
[chapter]
2003
Digital Design and Computer Organization
Cache hit-rates: The hit-rates of both caches may give us a better insight into the actual benefits of the caching. ... shows the hit-rates observed in our experiments for both caches. Observed cache hit-rates for two different architecture exploration strategies ...
6.1: Detailed experimental results showing caching-induced exploration ...
automation of important intermediate steps, such as the construction of a new processor design from a high-level description, the evaluation of a candidate design by means of simulation, or ...
doi:10.1201/b12403-15
fatcat:mygaz2meibgljew5tzvmuw6x5i
Custom wide counterflow pipelines for high-performance embedded applications
2004
IEEE transactions on computers
Using an analytic cost model, we show that custom WCFPs do not unduly increase the cost of the original counterflow pipeline architecture, yet they retain the simplicity of the CFP. ...
Application-specific instruction set processor (ASIP) design is a promising technique to meet the performance and cost goals of high-performance systems. ...
cache hit rates of up to 99 percent. ...
doi:10.1109/tc.2004.1261825
fatcat:pswfmrtejjglxcwtnfoanf4m7q
Exploiting statistical information for implementation of instruction scratchpad memory in embedded system
2006
IEEE Transactions on Very Large Scale Integration (vlsi) Systems
Comprising a scratchpad memory instead of an instruction cache, the target system dynamically (at runtime) copies into the scratchpad code segments that are determined to be beneficial (in terms of ...
For a set of realistic benchmarks, experimental results indicate the method uses 41.9% lower energy (on average) and improves performance by 40.0% (on average) when compared to a traditional cache system ...
Estimation of memory access time is possible due to known SPM access time, cache access time, DRAM access time, hit rates of cache, and number of times the SPM contents are changed. ...
doi:10.1109/tvlsi.2006.878470
fatcat:nturpqdonfbodd4ehtzgnbx4oa
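The access-time estimate the excerpt refers to can be sketched as a simple decomposition (illustrative notation, not the paper's equation):

    T_{mem} \approx N_{SPM} \, t_{SPM} + N_{C} \left( t_{C} + (1 - R_{hit}) \, t_{DRAM} \right) + N_{copy} \, t_{copy}

where N_SPM and N_C are the numbers of accesses served by the scratchpad and by the cache path, t_* the corresponding access times, R_hit the cache hit rate, and N_copy t_copy the overhead of the runtime copies into the scratchpad.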
Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed
2015
2015 IEEE International Symposium on Workload Characterization
Using virtualized fast-forwarding, we demonstrate a parallel sampling simulator that can be used to accurately estimate the IPC of standard workloads with an average error of 2.2% while still reaching ...
This leads to two problems: First, due to the slow simulation rate, simulation studies are usually limited to the first few billion instructions, which corresponds to less than 10% of the execution time of ...
This may underestimate the performance of the simulated cache as some of the misses might have been hits had the cache been fully warmed. ...
doi:10.1109/iiswc.2015.29
dblp:conf/iiswc/SandbergNCHKB15
fatcat:nvfwvc37ubhk7ewvp4d6bfadoa
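The warming problem in the excerpt is commonly handled by bounding: misses that occur before the sampled cache has filled might have been hits in a fully warmed cache, so they are tracked separately and the hit rate is reported as a lower/upper pair. A minimal sketch for a fully associative LRU cache, illustrating the idea rather than the paper's mechanism:

    from collections import OrderedDict

    def bounded_hit_rate(sample_trace, cache_lines, line_bytes=64):
        """Replay one sample through a fully associative LRU cache that
        starts cold.  Misses seen before the cache has filled are 'unknown':
        they might have hit in a fully warmed cache.  Returns (lower, upper)
        bounds on the hit rate, counting unknowns as misses and hits."""
        lru = OrderedDict()                    # line address -> None, LRU order
        hits = misses = unknown = 0
        for addr in sample_trace:
            line = addr // line_bytes
            if line in lru:
                lru.move_to_end(line)          # refresh LRU position
                hits += 1
            elif len(lru) < cache_lines:
                unknown += 1                   # still warming: outcome unknown
                lru[line] = None
            else:
                misses += 1
                lru.popitem(last=False)        # evict the least recently used line
                lru[line] = None
        total = hits + misses + unknown
        return (hits / total, (hits + unknown) / total) if total else (0.0, 0.0)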
Instruction Trace Compression for Rapid Instruction Cache Simulation
2007
2007 Design, Automation & Test in Europe Conference & Exhibition
Modern Application Specific Instruction Set Processors (ASIPs) have customizable caches, where the size, associativity and line size ... Although compression allows the reduction of the program trace file ...
Simulation of cache performance using large program trace files is a time consuming process. ... only 'partial decompression' is necessary. Our experimental results ...
case cache miss rates. ...
doi:10.1109/date.2007.364389
dblp:conf/date/JanapsatyaIPH07
fatcat:ilg7vb2tgzfxhil6tj7fawi4jm
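The underlying computation, replaying an instruction-address trace through one cache configuration at a time, can be sketched as follows for a set-associative LRU cache; the compressed-trace format and partial-decompression machinery of the paper are not modeled.

    def icache_miss_rate(trace, size_bytes, assoc, line_bytes):
        """Replay an instruction-address trace through one set-associative
        LRU instruction-cache configuration and return its miss rate."""
        num_sets = size_bytes // (assoc * line_bytes)
        sets = [[] for _ in range(num_sets)]    # per set: resident tags, MRU last
        misses = 0
        for addr in trace:
            line = addr // line_bytes
            idx, tag = line % num_sets, line // num_sets
            ways = sets[idx]
            if tag in ways:
                ways.remove(tag)                # hit: move tag to the MRU slot
                ways.append(tag)
            else:
                misses += 1
                if len(ways) == assoc:
                    ways.pop(0)                 # evict the LRU tag
                ways.append(tag)
        return misses / len(trace) if trace else 0.0

    # Sweep two cache sizes on a toy trace: a tight 128-instruction loop.
    trace = [0x1000 + 4 * (i % 128) for i in range(20_000)]
    for size in (256, 1024):
        print(size, icache_miss_rate(trace, size, assoc=2, line_bytes=32))

On this toy loop the 256-byte configuration thrashes (miss rate near 1.0) while the 1 KB configuration only takes the 16 cold misses, which is exactly the kind of sensitivity a design-space sweep has to expose.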
Missing the memory wall
1996
SIGARCH Computer Architecture News
A comparable "high-end" machine of the same era is the Sparc-Station 10/61 (SS-10/61), containing a super-scalar SuperSparc CPU with two cache levels; separate 20KB instruction and 16KB data caches at ...
Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. ...
acknowledge the valuable help, feedback and inspiration they received from Gunes Aybay, Clement Fang, Howard Davidson, Mark Hill, Sally McKee, William Radke, Eugen Schenfeld, Sanjay Vishin, the engineers of ...
doi:10.1145/232974.232984
fatcat:w5c3hi3725dpdpc76725f5pyqq
Missing the memory wall
1996
Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96
A comparable "high-end" machine of the same era is the Sparc-Station 10/61 (SS-10/61), containing a super-scalar SuperSparc CPU with two cache levels; separate 20KB instruction and 16KB data caches at ...
Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. ...
acknowledge the valuable help, feedback and inspiration they received from Gunes Aybay, Clement Fang, Howard Davidson, Mark Hill, Sally McKee, William Radke, Eugen Schenfeld, Sanjay Vishin, the engineers of ...
doi:10.1145/232973.232984
dblp:conf/isca/SaulsburyPN96
fatcat:ut72ah2zxzh73onrac3vems5aq