
Key-Study to Execute Code Using Demand Paging and NAND Flash at Smart Card Scale [chapter]

Geoffroy Cogniaux, Gilles Grimaud
2010 Lecture Notes in Computer Science  
In this paper, we show that we can dramatically increase performance by reducing the size of pages in the cache. This solution then allows more intelligent access to the NAND.  ...  Nowadays, the desire to embed more applications in systems as small as Smart Cards or sensors is growing.  ...  Combined with intelligent access to the NAND data register, reducing the cache page size dramatically improves performance where the conventional approach was close to the worst case in terms of bandwidth.  ... 
doi:10.1007/978-3-642-12510-2_8 fatcat:uimxiq7lsbeq5fzxkedj4ma6pe
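
The snippet above rests on a bandwidth argument: a NAND read loads a whole physical page (commonly 2 KiB) into the chip's data register, so a cache that copies out smaller pages per miss moves fewer bytes overall under scattered accesses. A minimal Python sketch of that trade-off, with assumed page sizes and a FIFO cache rather than the paper's actual design:

```python
# Toy demand-paging cache over NAND-like storage. Each miss transfers
# `cache_page` bytes out of the data register, so smaller cache pages
# cost less bandwidth per miss (at the price of more entries to manage).

class DemandPager:
    def __init__(self, storage: bytes, cache_page: int, capacity: int):
        self.storage = storage
        self.cache_page = cache_page      # bytes fetched per miss
        self.capacity = capacity          # max cached pages
        self.cache = {}                   # page index -> bytes, FIFO order
        self.bytes_transferred = 0        # bandwidth accounting

    def read(self, addr: int) -> int:
        page = addr // self.cache_page
        if page not in self.cache:
            if len(self.cache) >= self.capacity:
                self.cache.pop(next(iter(self.cache)))   # FIFO eviction
            start = page * self.cache_page
            self.cache[page] = self.storage[start:start + self.cache_page]
            self.bytes_transferred += self.cache_page
        return self.cache[page][addr % self.cache_page]
```

With an equal total cache budget (e.g., 4 pages of 2048 bytes versus 32 pages of 256 bytes), a strided access pattern makes the small-page configuration transfer far fewer bytes in this toy model.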

Dynamic scratchpad memory management for code in portable systems with an MMU

Bernhard Egger, Jaejin Lee, Heonshik Shin
2008 ACM Transactions on Embedded Computing Systems  
We show that by using the data cache as a victim buffer for the SPM, significant energy savings are possible.  ...  In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU).  ...  The reference images run on an ARM926EJ-S core with an instruction and a data cache. For each application, the instruction and data cache sizes are set to the corresponding values in Table V.  ... 
doi:10.1145/1331331.1331335 fatcat:jeozr3rgcfb65ln6rptjs5rbf4
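
The victim-buffer idea in the snippet above can be modelled in a few lines: code pages evicted from the scratchpad fall into a small victim store, and a later fetch that hits there is charged far less than a refetch from main memory. A hedged sketch with invented energy costs and FIFO replacement, not the paper's allocator:

```python
# Toy model: a scratchpad (SPM) backed by a victim buffer. Pages evicted
# from the SPM are spilled to the victim buffer; promoting a page back
# from there is much cheaper than refetching it from DRAM.

class SpmWithVictimBuffer:
    SPM_COST, VICTIM_COST, DRAM_COST = 1, 4, 100   # assumed access energies

    def __init__(self, spm_slots: int, victim_slots: int):
        self.spm = []            # page ids, FIFO order (front = oldest)
        self.victim = []
        self.spm_slots, self.victim_slots = spm_slots, victim_slots
        self.energy = 0

    def fetch(self, page: int):
        if page in self.spm:
            self.energy += self.SPM_COST
        elif page in self.victim:
            self.victim.remove(page)           # promote back into the SPM
            self.energy += self.VICTIM_COST
            self._install(page)
        else:
            self.energy += self.DRAM_COST      # cold fetch from main memory
            self._install(page)

    def _install(self, page: int):
        if len(self.spm) >= self.spm_slots:
            evicted = self.spm.pop(0)
            if evicted in self.victim:
                self.victim.remove(evicted)
            self.victim.append(evicted)        # spill to victim buffer
            if len(self.victim) > self.victim_slots:
                self.victim.pop(0)             # finally dropped
        self.spm.append(page)
```

A working set slightly larger than the SPM thrashes badly without the victim buffer but cheaply cycles through it with one, which is the shape of the energy saving the abstract claims.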

Data and memory optimization techniques for embedded systems

P. R. Panda, F. Catthoor, N. D. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandercappelle, P. G. Kjeldsberg
2001 ACM Transactions on Design Automation of Electronic Systems  
We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems.  ...  The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more of three important cost metrics: area, performance, and power dissipation of the resulting implementation  ...  It should be noted that data layout for improving data cache performance has an analogue in instruction caches.  ... 
doi:10.1145/375977.375978 fatcat:v7ekrrpchfc47nycvicmuouk2u

IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion

Kai Ren, Qing Zheng, Swapnil Patil, Garth Gibson
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
We also propose two client-based storm-free caching techniques: bulk namespace insertion for creation-intensive workloads such as N-N checkpointing; and stateless consistent metadata caching for hot spot  ...  The growing size of modern storage systems is expected to exceed billions of objects, making metadata scalability critical to overall performance.  ...  We especially thank Los Alamos National Laboratory for running our software on one of their HPC clusters (Smog), Panasas for providing a storage cluster and LinkedIn for giving us a trace of its HDFS metadata  ... 
doi:10.1109/sc.2014.25 dblp:conf/sc/RenZPG14 fatcat:g443hulj4jhjtjzpaqumutjmia

Improving disk bandwidth-bound applications through main memory compression

Vicenç Beltran, Jordi Torres, Eduard Ayguadé
2007 Proceedings of the 2007 workshop on MEmory performance DEaling with Applications, systems and architecture - MEDEA '07  
On the other hand, its main drawback is the large amount of CPU power needed by the computationally expensive compression algorithms, which makes it unsuitable for medium-to-large CPU-intensive applications  ...  The objective of main memory compression techniques is to reduce the in-memory data size to virtually enlarge the available memory on the system.  ...  With the obtained results we can anticipate the big impact that this technology can have in conjunction with new multiprocessor and multicore technologies like the Niagara [13] or CELL [20] processors  ... 
doi:10.1145/1327171.1327178 fatcat:lpa6zvtvivg7blu3pazmpfzvwm
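
The trade-off described above — spend CPU cycles on compression to avoid disk traffic — can be illustrated with a toy swap pool. Here zlib stands in for whatever compressor such a subsystem would use; none of this is the paper's code:

```python
# Toy compressed-swap pool: pages evicted from memory are compressed into
# a RAM pool, and a later fault decompresses them instead of re-reading
# from disk, trading CPU time for avoided disk I/O.
import zlib

class CompressedSwap:
    def __init__(self):
        self.pool = {}          # page id -> compressed bytes
        self.raw_bytes = 0      # total uncompressed size swapped out
        self.stored_bytes = 0   # total compressed size held in RAM

    def swap_out(self, page_id: int, page: bytes):
        blob = zlib.compress(page, level=1)   # cheap level: CPU/ratio trade-off
        self.pool[page_id] = blob
        self.raw_bytes += len(page)
        self.stored_bytes += len(blob)

    def swap_in(self, page_id: int) -> bytes:
        return zlib.decompress(self.pool.pop(page_id))
```

Choosing a low compression level mirrors the abstract's point: the compressor's CPU cost, not its ratio, is what limits the technique on CPU-bound workloads.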

The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores

Igor Loi, Alessandro Capotondi, Davide Rossi, Andrea Marongiu, Luca Benini
2018 IEEE Transactions on Multi-Scale Computing Systems  
High performance and extreme energy efficiency are strong requirements for a fast-growing number of edge-node Internet of Things (IoT) applications.  ...  Exploiting the instruction locality typical of data-parallel applications, we explore two different shared instruction cache architectures, based on energy-efficient latch-based memory banks: one leveraging  ...  Considering that for this class of applications the instruction cache can be a real bottleneck, and that the capacity of the caches has a major impact on the performance, the SP is able to provide the best  ... 
doi:10.1109/tmscs.2017.2769046 fatcat:ta644xddx5b7hjmoqsipow2aiy

Worst-case execution time analysis-driven object cache design

Benedikt Huber, Wolfgang Puffitsch, Martin Schoeberl
2011 Concurrency and Computation  
Standard data caches perform comparably well in the average case, but accesses to heap data result in overly pessimistic WCET estimations.  ...  All performance-enhancing features need to be WCET analyzable. However, standard data caches containing heap allocated data are very hard to analyze statically.  ...  The Benchmarks For the evaluation of different object cache configurations we used four different benchmarks. Lift is a tiny, but real world, embedded application.  ... 
doi:10.1002/cpe.1763 fatcat:raa5gpdo3fhofnug574wvksey4

Contention in Multicore Hardware Shared Resources: Understanding of the State of the Art

Gabriel Fernandez, Jaume Abella, Eduardo Quiñones, Christine Rochange, Tullio Vardanega, Francisco J. Cazorla, Marc Herbstritt
2014 Worst-Case Execution Time Analysis  
This sparseness makes it difficult for any reader to form a coherent picture of the problem and solution space.  ...  The real-time systems community has over the years devoted considerable attention to the impact on execution timing that arises from contention on access to hardware shared resources.  ...  Instead, it controls how often tasks evict data from cache as a way to bound the impact of contention on tasks' WCET estimates.  ... 
doi:10.4230/oasics.wcet.2014.31 dblp:conf/wcet/FernandezAQRVC14 fatcat:xfhpnrtmf5a65ek3upravfesca

Improving Web Server Performance Through Main Memory Compression

Vicenç Beltran, Jordi Torres, Eduard Ayguadé
2008 2008 14th IEEE International Conference on Parallel and Distributed Systems  
Current web servers are highly multithreaded applications whose scalability benefits from the current multicore/multiprocessor trend.  ...  In this paper we implement in the Linux OS a full SMP-capable main memory compression subsystem to increase the performance of a web server running the SPECweb2005 benchmark.  ...  In contrast, the CPU time dedicated to compressing data has a large impact as soon as we dedicate some memory to store compressed data.  ... 
doi:10.1109/icpads.2008.15 dblp:conf/icpads/BeltranTA08 fatcat:64snwbyedndljiqwrw22qpkud4

An Evaluation of Coarse-Grained Locking for Multicore Microkernels [article]

Kevin Elphinstone, Amirreza Zarrabi, Adrian Danis, Yanyan Shen, Gernot Heiser
2016 arXiv   pre-print
We revisit this trade-off in the context of microkernels and tightly-coupled cores with shared caches and low inter-core migration latencies.  ...  We evaluate performance on two architectures: x86 and ARM MPCore, in the former case also utilising transactional memory (Intel TSX).  ...  We therefore argue that it is important to understand the performance impact of a big-lock design, which maximises best-case performance, minimises complexity and eases assurance.  ... 
arXiv:1609.08372v2 fatcat:erghtyzjrfgwddddefwz45w7ri

The Tale of Java Performance

Osvaldo Pinali Doederlein
2003 Journal of Object Technology  
Java was not born to handle tiny web page embellishments forever, so it had to evolve to support everything from smart card applets to scalable enterprise applications.  ...  to make developers happy.  ...  IBM was the only big player to make a static compiler, but their commitment to J2EE probably killed HPCJ as much as the performance of its own JIT.  ... 
doi:10.5381/jot.2003.2.5.c3 fatcat:nwgd4zsa2feinepz622opcfq4e

Performance tuning for deep learning on a many-core processor (master thesis) [article]

Philippos Papaphilippou
2018 arXiv   pre-print
Finally, I investigate the potential for adaptive algorithms for further performance increase.  ...  Convolutional neural networks (CNNs) are becoming very successful and popular for a variety of applications.  ...  embedded simulator implementation.  ... 
arXiv:1806.01105v1 fatcat:ehwbxzqnfvfttjrtrmdvc6znv4

A Survey of Techniques for Architecting and Managing Asymmetric Multicore Processors

Sparsh Mittal
2016 ACM Computing Surveys  
We hope that more than just synthesizing the existing work on AMPs, the contribution of this survey will be to spark novel ideas for architecting future AMPs that can make a definite impact on the landscape  ...  and parallel performance.  ...  Executing CSs exclusively on the big core obviates the need for frequent migration of lock and shared data between the caches of the big and small cores, since such data can always remain in the cache of the big core.  ... 
doi:10.1145/2856125 fatcat:3hda47vtl5fznfvbskwcm2cbo4

Improved Ahead-of-Time Compilation of Stack-Based JVM Bytecode on Resource-Constrained Devices [article]

Niels Reijers, Chi-Sheng Shih
2017 arXiv   pre-print
Compiling bytecode to native code to improve performance has been studied extensively for larger devices, but the restricted resources on sensor nodes mean most modern techniques cannot be applied.  ...  While this increases the size of the VM, the break-even point at which this fixed cost is compensated for is well within the range of memory available on a sensor device, allowing us to both improve performance  ...  This results in higher load/store overhead, and the two optimisations that target this overhead, popped value caching and mark loops, have a big impact.  ... 
arXiv:1712.05590v2 fatcat:7wbqicmzzfc6ndy65ppl6xd4cy
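
Popped value caching, mentioned above, removes the memory traffic of a push that is immediately consumed by a pop by keeping the top of the stack in a register. A simplified Python model with invented opcodes, counting memory operations only — an illustration of the idea, not the paper's translator:

```python
# Toy stack-bytecode interpreter comparing a naive translation (every push
# and pop is a memory operation) against one that caches the top of stack
# in a register, so a push immediately consumed by a pop is free.

def run(bytecode, cached: bool):
    stack, mem_ops, top = [], 0, None     # `top` models the cache register
    for op, *args in bytecode:
        if op == "push":
            if cached:
                if top is not None:
                    stack.append(top)     # spill the old cached top
                    mem_ops += 1
                top = args[0]
            else:
                stack.append(args[0])
                mem_ops += 1
        elif op == "add":
            if cached:
                b, a = top, stack.pop()   # right operand already in register
                mem_ops += 1
                top = a + b
            else:
                b, a = stack.pop(), stack.pop()
                mem_ops += 2
                stack.append(a + b)
                mem_ops += 1
    return (top, mem_ops) if cached else (stack[-1], mem_ops)
```

On an expression like (2 + 3) + 4, the cached variant produces the same result with roughly half the memory operations, which is the load/store overhead the abstract says these optimisations target.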


Cupertino Miranda, Antoniu Pop, Philippe Dumont, Albert Cohen, Marc Duranton
2010 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems - CASES '10  
False sharing induces severe overheads for tiny bursts and is the cause of wide performance instabilities.  ...  Parallel stream and data-flow programming makes the task-level data-flow explicit.  ... 
doi:10.1145/1878921.1878924 dblp:conf/cases/MirandaPDCD10 fatcat:3wh77qldr5eorbelradm6h7ci4
Showing results 1 — 15 out of 1,335 results