Systematic data reuse exploration methodology for irregular access patterns
Proceedings 13th International Symposium on System Synthesis
They work well for homogeneous signal access patterns but cannot handle other cases. ...
where holes are present in the signal access pattern. ...
Current methodology for data reuse exploration. In this section we give a short summary of the current methodology for data reuse exploration [14] [7]. ...
doi:10.1109/isss.2000.874037
dblp:conf/isss/AchterenLC00
fatcat:xyen2nkyqzerpbyjeyyqnfecli
Scaling irregular parallel codes with minimal programming effort
2001
Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '01
We present a simple runtime methodology for scaling irregular applications parallelized with the standard OpenMP interface. ...
Irregular parallel applications are a particularly challenging application domain for parallel programming models, since they require domain specific data distribution and load balancing algorithms. ...
Acknowledgments We are grateful to the ECMWF and Siegfried Benkner for providing us with the irregular kernels. ...
doi:10.1145/582034.582050
dblp:conf/sc/NikolopoulosPA01
fatcat:iq75fa4my5bsjbfe5kmx4fq2te
A Systematic Design Space Exploration Approach to Customising Multi-Processor Architectures: Exemplified Using Graphics Processors
[chapter]
2011
Lecture Notes in Computer Science
A systematic approach to customising Homogeneous Multi-Processor (HoMP) architectures is described. The approach involves a novel design space exploration tool and a parameterisable system model. ...
We also analyse on-chip and off-chip memory access for systems with one or more processing elements (PEs), and study the impact of the number of threads per PE on the amount of off-chip memory access and ...
A systematic design space methodology to explore the customisation options for a HoMP. The key feature is the notion of pre- and post-fab options (Section 4). 3. ...
doi:10.1007/978-3-642-24568-8_4
fatcat:7r43f3e5hjhdje5g7njmb2zczq
Array Size Computation under Uniform Overlapping and Irregular Accesses
2016
ACM Transactions on Design Automation of Electronic Systems
We propose a methodology to compute the minimum resources required for storing an array which keeps the exploration time low and provides a near-optimal result for regularly and non-regularly occurring ...
Otherwise their exploration time is increased with an increase over the number of the different accessed parts of the array. ...
In [Wuytack et al. 1998] a data access graph based on polytopes is used to describe all the memory operations in time for a given array, which is used as input to the data reuse exploration and decision ...
doi:10.1145/2818643
fatcat:olifuqxswjdcnb27sijdu73jpy
50 & 25 Years Ago
2020
Computer
(p. 27) : "Recently, researchers have focused on achieving the above goals through organizational changes and methodologies for systematic software reuse. ...
(p. 39) "Irregular computations: In many important applications, compile-time analysis is insufficient when communication patterns are data dependent and known only at runtime. ...
doi:10.1109/mc.2020.3010073
fatcat:gd35v3gdvjcv7g4xqpmyvlw7k4
Quantifying Data Locality in Dynamic Parallelism in GPUs
2018
Proceedings of the ACM on Measurement and Analysis of Computing Systems
We observe that, for DP applications, data reuse is highly irregular and is heavily dependent on the application and its input. ...
Thus, existing techniques cannot exploit data reuse effectively for DP applications. ...
ACKNOWLEDGMENT We thank Ganesh Ananthanarayanan for shepherding our paper. We also thank the anonymous reviewers for their constructive feedback. ...
doi:10.1145/3287318
fatcat:zmop6pak6jefve6jtypricmo2a
Characteristics of workloads used in high performance and technical computing
2007
Proceedings of the 21st annual international conference on Supercomputing - ICS '07
Since prefetching plays an important role in the performance of computational workloads, we explore the prefetching potential and for parallel workloads we study the sharing properties of memory accesses ...
We also analyze memory access patterns including various aspects of cache utilization and locality properties of address distributions. ...
Interesting data points and conclusions from this data include: 1) the lack of speedup for GTC, where memory access patterns are irregular and sufficiently complex that there is no effective software prefetching ...
doi:10.1145/1274971.1274984
dblp:conf/ics/CheveresanRFS07
fatcat:ptpam3kzxzcebp6jm3m3cahlaa
Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics
[article]
2020
arXiv
pre-print
Third, we show that the design dimensions explored here are inter-dependent, reinforcing the need for software-hardware co-design in the above design dimensions. ...
This work provides the first study to explore the interaction of update propagation with and without fine-grained synchronization (push vs. pull), emerging coherence protocols (GPU vs. ...
Each implementation represents a design specialization that can be made for the irregular graph workloads that we explore. ...
arXiv:2002.10245v2
fatcat:xth5zg7erbdadhte4zhelgopp4
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
[article]
2021
arXiv
pre-print
Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does ...
The takeaways from this paper include: understanding the key challenges in accelerating sparse, irregular-shaped, and quantized tensors; understanding enhancements in accelerator systems for supporting ...
Further, a systematic methodology for mapping communication onto interconnect topology can enable design space exploration of interconnects needed for accelerating target ML models, allowing minimum overhead ...
arXiv:2007.00864v2
fatcat:k4o2xboh4vbudadfiriiwjp7uu
The knowledge circulated-organisational management for accomplishing e-learning
2009
Knowledge Management & E-Learning: An International Journal
This means "knowledge in universities circulated-systematic process" of finding, selecting, organising, distilling and presenting information in a way that improves a learner's competency and/or ability ...
In order to construct such educational management systems, the fundamental processing modules are required, such as a distributed file system, synchronous data communications, etc. ...
In order with the average values for three patterns: Progressive-pattern > Regressive-pattern > Spiral-pattern. ...
doi:10.34105/j.kmel.2009.01.002
fatcat:ebk47nqtorfvpbffyo3tbkikeu
Use of Computation-Unit Integrated Memories in High-Level Synthesis
2006
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Efficient data reuse of register files has also been fully exploited to further improve system performance. ...
This paper addresses the challenge of providing a systematic synthesis framework for a CIM-based architecture. ...
The issue of systematic data reuse for irregular access patterns is discussed in [29]. ...
doi:10.1109/tcad.2005.862749
fatcat:k5x37m5ilvbfbm27oziovprs2y
Understanding object-level memory access patterns across the spectrum
2017
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17
Especially we thank our shepherd, Simon Hammond, for his guidance and responsiveness. This work was supported in part by the National Key ...
Also, the same application cannot be expected to have the same memory access pattern for different input problem sizes or input data. ...
Our systematic study of object-level access patterns (across 38 applications from diverse domains) supplements these existing approaches and provides practical implications for ongoing architectural, system ...
doi:10.1145/3126908.3126917
dblp:conf/sc/JiWEMKVXS17
fatcat:nqmd4px5hfawfhp6itubchgjr4
Data Service API Design for Data Analytics
[chapter]
2018
Lecture Notes in Computer Science
Last but not least, current data services do not support the reuse of data exploration processes and the data derived from data analysts. ...
Across the entire data analytics lifecycle, data service can be regarded as a method for data retrieval and exploration. ...
The discussion focuses on the data preparation for data analytics by analyzing the data retrieval patterns and data exploration features. ...
doi:10.1007/978-3-319-94376-3_6
fatcat:cg3pqi5tzbe6blkyg557aqif5q
A Portable Optimization Engine for Accelerating Irregular Data-Traversal Applications on SIMD Architectures
2014
ACM Transactions on Architecture and Code Optimization (TACO)
This article develops support for exploiting such data parallelism for a class of nonnumeric, nongraphic applications, which perform computations while traversing many independent, irregular data structures ...
To address this challenge, we develop a set of data layout optimizations that improve spatial locality for applications that traverse many irregular data structures. ...
We refer to this traversal pattern as sparse buckets accesses. ...
doi:10.1145/2632215
fatcat:qegvfixkezacxgkrg557sphsje
Generation of Heterogeneous Distributed Architectures for Memory-Intensive Applications Through High-Level Synthesis
2007
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The high-level synthesis (HLS) techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns. ...
Synthesis should, therefore, be capable of determining a partitioned architecture, wherein array data and computations may have to be heterogeneously distributed for achieving the best performance speed-up ...
The issue of systematic data reuse for irregular access patterns is discussed in [13], which motivates us to explore data duplication effects among partitioned subsystems in our paper. ...
doi:10.1109/tvlsi.2007.904096
fatcat:czc256r4zfc7hbe44ir6smrqwu