Scalable and Modular Scheduling [chapter]

Paul Feautrier
2004 Lecture Notes in Computer Science  
A schedule gives a blueprint for constructing a synchronous program, suitable for an ASIC or VLIW processor. However, constructing a schedule entails solving a large linear program.  ...  Scheduling a program (i.e. constructing a timetable for the execution of its operations) is one of the most powerful methods for automatic parallelization.  ...  This model fits well with the structure of VLIW processors or ASIC/FPGA special purpose circuits. An unspecified VLIW processor will be the main target architecture in what follows.  ... 
doi:10.1007/978-3-540-27776-7_45 fatcat:bwuhkb3f5nhidbkdnjq7albphq

ACOTES Project: Advanced Compiler Technologies for Embedded Streaming

Harm Munk, Eduard Ayguadé, Cédric Bastoul, Paul Carpenter, Zbigniew Chamski, Albert Cohen, Marco Cornero, Philippe Dumont, Marc Duranton, Mohammed Fellahi, Roger Ferrer, Razya Ladelsky (+15 others)
2010 International journal of parallel programming  
However, programming efficiently for streaming architectures is a challenging task, having to carefully partition the computation and map it to processes in a way that best matches the underlying streaming  ...  embedded architectures.  ...  Vectorization is known to be one of the most effective ways to exploit fine-grain data-level parallelism, and is especially important for streaming architectures because their processing units typically  ... 
doi:10.1007/s10766-010-0132-7 fatcat:dlvxlop65ngzfaezs3yjs2mfm4

Survey on Software Engineering for Scientific Applications: Reuseable Software, Grid Computing and Application

Dominik Jürgens, Universitätsbibliothek Braunschweig
2009
In this context, different simulation-programs model only a part of a more complex coupled system.  ...  The interdisciplinary nature of scientific software thereby presents new challenges for software engineering.  ...  Super-scalar architectures introduce parallelism not only for increasing clock rates.  ... 
doi:10.24355/dbbs.084-200905120200-3 fatcat:dvlwoiyr7bh5pchcoivorvxm3i

High-performance and hardware-aware computing: proceedings of the first International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'08) [article]

Rainer Buchty, Jan-Philipp Weiß [ed.]
2008
For efficient implementations and optimal results, underlying algorithms and mathematical solution methods have to be adapted carefully to architectural constraints like fine-grained parallelism and memory  ...  Different programming models, non-adjusted interfaces, and bandwidth bottlenecks complicate holistic programming approaches for heterogeneous architectures.  ...  The QS21 BladeCenter is let by courtesy of SVA System Vertrieb Alexander GmbH, Germany. All mentioned products and brand names are trademarks or registered trademarks of their respective owners.  ... 
doi:10.5445/ksp/1000009529 fatcat:zvpuywyjzfbabpcxl6o4klo7nu

Parallelization of Numerical Methods on Parallel Processor Architectures

László Endre, Szolgay Péter
2016
), Intel's MIC (Many Integrated Core) or FPGA (Field Programmable Gate Array) architecture.  ...  ) programming approach in OP2 is also discussed.  ...  languages are more fine-grained.  ... 
doi:10.15774/ppke.itk.2016.002 fatcat:bct4vyxe2zfdrewgrcjg4h63se

Design and evaluation of a network-based asynchronous architecture for cryptographic devices

L. Dilparic, D.K. Arvind
Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.  
Chris Bainbridge for proofreading the thesis material and for their helpful comments. The Overseas Research Student (ORS) Award Scheme for covering the overseas tuition fees.  ...  To the Graduate School of Informatics for covering the home fees and partial maintenance. To the System Level Integration group for providing some of the maintenance funding.  ...  Acknowledgements I am deeply grateful to my husband, Joseph, for his love, patience and continuous support during the many difficult times of my PhD studies.  ... 
doi:10.1109/asap.2004.1342470 fatcat:uki6krxwnjbyvm5dovz6i52qc4

Exploiting tightly-coupled cores

Daniel Bates, Apollo-University Of Cambridge Repository
2014
Communication is usually slow compared to computation, and so restricts the opportunities for profitable parallelisation.  ...  In this dissertation I introduce Loki: a homogeneous, tiled architecture made up of many simple, tightly-coupled cores.  ...  This question of whether instructions should be sent to where the data is stored, or whether data should be sent to where the instructions are stored becomes increasingly relevant on fine-grained architectures  ... 
doi:10.17863/cam.16381 fatcat:6wvapjattzflhdjexbdf465fh4

Video Processing Acceleration using Reconfigurable Logic and Graphics Processors

Benjamin Thomas Cope, Peter Cheung, Wayne Luk, Donal Morphy
2009
The comparison results prompt the exploration of the customisable options for the graphics processor architecture.  ...  A positive result of the exploration is the proposal of a reconfigurable engine for data access (REDA) to optimise graphics processor performance for video processing-specific memory access patterns.  ...  QSilver [118] is another fine-grained graphics processor architectural model.  ... 
doi:10.25560/1412 fatcat:3z7gtznvondmvirk3hkrkg5fpy

Methodology for complex dataflow application development

Nils Voss, Wayne Luk, Georgi Gaydadjiev
2021
This thesis addresses problems inherent to the development of complex applications for reconfigurable systems.  ...  and an iterative architectural refinement to resolve identified bottlenecks before writing a single line of code targeting the reconfigurable hardware.  ...  Similarly, the toolchain used for programming the target device has to provide enough fine-grained control over the hardware to the programmer, to allow for an accurate implementation of the developed  ... 
doi:10.25560/91314 fatcat:w6ru3zmzozetjf3rhgw5pgcbzu

A composable and predictable on-chip interconnect [article]

Andreas Hansson, Henk Corporaal, Kees Goossens
2009
Compare, for example, the compile-time scheduling of a VLIW with that of a super-scalar out-of-order processor.  ...  Furthermore, unlike a super-scalar processor, the VLIW has no bypassing or hazard detection mechanisms.  ...  The first category contains the symbols and notations used for architectural constants. The second category is related to the input specification of application requirements.  ... 
doi:10.6100/ir642929 fatcat:rqwyl7jdcrac7cug2y77lq662m