Extending Modulo Scheduling with Memory Reference Merging [chapter]

Benoît Dupont de Dinechin
1999 Lecture Notes in Computer Science  
We describe an extension of modulo scheduling, called "memory reference merging", which improves the management of cache bandwidth on microprocessors such as the DEC Alpha 21164.  ...  The principle is to schedule together memory references that are likely to be merged in a read buffer (LOADs), or a write buffer (STOREs).  ...  The modulo scheduler is then extended to assume pairwise conflicts on the CB resource for all mergeable memory references, except for those that are scheduled within one of the merge intervals associated  ... 
doi:10.1007/978-3-540-49051-7_19 fatcat:cpq6dedt7zasvbaknitzovp75y

An integrated and automated memory optimization flow for FPGA behavioral synthesis

Yuxin Wang, Peng Zhang, Xu Cheng, Jason Cong
2012 17th Asia and South Pacific Design Automation Conference  
We develop memory padding to help in the memory partitioning of indices with modulo operations.  ...  Moreover, memory merging saves up to 44.32% of block RAM (BRAM).  ...  The rest of our extended partitioning approach for k modulo references is similar to that in [16].  ... 
doi:10.1109/aspdac.2012.6164955 dblp:conf/aspdac/WangZCC12 fatcat:utp3igrbw5dvjd6lwsszjlhgxy

Profile-assisted instruction scheduling

William Y. Chen, Scott A. Mahlke, Nancy J. Warter, Sadun Anik, Wen-Mei W. Hwu
1994 International journal of parallel programming  
In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-assisted code scheduling techniques have been incorporated into the IMPACT-I compiler.  ...  This paper describes the scheduling algorithms, highlights the modifications required to use profile information, and explains the hardware and compiler support for dealing with hazards that arise from aggressive  ...  Scheduling with Memory-dependence Profile Information The freedom of global instruction scheduling is limited by memory dependences.  ... 
doi:10.1007/bf02577873 fatcat:yd7nxg2lbrctlgv5wk2zyrfmz4

Code-size conscious pipelining of imperfectly nested loops

Mohammed Fellahi, Albert Cohen, Sid Touati
2007 Proceedings of the 2007 workshop on MEmory performance DEaling with Applications, systems and architecture - MEDEA '07  
In addition to preserving precious scratch-pad or cache memory, our method also avoids the performance overhead of prologs and epilogs resulting from pipelined inner loops with short trip count.  ...  This paper is a step towards enabling multidimensional software pipelining of non-perfectly nested loops on memory-constrained architectures.  ...  Technically, we propose to modulo-schedule [23] as many phases as possible, while merging the prolog of each outer iteration of a phase with the epilog of its previous outer iteration.  ... 
doi:10.1145/1327171.1327177 fatcat:ur5bvyxcwbfczk7re3vvrhilvm

Programming support for reconfigurable custom vector architectures

Mehmet Ali Arslan, Krzysztof Kuchcinski, Flavius Gruian, Yangxurui Liu
2015 Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15  
We present a domain specific language (DSL) for the programming part, and a constraint programming approach to scheduling with memory allocation.  ...  without memory allocation.  ...  Scheduling one iteration With the model explained so far, we have scheduled a QRD with memory allocation.  ... 
doi:10.1145/2712386.2712399 dblp:conf/ppopp/ArslanKGL15 fatcat:e3mrftzy7vhz5bfdfpz3xf5a4a

Graph minor approach for application mapping on CGRAs

Liang Chen, Tulika Mitra
2012 2012 International Conference on Field-Programmable Technology  
Compute-intensive loop kernels are mapped to CGRA through modified modulo scheduling algorithms that integrate placement and routing.  ...  We transform the CGRA mapping problem with route sharing into a graph minor problem. Our graph minor formalization provides a solid foundation for application mapping on CGRA.  ...  In the following, the term MRRG will be used to refer to MRRG with wrap around edges. C. Register File Modeling Our mapping technique integrates register allocation with scheduling.  ... 
doi:10.1109/fpt.2012.6412149 dblp:conf/fpt/ChenM12 fatcat:qafjblmlurc3xbrwuzwubqlrrm

Software synthesis from the dataflow interchange format

Chia-Jui Hsu, Ming-Yung Ko, Shuvra S. Bhattacharyya
2005 Proceedings of the 2005 workshop on Software and compilers for embedded systems - SCOPES '05  
This framework allows designers to efficiently explore the complex range of implementation trade-offs that are available through various dataflow-based techniques for scheduling and memory management.  ...  The dataflow interchange format (DIF) [11] and the associated DIF package have been developed for specifying, working with, and transferring dataflow-based DSP designs across tools.  ...  performing modulo operations.  ... 
doi:10.1145/1140389.1140394 dblp:conf/scopes/HsuB05 fatcat:7m4rr2bdwjgk7brhgtdwg4ipiu

Trident: From High-Level Language to Hardware Circuitry

Justin L. Tripp, Maya B. Gokhale, Kristopher D. Peterson
2007 Computer  
In this mode, Carte optimizes for parallelism by pipelining loops, scheduling memory references, and supporting parallel code blocks and streams.  ...  Trident's modulo scheduling algorithm also schedules reads and writes to packed arrays in the same time slot.  ... 
doi:10.1109/mc.2007.107 fatcat:yamhfranjrfurfci54l2mnaafe

Contributions to the GNU Compiler Collection

D. Edelsohn, W. Gellerich, M. Hagog, D. Naishlos, M. Namolaru, E. Pasch, H. Penner, U. Weigand, A. Zaks
2005 IBM Systems Journal  
We also cover many optimizations, including the interblock instruction scheduler, software pipeliner, and vectorizer.  ...  This paper includes a report on our general experience with GCC in both open source and proprietary software environments and reviews the quality and performance of GCC-generated code.  ...  the section "Software pipelining and modulo scheduling").  ... 
doi:10.1147/sj.442.0259 fatcat:pk6kpumpxzchzii25hcivisle4

Hierarchical coarse-grained stream compilation for software defined radio

Yuan Lin, Manjunath Kudlur, Scott Mahlke, Trevor Mudge
2007 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems - CASES '07  
Because of the streaming nature of SDR protocols, we adapted an existing instruction-level software pipelining technique, modulo scheduling, for coarse-grained compilation.  ...  We then present a coarse-grained dataflow compilation strategy that assigns a SDR protocol's DSP kernels onto multiple processors, allocates memory buffers, and determines an execution schedule that meets  ...  [20] extends the modulo scheduling to software pipeline any loop nest in a multi-dimensional loop, which conceptually is similar to coarse-grained modulo scheduling.  ... 
doi:10.1145/1289881.1289903 dblp:conf/cases/LinKMM07 fatcat:n2rkbxygjfaiblioqg5nwwqqua

Automating the Design of Processor/Accelerator Embedded Systems with LegUp High-Level Synthesis

Blair Fort, Andrew Canis, Jongsok Choi, Nazanin Calagar, Ruolong Lian, Stefan Hadjis, Yu Ting Chen, Mathew Hall, Bain Syrowik, Tomasz Czajkowski, Stephen Brown, Jason Anderson
2014 2014 12th IEEE International Conference on Embedded and Ubiquitous Computing  
The hybrid system comprises an embedded processor and custom accelerators that realize user-designated compute-intensive parts of the program with improved throughput and energy efficiency.  ...  COMPARISON AGAINST LEGUP 3.0 We compared the quality of results provided by the current LegUp tool, with that provided by the 3.0 release (early 2013).  ...  in modulo scheduling.  ... 
doi:10.1109/euc.2014.26 dblp:conf/euc/FortCCCLHCHSCBA14 fatcat:gm6y5hryjjghjdfws74y7jzody

Reducing Interconnect Complexity for Efficient Path Metric Memory Management in Viterbi Decoders

M.-D. SHIEH, T.-P. WANG, C.-M. WU
2008 IEICE transactions on information and systems  
Using the derived equations for memory partition and add-compare-select (ACS) arrangement together with the extended in-place scheduling scheme proposed in this work, we can increase the memory bandwidth  ...  in-place scheduling, path metric memory management, VLSI architecture  ...  References [1]  ... 
doi:10.1093/ietisy/e91-d.9.2300 fatcat:hwkecrvdnberlngrhkjsqcm3v4

Merge or Separate?

Yuan Wen, Michael F.P. O'Boyle
2017 Proceedings of the General Purpose GPUs on - GPGPU-10  
We use a machine learning-based predictive model at runtime to detect whether to merge OpenCL kernels or schedule them separately to the most appropriate devices without the need for ahead-of-time profiling  ...  Computer systems are increasingly heterogeneous with nodes consisting of CPUs and GPU accelerators.  ...  Impact by memory accessing intensity. It may be the case that sharing low computation intensity kernels can be extended to memory intense kernels.  ... 
doi:10.1145/3038228.3038235 dblp:conf/ppopp/WenO17 fatcat:sdz7a54435hjjcgmqv4qetbplm

Code generation schema for modulo scheduled loops

B. Ramakrishna Rau, Michael S. Schlansker, P. P. Tirumalai
1992 ACM SIGMICRO Newsletter  
This issue is studied both with and without hardware features that are specifically aimed at supporting modulo scheduling.  ...  Modulo scheduling is one approach for generating such schedules.  ...  For both P3 and P4, six memory operations were scheduled onto two memory units.  ... 
doi:10.1145/144965.145795 fatcat:45zdocdtkfh3nnm7gbbu5lte3y

Instruction-level parallel processing: History, overview, and perspective

B. Ramakrishna Rau, Joseph A. Fisher
1993 Journal of Supercomputing  
instruction-level parallelism, VLIW processors, superscalar processors, pipelining, multiple operation issue, speculative execution, scheduling, register allocation Instruction-level Parallelism (ILP)  ...  An issue of central importance to all ILP compilation is the disambiguation of memory references, i.e., deciding whether two memory references definitely are to the same memory location or definitely  ...  Percolation scheduling was then extended to non-unit execution latencies (but still with unbounded resources) [148].  ... 
doi:10.1007/bf01205181 fatcat:v7uhz4km5ndxzhr7baybks2bn4