Extending Modulo Scheduling with Memory Reference Merging
[chapter]
1999
Lecture Notes in Computer Science
We describe an extension of modulo scheduling, called "memory reference merging", which improves the management of cache bandwidth on microprocessors such as the DEC Alpha 21164. ...
The principle is to schedule together memory references that are likely to be merged in a read buffer (LOADs), or a write buffer (STOREs). ...
The modulo scheduler is then extended to assume pairwise conflicts on the CB resource for all mergeable memory references, except for those that are scheduled within one of the merge intervals associated ...
doi:10.1007/978-3-540-49051-7_19
fatcat:cpq6dedt7zasvbaknitzovp75y
An integrated and automated memory optimization flow for FPGA behavioral synthesis
2012
17th Asia and South Pacific Design Automation Conference
We develop memory padding to help in the memory partitioning of indices with modulo operations. ...
Moreover, memory merging saves up to 44.32% of block RAM (BRAM). ...
The rest of our extended partitioning approach for k modulo references is similar to that in [16]. ...
doi:10.1109/aspdac.2012.6164955
dblp:conf/aspdac/WangZCC12
fatcat:utp3igrbw5dvjd6lwsszjlhgxy
Profile-assisted instruction scheduling
1994
International journal of parallel programming
In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-assisted code scheduling techniques have been incorporated into the IMPACT-I compiler. ...
This paper describes the scheduling algorithms, highlights the modifications required to use profile information, and explains the hardware and compiler support for dealing with hazards that arise from aggressive ...
Scheduling with Memory-dependence Profile Information The freedom of global instruction scheduling is limited by memory dependences. ...
doi:10.1007/bf02577873
fatcat:yd7nxg2lbrctlgv5wk2zyrfmz4
Code-size conscious pipelining of imperfectly nested loops
2007
Proceedings of the 2007 workshop on MEmory performance DEaling with Applications, systems and architecture - MEDEA '07
In addition to preserving precious scratch-pad or cache memory, our method also avoids the performance overhead of prologs and epilogs resulting from pipelined inner loops with short trip count. ...
This paper is a step towards enabling multidimensional software pipelining of non-perfectly nested loops on memory-constrained architectures. ...
Technically, we propose to modulo-schedule [23] as many phases as possible, while merging the prolog of each outer iteration of a phase with the epilog of its previous outer iteration. ...
doi:10.1145/1327171.1327177
fatcat:ur5bvyxcwbfczk7re3vvrhilvm
Programming support for reconfigurable custom vector architectures
2015
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15
We present a domain specific language (DSL) for the programming part, and a constraint programming approach to scheduling with memory allocation. ...
without memory allocation. ...
Scheduling one iteration With the model explained so far, we have scheduled a QRD with memory allocation. ...
doi:10.1145/2712386.2712399
dblp:conf/ppopp/ArslanKGL15
fatcat:e3mrftzy7vhz5bfdfpz3xf5a4a
Graph minor approach for application mapping on CGRAs
2012
2012 International Conference on Field-Programmable Technology
Compute-intensive loop kernels are mapped to CGRA through modified modulo scheduling algorithms that integrate placement and routing. ...
We transform the CGRA mapping problem with route sharing into a graph minor problem. Our graph minor formalization provides a solid foundation for application mapping on CGRA. ...
In the following, the term MRRG will be used to refer to MRRG with wrap around edges.
C. Register File Modeling Our mapping technique integrates register allocation with scheduling. ...
doi:10.1109/fpt.2012.6412149
dblp:conf/fpt/ChenM12
fatcat:qafjblmlurc3xbrwuzwubqlrrm
Software synthesis from the dataflow interchange format
2005
Proceedings of the 2005 workshop on Software and compilers for embedded systems - SCOPES '05
This framework allows designers to efficiently explore the complex range of implementation trade-offs that are available through various dataflow-based techniques for scheduling and memory management. ...
The dataflow interchange format (DIF) [11] and the associated DIF package have been developed for specifying, working with, and transferring dataflow-based DSP designs across tools. ...
performing modulo operations. ...
doi:10.1145/1140389.1140394
dblp:conf/scopes/HsuB05
fatcat:7m4rr2bdwjgk7brhgtdwg4ipiu
Trident: From High-Level Language to Hardware Circuitry
2007
Computer
In this mode, Carte optimizes for parallelism by pipelining loops, scheduling memory references, and supporting parallel code blocks and streams. ...
Trident's modulo scheduling algorithm also schedules reads and writes to packed arrays in the same time slot. ...
doi:10.1109/mc.2007.107
fatcat:yamhfranjrfurfci54l2mnaafe
Contributions to the GNU Compiler Collection
2005
IBM Systems Journal
We also cover many optimizations, including the interblock instruction scheduler, software pipeliner, and vectorizer. ...
This paper includes a report on our general experience with GCC in both open source and proprietary software environments and reviews the quality and performance of GCC-generated code. ...
the section ''Software pipelining and modulo scheduling''). ...
doi:10.1147/sj.442.0259
fatcat:pk6kpumpxzchzii25hcivisle4
Hierarchical coarse-grained stream compilation for software defined radio
2007
Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems - CASES '07
Because of the streaming nature of SDR protocols, we adapted an existing instruction-level software pipelining technique, modulo scheduling, for coarse-grained compilation. ...
We then present a coarse-grained dataflow compilation strategy that assigns a SDR protocol's DSP kernels onto multiple processors, allocates memory buffers, and determines an execution schedule that meets ...
[20] extends the modulo scheduling to software pipeline any loop nest in a multi-dimensional loop, which conceptually is similar to coarse-grained modulo scheduling. ...
doi:10.1145/1289881.1289903
dblp:conf/cases/LinKMM07
fatcat:n2rkbxygjfaiblioqg5nwwqqua
Automating the Design of Processor/Accelerator Embedded Systems with LegUp High-Level Synthesis
2014
2014 12th IEEE International Conference on Embedded and Ubiquitous Computing
The hybrid system comprises an embedded processor and custom accelerators that realize user-designated compute-intensive parts of the program with improved throughput and energy efficiency. ...
COMPARISON AGAINST LEGUP 3.0 We compared the quality of results provided by the current LegUp tool, with that provided by the 3.0 release (early 2013). ...
in modulo scheduling. ...
doi:10.1109/euc.2014.26
dblp:conf/euc/FortCCCLHCHSCBA14
fatcat:gm6y5hryjjghjdfws74y7jzody
Reducing Interconnect Complexity for Efficient Path Metric Memory Management in Viterbi Decoders
2008
IEICE transactions on information and systems
Using the derived equations for memory partition and add-compare-select (ACS) arrangement together with the extended in-place scheduling scheme proposed in this work, we can increase the memory bandwidth ...
in-place scheduling, path metric memory management, VLSI architecture ...
References [1] ...
doi:10.1093/ietisy/e91-d.9.2300
fatcat:hwkecrvdnberlngrhkjsqcm3v4
Merge or Separate?
2017
Proceedings of the General Purpose GPUs on - GPGPU-10
We use a machine learning-based predictive model at runtime to detect whether to merge OpenCL kernels or schedule them separately to the most appropriate devices without the need for ahead-of-time profiling ...
Computer systems are increasingly heterogeneous with nodes consisting of CPUs and GPU accelerators. ...
Impact by memory accessing intensity. It may be the case that sharing low computation intensity kernels can be extended to memory intense kernels. ...
doi:10.1145/3038228.3038235
dblp:conf/ppopp/WenO17
fatcat:sdz7a54435hjjcgmqv4qetbplm
Code generation schema for modulo scheduled loops
1992
ACM SIGMICRO Newsletter
This issue is studied both with and without hardware features that are specifically aimed at supporting modulo scheduling. ...
Modulo scheduling is one approach for generating such schedules. ...
For both P3 and P4, six memory operations were scheduled onto two memory units. ...
doi:10.1145/144965.145795
fatcat:45zdocdtkfh3nnm7gbbu5lte3y
Instruction-level parallel processing: History, overview, and perspective
1993
Journal of Supercomputing
instruction-level parallelism, VLIW processors, superscalar processors, pipelining, multiple operation issue, speculative execution, scheduling, register allocation Instruction-level Parallelism (ILP) ...
An issue of central importance to all ILP compilation is the disambiguation of memory references, i.e., deciding whether two memory references definitely are to the same memory location or definitely ...
Percolation scheduling was then extended to non-unit execution latencies (but still with unbounded resources) [148]. ...
doi:10.1007/bf01205181
fatcat:v7uhz4km5ndxzhr7baybks2bn4