On the Optimality of Register Saturation

Sid-Ahmed-Ali Touati
2005 Electronic Notes in Theoretical Computer Science
Second, we prove that the problem of reducing the register saturation is NP-hard. Our detailed experiments in this paper show that our previous heuristics [14] are nearly optimal.  ...  However, in a previous work [14], we introduced and mathematically studied the register saturation (RS) concept.  ...  Optimal Register Saturation Reduction In the case where the register saturation RS_t(G) exceeds the number of available registers R_t of the type t, then we must add extra serial arcs into the DAG G to  ... 
doi:10.1016/j.entcs.2005.01.033 fatcat:qnuc3x3xgrbvlhyknxlolleed4

On the optimality of register saturation

S.-A.-A. Touati
Workshops on Mobile and Wireless Networking/High Performance Scientific, Engineering Computing/Network Design and Architecture/Optical Networks Control and Management/Ad Hoc and Sensor Networks/Compile and Run Time Techniques for Parallel Computing ICPP 2004  
Second, we prove that the problem of reducing the register saturation is NP-hard. Our detailed experiments in this paper show that our previous heuristics [14] are nearly optimal.  ...  However, in a previous work [14], we introduced and mathematically studied the register saturation (RS) concept.  ...  Optimal Register Saturation Reduction In the case where the register saturation RS_t(G) exceeds the number of available registers R_t of the type t, then we must add extra serial arcs into the DAG G to  ... 
doi:10.1109/icppw.2004.1328069 dblp:conf/icppw/Touati04 fatcat:jozox4v6k5bu5n4wyy4hgbabje

Register Saturation in Superscalar and VLIW Codes [chapter]

Sid Ahmed Ali Touati
2001 Lecture Notes in Computer Science  
In this work, we mathematically study and extend the approach which consists of computing the exact upper-bound of the register need for all the valid schedules, independently of the functional unit constraints  ...  Its aim was to add some serial arcs to the original DAG such that the worst register need does not exceed the number of available registers.  ...  As consequence, our heuristic does not compute an upper bound of the optimal register saturation and then the optimal RS can be greater than the one computed by Greedy-k.  ... 
doi:10.1007/3-540-45306-7_15 fatcat:xbrndfdvbfclhctgrs2ykq5bxq

Periodic register saturation in innermost loops

Sid-Ahmed-Ali Touati, Zsolt Mathe
2009 Parallel Computing  
We call this upper-limit the periodic register saturation (PRS) of the data dependence graph (DDG).  ...  It extends the register saturation (RS) concept to periodic instruction schedules, i.e., software pipelining (SWP).  ...  This research result would not succeed without the valuable support of the University of Versailles Saint-Quentin en Yvelines, INRIA-Rocquencourt and INRIA-Saclay in France.  ... 
doi:10.1016/j.parco.2008.12.001 fatcat:brbfjekj4jgdxksqcvkhc7h7vi

Optimal speech codec implementation on ARM9E (v5E architecture) RISC processor for next-generation mobile multimedia

Ajay Kumar Bangla, M. K. Vinay, P. V. Suresh Babu, Sethuraman Panchanathan, Bhaskaran Vasudev
2004 Visual Communications and Image Processing 2004  
Our optimization techniques are based on identification of algorithms, which could exploit either the DSP features or the RISC features or both.  ...  By a systematic application of these optimization techniques for a GSM-AMR (NB) codec on ARM9E core, we could achieve more than 77% improvement over the baseline codec and almost 33% (worst-case)  ...  One saturation block performs a double and saturate, required for fractional MAC (Q15 x Q15 + Q31→Q31); the other performs a straight saturation of the accumulated value.  ... 
doi:10.1117/12.532455 dblp:conf/vcip/BanglaVB04 fatcat:fqvxhjmbyzhrhe6567tnduvyqq

Register Saturation in Instruction Level Parallelism

Sid-Ahmed-Ali Touati
2005 International Journal of Parallel Programming
Our deeper analysis of the problem and our formal methods enable us to provide nearly optimal heuristics and strategies for register optimization in the face of ILP.  ...  We call this computed limit the register saturation (RS) of the DAG. Its aim is to detect possible obsolete register constraints, i.e., when RS does not exceed the number of available registers.  ...  Consequently, our heuristics do not compute an upper bound of the optimal register saturation, and the optimal RS can be greater than the one computed by Greedy-k.  ... 
doi:10.1007/s10766-005-6466-x fatcat:zbtnwdgvp5fbtfs5rqjoxooo2a

A Low-Power Multithreaded Processor for Software Defined Radio

Michael Schulte, John Glossner, Sanjay Jinturkar, Mayan Moudgill, Suman Mamidi, Stamatis Vassiliadis
2006 Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology  
Using a super-computer class vectorizing compiler, the SB3010 achieves real-time performance in software on a variety of communication protocols including 802.11b, GPS, AM/FM radio, Bluetooth, GPRS, and  ...  We also describe the processor's programming environment and the SB3010 platform, a complete system-on-chip solution for software defined radio.  ...  To reduce the number of ports, the VRF uses a novel technique, which divides it into two register banks; one for even threads and one for odd threads.  ... 
doi:10.1007/s11265-006-7267-1 fatcat:nfqhlyks6bhmnfyg2opfpsm4xy

Effective compiler generation by architecture description

Stefan Farfeleder, Andreas Krall, Edwin Steiner, Florian Brandner
2006 Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers and tool support for embedded systems - LCTES '06  
From a specification, we can derive an optimized tree pattern matching instruction selector, a register allocator and an instruction scheduler.  ...  Architecture description languages (ADLs) provide a single concise architecture specification for the generation of hardware, instruction set simulators and compilers.  ...  Acknowledgments This work is supported in part by Infineon Technologies Austria and the Christian Doppler Forschungsgesellschaft. We like to thank  ... 
doi:10.1145/1134650.1134671 dblp:conf/lctrts/FarfelederKSB06 fatcat:a57pdprmkbeozk4jfmwzbxll4q

Effective compiler generation by architecture description

Stefan Farfeleder, Andreas Krall, Edwin Steiner, Florian Brandner
2006 SIGPLAN Notices
From a specification, we can derive an optimized tree pattern matching instruction selector, a register allocator and an instruction scheduler.  ...  Architecture description languages (ADLs) provide a single concise architecture specification for the generation of hardware, instruction set simulators and compilers.  ...  Acknowledgments This work is supported in part by Infineon Technologies Austria and the Christian Doppler Forschungsgesellschaft. We like to thank  ... 
doi:10.1145/1159974.1134671 fatcat:yanz6oia2fhdtm25l32fokvuba

Analysis of Execution Efficiency in the Microthreaded Processor UTLEON3 [chapter]

Jaroslav Sykora, Leos Kafka, Martin Danek, Lukas Kohout
2011 Lecture Notes in Computer Science  
As the compiler specifies the blocksize parameter for each family of threads individually, it can optimize the register file utilization of the processor.  ...  We analyse an impact of long-latency instructions, the family blocksize parameter, and the thread switch modifier on execution efficiency of families of threads in a single-core configuration of the UTLEON3  ...  The paper reflects only the authors' view; neither the European Commission nor the Czech Ministry of Education are liable for any use that may be made of the information contained herein.  ... 
doi:10.1007/978-3-642-19137-4_10 fatcat:iwxumipjabaxre2qv7yznnhu44

Universality and Optimality of Programmable Quantum Processors

Mário Ziman, Vladimír Bužek
2006 Acta Physica Hungarica A: Heavy Ion Physics  
We define several characteristics how to quantify the optimality and we study in detail performance of three types of programmable quantum processors based on (1) the C-NOT gate, (2) the SWAP operation  ...  We also investigate optimality of the so-called U-processors and we also compare the optimal approximative implementation of U(1) qubit rotations with the known probabilistic implementation as introduced  ...  In order to realize n unitary transformations of the data register, one must use an n-dimensional program register.  ... 
doi:10.1556/aph.26.2006.3-4.8 fatcat:ntnf4erezndt3elhba23r4kwem

PLX: An Instruction Set Architecture and Testbed for Multimedia Information Processing

Ruby B. Lee, A. Murat Fiskiran
2005 Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology  
We demonstrate the use and high performance of PLX on some frequently-used code kernels selected from image, video, and graphics processing applications: discrete cosine transform, pixel padding, clip  ...  Another design goal of PLX is to facilitate exploration and evaluation of novel techniques in instruction set architecture, microarchitecture, arithmetic, VLSI implementations, compiler optimizations,  ...  Acknowledgments PLX is a project of the Princeton Architecture Laboratory for Multimedia and Security (PALMS).  ... 
doi:10.1007/s11265-005-4940-8 fatcat:vzndq4zbfvdt7cey2yncygiuoi

Loner: utilizing the CPU vector datapath to process scalar integer data

Armand Behroozi, Sunghyun Park, Scott Mahlke
2022 Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction  
In this paper, we present Loner, a profile-guided compiler methodology for optimizing scalar integer loops using the otherwise idle vector datapath.  ...  Thus, CPU vector registers and functional units frequently sit idle while the scalar datapath unilaterally executes code.  ...  Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.  ... 
doi:10.1145/3497776.3517767 fatcat:2cvuymu7tjemldf3bftrtyhpyi

The Evaluation of Traffic Control in Changsha City

Shoufeng Lu, Jie Li, Henk van Zuylen
2012 Procedia - Social and Behavioral Sciences  
The second issue is the low saturation flow observed on the intersections, that appear to be 20 to 30% lower than the ones in comparable situations in Europe or North America.  ...  Lastly, the signal timing of a 13-node network in the CBD of Changsha has been optimized with TRANSYT-14.  ...  The disobedience of drivers has been registered and it appears indeed that the number of such actions has a relationship with the traffic performance, i.e. the more disobedience occurs on an intersection  ... 
doi:10.1016/j.sbspro.2012.04.094 fatcat:6ko7siqggngqbeubw3764ixz44

Optimal Pulsing Schemes for Galileo Pseudolite Signals

T.L. Abt, F. Soualle, S. Martin
2007 Journal of Global Positioning Systems  
Basically these studies have been focused on the GPS pseudolites and the proposed pulsing schemes are optimised for the GPS signals (RTCM, RTCA).  ...  Simulations based on the Galileo signal structure (codes, chipping rates, cross correlation properties) have been performed and the results will be presented.  ...  the spacing is defined as the optimal one.  ... 
doi:10.5081/jgps.6.2.133 fatcat:7zmf54htpffdbl2bpue2yv5szu