A fully pipelined kernel normalised least mean squares processor for accelerated parameter optimisation

Nicholas J. Fraser, Duncan J.M. Moss, JunKyu Lee, Stephen Tridgell, Craig T. Jin, Philip H.W. Leong
2015 2015 25th International Conference on Field Programmable Logic and Applications (FPL)  
In this paper, we propose the first fully pipelined floating point implementation of the kernel normalised least mean squares algorithm for regression.  ...  software implementation on a desktop processor.  ...  In this paper, we describe a particularly efficient implementation of the kernel normalised least mean squares (KNLMS) algorithm.  ... 
doi:10.1109/fpl.2015.7293952 dblp:conf/fpl/FraserMLTJL15 fatcat:tr4g4mfgwzhydckeupksctxcae

Study of heterogeneous and reconfigurable architectures in the communication domain

H. T. Feldkaemper, H. Blume, T. G. Noll
2003 Advances in Radio Science  
A factor of about seven orders of magnitude spans between a physically optimised implementation and an implementation on a programmable DSP kernel.  ...  An implementation on an embedded FPGA kernel is in between these two representing an attractive compromise with high flexibility and low power consumption.  ...  Fig. 6. Normalised energy conversion (E_norm) per symbol of Viterbi decoder implementations. Fig. 7. Comparison by means of normalised costs, (a) Comparison of a Viterbi decoder, (b) Comparison  ... 
doi:10.5194/ars-1-165-2003 fatcat:wqaohyjkuzaibkbq443ixry4lm

Accelerating SuperBE with Hardware/Software Co-Design

Andrew Chen, Rohaan Gupta, Anton Borzenko, Kevin Wang, Morteza Biglari-Abhari
2018 Journal of Imaging  
fabric that assist a software processor.  ...  of accuracy.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/jimaging4100122 fatcat:2epjz74ttfh3dm2o4vfftbak7m

Development of a fine-grained parallel Karhunen–Loève transform

M. Fleury, R.P. Self, A.C. Downton
2004 Journal of Parallel and Distributed Computing  
Performance estimates suggest that the design will outperform implementation on a high-end microprocessor, given due attention to I/O (Input/Output).  ...  Detailed analysis of the design steps taken to produce a successful prototype is given. A design that addresses the issue of data bandwidth is included.  ...  Amongst the features relevant to the computation of a KLT are: • The KLT transform achieves optimal data compression in the mean-square error sense.  ... 
doi:10.1016/j.jpdc.2004.03.003 fatcat:fgw3h2iocrbyplr2zzqkuvfi2m

A Run-Time Adaptive FPGA Architecture for Monte Carlo Simulations

Xiang Tian, Christos-Savvas Bouganis
2011 2011 21st International Conference on Field Programmable Logic and Applications  
Unlike other state-of-the-art computing platforms, such as General Purpose Processors (GPPs) and General Purpose Graphics Processing Units (GPGPU), FPGAs can moreover exploit the applications' requirements  ...  The results demonstrate that an average of ∼1.35x throughput per resource unit improvement is achieved compared to conventional parallel arithmetic implementation.  ...  As it is expected, the 32 bits implementation gives the smoothest EDF, and the 16 bits implementation has the largest distance to the EDF of 32 bits implementation.  ... 
doi:10.1109/fpl.2011.30 dblp:conf/fpl/TianB11 fatcat:ntos56d22fb6redf3t2ueaxbba

Analysis of the Practical Implementation of Flicker Measurement Coprocessor for AMI Meters

Krzysztof Kołek, Andrzej Firlit, Krzysztof Piątek, Krzysztof Chmielowiec
2021 Energies  
This paper considers the implementation of the flicker measurement as an FPGA module to offload the processor subsystem or operate as an IP core in FPGA-based system-on-chip units.  ...  In state-of-the-art PQ measuring devices, the flicker measurement channel is usually implemented as a dedicated processor subsystem.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/en14061589 fatcat:n5suhlt3qzfr5cewpcy2xkcee4

Multi-level Customisation Framework for Curve Based Monte Carlo Financial Simulations [chapter]

Qiwei Jin, Diwei Dong, Anson H. T. Tse, Gary C. T. Chow, David B. Thomas, Wayne Luk, Stephen Weston
2012 Lecture Notes in Computer Science  
Designs targeting a Virtex-6 SX475T FPGA generated by our framework are about 40 times faster than single-core software implementations on an i7-870 quad-core CPU at 2.93 GHz; they are over 10 times faster  ...  and 20 times more energy efficient than 4-core implementations on the same i7-870 quad-core CPU, and are over three times more energy efficient and 36% faster than a highly optimised implementation on  ...  processor has been reported for speeding up the Brace, Gatarek and Musiela (BGM) interest rate model for derivatives evaluation [20]; an American option valuator using least-squares Monte Carlo method  ... 
doi:10.1007/978-3-642-28365-9_16 fatcat:2rtzraq7ijcerogrk2czeia7sq

A Full Featured Configurable Accelerator for Object Detection with YOLO

Daniel Pestana, Pedro R. Miranda, Joao D. Lopes, Rui P. Duarte, Mario Vestias, Horacio C. Neto, Jose T. De Sousa
2021 IEEE Access  
It considers a fixed-point format, linearised activation functions, batch-normalisation, folding, and a hardware structure that exploits most of the available parallelism in CNN processing.  ...  The proposed core is configured for real-time execution of YOLOv3-Tiny and YOLOv4-Tiny, integrated into a RISC-V-based system-on-chip architecture and prototyped in an UltraScale XCKU040 FPGA (Field Programmable  ...  Batch-normalisation layers are used for speeding up the training by normalising the input data, that is, zero mean and unit standard deviation [20].  ... 
doi:10.1109/access.2021.3081818 fatcat:n4rj7fvxlremtlxcng5wl5i43e

Reconfigurable FPGA-based switching path frequency-domain echo canceller with applications to voice control device

Ka Fai Cedric Yiu, Yao Lu, Chun Hok Ho, Wayne Luk, Jiaquan Huo, Sven Nordholm
2012 Digital signal processing (Print)  
processor.  ...  in the FPGA fabric surrounding a PowerPC on a Xilinx XUP V2P platform.  ...  Acknowledgments This paper is supported by the Research Grants Council of HK-SAR (PolyU 7191/06E) and the research committee of the Hong Kong Polytechnic University.  ... 
doi:10.1016/j.dsp.2011.10.008 fatcat:d3p3c2jvhfaetp5la25sdkyujm

Efficient Reconfigurable Architecture for Pricing Exotic Options

Pieter Fabry, David Thomas
2017 ACM Transactions on Reconfigurable Technology and Systems  
The combination of a highly parallelisable architecture and model-specific optimisations means that the binomial pricing technique allows for a 50× improvement in throughput compared to existing FPGA approaches  ...  Analysis of the binomial simulation model shows that only limited-precision fixed point arithmetic is needed, and also shows that pairs of MC kernels are able to share RAM resources.  ...  The x-scales for these graphs are the product of K and T, where the Mean Square Error presented is the smallest error found via Monte Carlo simulation for a given KT size (i.e. the mean square error of the best  ... 
doi:10.1145/3158228 fatcat:mvh43rpk5rf45o73xvp6fa2uru

KOCL: Power Self-Awareness for Arbitrary FPGA-SoC-Accelerated OpenCL Applications

James J. Davis, Joshua M. Levine, Edward A. Stott, Eddie Hung, Peter Y. K. Cheung, George A. Constantinides
2017 IEEE design & test  
This article introduces KOCL: a tool allowing OpenCL developers targeting FPGA-SoC devices to query live kernel-level power consumption using function calls embedded in their host code.  ...  For energy optimisation, such control decisions require knowledge of power usage at subsystem granularity.  ...  Technologies and the Royal Academy of Engineering.  ... 
doi:10.1109/mdat.2017.2750909 fatcat:elvgm64o3bao3ojerpsqiucyju

Domain-Specific Hybrid FPGA: Architecture and Floating Point Applications

Chun Hok Ho, Chi Wai Yu, Philip H.W. Leong, Wayne Luk, Steven J.E. Wilton
2007 2007 International Conference on Field Programmable Logic and Applications  
used for implementing datapaths; the precise amount of each type of resources can be customised to suit specific application domains.  ...  This paper presents a novel architecture for domain-specific FPGA devices.  ...  Acknowledgements The authors gratefully acknowledge the support of the UK EPSRC (grant EP/C549481/1 and grant EP/D060567/1).  ... 
doi:10.1109/fpl.2007.4380647 dblp:conf/fpl/HoYLLW07 fatcat:r3bmzlcugfcz5dgd66fkyj2yv4

Implementation of comprehensive address generator for digital signal processor

Ramesh M. Kini, Sumam S. David
2013 International journal of electronics (Print)  
This article focuses on the design and application-specific integrated circuit implementation of address generators for complex addressing modes required by multimedia signal-processing kernels.  ...  The performance of signal-processing algorithms implemented in hardware depends on the efficiency of datapath, memory speed and address computation.  ...  Acknowledgements The development, fabrication and testing of the chip was supported by the Ministry of Communication and Information Technology, Government of India, under Special Man-power Development  ... 
doi:10.1080/00207217.2012.713009 fatcat:uz7mnjoqtjgcdhooi5dxosbysq

A Low Complexity Scaling Method for the Lanczos Kernel in Fixed-Point Arithmetic

Juan Luis Jerez, George A. Constantinides, Eric C. Kerrigan
2015 IEEE transactions on computers  
It is shown that the numerical behaviour of fixed-point implementations of the modified problem can be chosen to be at least as good as a floating-point implementation, if necessary.  ...  We consider the problem of enabling fixed-point implementation of linear algebra kernels on low cost embedded systems, as well as motivating more efficient computational architectures for scientific applications  ...  ACKNOWLEDGMENTS The authors would like to acknowledge the support of the EPSRC (Grants EP/G031576/1 and EP/I012036/1) and the EU FP7 Project EMBOCON, as well as industrial support from Xilinx, the Mathworks  ... 
doi:10.1109/tc.2013.162 fatcat:h5rrhqwwcrepzo5gtui7u3abye

A Survey on Hardware Implementations of Visual Object Trackers

Al-Hussein El-Shafie, Serag Habib
2019 IET Image Processing  
This study presents a literature survey of the hardware implementations of object trackers over the last two decades.  ...  They highlight the lack of hardware implementations for state-of-the-art tracking algorithms as well as for enhanced classical algorithms.  ...  The co-processor handles most of the steps: image cropping and decimation step (from image resolution of 360 × 288 to 64 × 64), kernel computation, histogram, mean-shift displacement and Bhattacharyya  ... 
doi:10.1049/iet-ipr.2018.5952 fatcat:5qnpzd7y6bbzjfjdthe47qlnje