A Scalable Heterogeneous Dataflow Architecture For Big Data Analytics Using FPGAs (Abstract Only)
2016
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '16
In this thesis, we build a dataflow architecture that harnesses FPGA resources within a distributed analytics platform creating a heterogeneous data analytics framework. ...
This approach leverages the scalability of existing distributed processing environments and provides easy access to custom hardware accelerators for large-scale data analysis. ...
Conclusions In this thesis, we built a CPU-FPGA dataflow architecture suitable for use in distributed analytics platforms with the goal of demonstrating the application of FPGAs in big data analytics. ...
doi:10.1145/2847263.2847294
dblp:conf/fpga/GhasemiC16
fatcat:7e7ezyt7rfdjpgkwhxzvricbly
Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale
2016
Proceedings of the Seventh ACM Symposium on Cloud Computing - SoCC '16
In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. ...
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy ...
Acknowledgments This work is partially supported by the Center for Domain-Specific Computing under the NSF InTrans Award CCF-1436827, funding from CDSC industrial partners including Baidu, Fujitsu Labs ...
doi:10.1145/2987550.2987569
pmid:28317049
pmcid:PMC5351886
dblp:conf/cloud/HuangWYFICC16
fatcat:5f6bnm6xxbfk3k5fv3sgqarftu
FASTER: Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration
2015
Microprocessors and microsystems
The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) EU FP7 project aims to ease the design and implementation of dynamically changing hardware systems. ...
Our motivation stems from the promise reconfigurable systems hold for achieving high performance and extending product functionality and lifetime via the addition of new features that operate at hardware ...
Acknowledgement This work was supported by the European Commission - Belgium in the context of FP7 FASTER project (#287804). ...
doi:10.1016/j.micpro.2014.09.006
fatcat:35jcur7nljhw7hqletmdrqjhum
Accelerating Seismic Computations Using Customized Number Representations on FPGAs
2009
EURASIP Journal on Embedded Systems
Compared to common processors, field-programmable gate arrays (FPGAs) can boost the computation performance with a streaming computation architecture and the support for application-specific number representation ...
processing cores on an FPGA. ...
Acknowledgments The support from the Center for Computational Earth and Environmental Science, Stanford Exploration Project, Computer Architecture Research Group at Imperial College London, and Maxeler ...
doi:10.1155/2009/382983
fatcat:sbg6pqyg3nb45jqqp6tx4l44z4
3D-Stacked Many-Core Architecture for Biological Sequence Analysis Problems
2017
International journal of parallel programming
Sequence analysis plays an extremely important role in bioinformatics, and most of its applications have compute-intensive kernels consuming over 70% of total execution time. ...
By exploiting the compute-intensive execution stages of popular sequence analysis applications, we present and evaluate a VLSI architecture with a focus on those that target biological sequences directly ...
distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in ...
doi:10.1007/s10766-017-0495-0
fatcat:eftphbiuojastkml73s6rmik2u
3D-stacked many-core architecture for biological sequence analysis problems
2015
2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Sequence analysis plays an extremely important role in bioinformatics, and most of its applications have compute-intensive kernels consuming over 70% of total execution time. ...
By exploiting the compute-intensive execution stages of popular sequence analysis applications, we present and evaluate a VLSI architecture with a focus on those that target biological sequences directly ...
distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in ...
doi:10.1109/samos.2015.7363678
dblp:conf/samos/LiuHP15
fatcat:xysaiokksbem7cqlnvpqahr7be
Melia: A MapReduce Framework on OpenCL-Based FPGAs
2016
IEEE Transactions on Parallel and Distributed Systems
Compared to other processors like CPUs and GPUs, FPGAs are (re-)programmable hardware and have very low energy consumption. ...
This paper presents an energy-efficient architecture design for MapReduce on Field Programmable Gate Arrays (FPGAs). ...
ACKNOWLEDGEMENT We thank Altera University Program for their kind support in our research. ...
doi:10.1109/tpds.2016.2537805
fatcat:wguvs7g7gfe4fdv2ft2a7sde2y
Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
2019
Electronics
This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of theoretical and practical implications ...
of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. ...
GPU Implementation of Neural Network. GPUs are highly parallel instruction-based platforms that can be seen as a vastly extended multi-core processor but with a different architecture and instruction set ...
doi:10.3390/electronics8020250
fatcat:fpkotkknt5hxpaelswaw6tazea
Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs
2021
Electronics
The techniques for automated sound recognition often rely on machine learning approaches, which have increased in complexity in order to achieve higher accuracy. ...
In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms. ...
However, in the experiments, the ARM processor had 16 cores, while current embedded systems such as an RPi 4 have only four cores. ...
doi:10.3390/electronics10212622
fatcat:q5u64r6lzfhlbbotdnqinih5xe
Applications and Techniques for Fast Machine Learning in Science
[article]
2021
arXiv
pre-print
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science: the concept of integrating powerful ML methods into the real-time experimental data processing ...
training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. ...
Level of parallelism in the architecture and the degree of specialization. As is shown in Figure 10, we classify the compute architectures into scalar processors (CPUs), vector-based processors (GPUs), ...
arXiv:2110.13041v1
fatcat:cvbo2hmfgfcuxi7abezypw2qrm
Resilience within Ultrascale Computing System: Challenges and Opportunities from Nesus Project
2015
Supercomputing Frontiers and Innovations
for ultrascale systems, resilience against (security) attacks, new approaches and methodologies to resilience in ultrascale systems, applications and case studies. ...
This paper reviews the challenges and approaches of resilience in ultrascale computing systems from multiple perspectives involving and addressing the resilience aspects of hardware-software co-design ...
And whereas research in the big data domain does not traditionally include research in processor and computer architecture, there is a clear correlation between the advances in the two domains. ...
doi:10.14529/jsfi150203
fatcat:nwtkd6ejubgwrms6zel6fnspjq
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform
2018
Computational Intelligence and Neuroscience
The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. ...
To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel ...
Acknowledgments This study was supported by the Natural Science
Supplementary Materials This section consists of the core source codes of the Map and Reduce tasks, respectively. ...
doi:10.1155/2018/3598284
pmid:29861711
pmcid:PMC5971336
fatcat:4imkhl4ksnbkhbrlgfiqod43ia
As a proof of concept, we present the instruction set architecture, microarchitecture, and hardware implementation of one DPU, called Q100. ...
In this paper, we propose Database Processing Units, or DPUs, a class of domain-specific database processors that can efficiently handle database applications. ...
The authors also wish to thank Yunsung Kim, Stephen Edwards, and the anonymous reviewers for their time and feedback. ...
doi:10.1145/2541940.2541961
dblp:conf/asplos/WuLPKR14
fatcat:din6wqa36vembba2eunalxz6ma
Algorithmic Skeletons and Parallel Design Patterns in Mainstream Parallel Programming
2020
International journal of parallel programming
Finally, we give our personal overview, as researchers active for more than two decades in the parallel programming models and frameworks area, of the process that led to the adoption of these concepts in ...
) architectures. ...
Big Data Applications (cHiPSet). ...
doi:10.1007/s10766-020-00684-w
fatcat:vtqcyf4he5gu3eefbjsb7nrxne
Graphics processing unit (GPU) programming strategies and trends in GPU computing
2013
Journal of Parallel and Distributed Computing
Explicit finite volume methods typically rely on stencil computations, making them inherently parallel, and therefore a near perfect match for the many-core graphics processing unit (GPU) found on today's ...
Through the scientific papers in this thesis, we present efficient hardware-adapted shallow water simulations on the GPU, based on a high-resolution central-upwind scheme. ...
Acknowledgements This work is supported in part by the Research Council of Norway through grant number 180023 (Parallel3D), and in part by the Centre of Mathematics for Applications at the University of ...
doi:10.1016/j.jpdc.2012.04.003
fatcat:7s4fnkx3yrekbmxabztmto5fzq