A Scalable Heterogeneous Dataflow Architecture For Big Data Analytics Using FPGAs (Abstract Only)

Ehsan Ghasemi, Paul Chow
2016 Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '16  
In this thesis, we build a dataflow architecture that harnesses FPGA resources within a distributed analytics platform creating a heterogeneous data analytics framework.  ...  This approach leverages the scalability of existing distributed processing environments and provides easy access to custom hardware accelerators for large-scale data analysis.  ...  Conclusions In this thesis, we built a CPU-FPGA dataflow architecture suitable for use in distributed analytics platforms with the goal of demonstrating the application of FPGAs in big data analytics.  ... 
doi:10.1145/2847263.2847294 dblp:conf/fpga/GhasemiC16 fatcat:7e7ezyt7rfdjpgkwhxzvricbly

Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale

Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, Jason Cong
2016 Proceedings of the Seventh ACM Symposium on Cloud Computing - SoCC '16  
In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators.  ...  With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy  ...  Acknowledgments This work is partially supported by the Center for Domain-Specific Computing under the NSF InTrans Award CCF-1436827, funding from CDSC industrial partners including Baidu, Fujitsu Labs  ... 
doi:10.1145/2987550.2987569 pmid:28317049 pmcid:PMC5351886 dblp:conf/cloud/HuangWYFICC16 fatcat:5f6bnm6xxbfk3k5fv3sgqarftu

FASTER: Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration

D. Pnevmatikatos, K. Papadimitriou, T. Becker, P. Böhm, A. Brokalakis, K. Bruneel, C. Ciobanu, T. Davidson, G. Gaydadjiev, K. Heyse, W. Luk, X. Niu (+9 others)
2015 Microprocessors and microsystems  
The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) EU FP7 project aims to ease the design and implementation of dynamically changing hardware systems.  ...  Our motivation stems from the promise reconfigurable systems hold for achieving high performance and extending product functionality and lifetime via the addition of new features that operate at hardware  ...  Acknowledgement This work was supported by the European Commission, Belgium in the context of FP7 FASTER project (#287804).  ... 
doi:10.1016/j.micpro.2014.09.006 fatcat:35jcur7nljhw7hqletmdrqjhum

Accelerating Seismic Computations Using Customized Number Representations on FPGAs

Haohuan Fu, William Osborne, Robert G. Clapp, Oskar Mencer, Wayne Luk
2009 EURASIP Journal on Embedded Systems  
Compared to common processors, field-programmable gate arrays (FPGAs) can boost the computation performance with a streaming computation architecture and the support for application-specific number representation  ...  processing cores on an FPGA.  ...  Acknowledgments The support from the Center for Computational Earth and Environmental Science, Stanford Exploration Project, Computer Architecture Research Group at Imperial College London, and Maxeler  ... 
doi:10.1155/2009/382983 fatcat:sbg6pqyg3nb45jqqp6tx4l44z4

3D-Stacked Many-Core Architecture for Biological Sequence Analysis Problems

Pei Liu, Ahmed Hemani, Kolin Paul, Christian Weis, Matthias Jung, Norbert Wehn
2017 International journal of parallel programming  
Sequence analysis plays an extremely important role in bioinformatics, and most of its applications have compute-intensive kernels consuming over 70% of total execution time.  ...  By exploiting the compute-intensive execution stages of popular sequence analysis applications, we present and evaluate a VLSI architecture with a focus on those that target biological sequences directly  ...  distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in  ... 
doi:10.1007/s10766-017-0495-0 fatcat:eftphbiuojastkml73s6rmik2u

3D-stacked many-core architecture for biological sequence analysis problems

Pei Liu, Ahmed Hemani, Kolin Paul
2015 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)  
Sequence analysis plays an extremely important role in bioinformatics, and most of its applications have compute-intensive kernels consuming over 70% of total execution time.  ...  By exploiting the compute-intensive execution stages of popular sequence analysis applications, we present and evaluate a VLSI architecture with a focus on those that target biological sequences directly  ...  distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in  ... 
doi:10.1109/samos.2015.7363678 dblp:conf/samos/LiuHP15 fatcat:xysaiokksbem7cqlnvpqahr7be

Melia: A MapReduce Framework on OpenCL-Based FPGAs

Zeke Wang, Shuhao Zhang, Bingsheng He, Wei Zhang
2016 IEEE Transactions on Parallel and Distributed Systems  
Compared to other processors like CPUs and GPUs, FPGAs are (re-)programmable hardware and have very low energy consumption.  ...  This paper presents an energy-efficient architecture design for MapReduce on Field Programmable Gate Arrays (FPGAs).  ...  ACKNOWLEDGEMENT We thank Altera University Program for their kind support in our research.  ... 
doi:10.1109/tpds.2016.2537805 fatcat:wguvs7g7gfe4fdv2ft2a7sde2y

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

Martin Dendaluce Jahnke, Francesco Cosco, Rihards Novickis, Joshué Pérez Rastelli, Vicente Gomez-Garay
2019 Electronics  
This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of theoretical and practical implications  ...  of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities.  ...  GPU Implementation of Neural Network: GPUs are highly parallel, instruction-based platforms which can be seen as a vastly extended multi-core processor but with a different architecture and instruction set  ... 
doi:10.3390/electronics8020250 fatcat:fpkotkknt5hxpaelswaw6tazea

Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs

Jurgen Vandendriessche, Nick Wouters, Bruno da Silva, Mimoun Lamrini, Mohamed Yassin Chkouri, Abdellah Touhafi
2021 Electronics  
The techniques for automated sound recognition often rely on machine learning approaches, which have increased in complexity in order to achieve higher accuracy.  ...  In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms.  ...  However, in the experiments, the ARM processor had 16 cores, while current embedded systems such as the RPi 4 have only four cores.  ... 
doi:10.3390/electronics10212622 fatcat:q5u64r6lzfhlbbotdnqinih5xe

Applications and Techniques for Fast Machine Learning in Science [article]

Allison McCarn Deiana, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini (+74 others)
2021 arXiv   pre-print
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing  ...  training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms.  ...  Level of parallelism in the architecture and the degree of specialization: As shown in Figure 10, we classify the compute architectures into scalar processors (CPUs), vector-based processors (GPUs),  ... 
arXiv:2110.13041v1 fatcat:cvbo2hmfgfcuxi7abezypw2qrm

Resilience within Ultrascale Computing System: Challenges and Opportunities from Nesus Project

2015 Supercomputing Frontiers and Innovations  
for ultrascale systems, resilience against (security) attacks, new approaches and methodologies to resilience in ultrascale systems, applications and case studies.  ...  This paper reviews the challenges and approaches of resilience in ultrascale computing systems from multiple perspectives involving and addressing the resilience aspects of hardware-software co-design  ...  And whereas research in the big data domain does not traditionally include research in processor and computer architecture, there is a clear correlation between the advances in the two domains.  ... 
doi:10.14529/jsfi150203 fatcat:nwtkd6ejubgwrms6zel6fnspjq

Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

Jianfang Cao, Lichao Chen, Min Wang, Yun Tian
2018 Computational Intelligence and Neuroscience  
The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images.  ...  To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel  ...  Acknowledgments This study was supported by the Natural Science  ...  Supplementary Materials: This section contains the core source code of the Map and Reduce tasks, respectively.  ... 
doi:10.1155/2018/3598284 pmid:29861711 pmcid:PMC5971336 fatcat:4imkhl4ksnbkhbrlgfiqod43ia

Q100: The Architecture and Design of a Database Processing Unit

Lisa Wu, Andrea Lottarini, Timothy K. Paine, Martha A. Kim, Kenneth A. Ross
2014 Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14  
As a proof of concept, we present the instruction set architecture, microarchitecture, and hardware implementation of one DPU, called Q100.  ...  In this paper, we propose Database Processing Units, or DPUs, a class of domain-specific database processors that can efficiently handle database applications.  ...  The authors also wish to thank Yunsung Kim, Stephen Edwards, and the anonymous reviewers for their time and feedback.  ... 
doi:10.1145/2541940.2541961 dblp:conf/asplos/WuLPKR14 fatcat:din6wqa36vembba2eunalxz6ma

Algorithmic Skeletons and Parallel Design Patterns in Mainstream Parallel Programming

Marco Danelutto, Gabriele Mencagli, Massimo Torquati, Horacio González–Vélez, Peter Kilpatrick
2020 International journal of parallel programming  
Finally, we give our personal overview (as researchers active for more than two decades in the parallel programming models and frameworks area) of the process that led to the adoption of these concepts in  ...  ) architectures.  ...  Big Data Applications (cHiPSet).  ... 
doi:10.1007/s10766-020-00684-w fatcat:vtqcyf4he5gu3eefbjsb7nrxne

Graphics processing unit (GPU) programming strategies and trends in GPU computing

André R. Brodtkorb, Trond R. Hagen, Martin L. Sætra
2013 Journal of Parallel and Distributed Computing  
Explicit finite volume methods typically rely on stencil computations, making them inherently parallel, and therefore a near-perfect match for the many-core graphics processing unit (GPU) found on today's  ...  Through the scientific papers in this thesis, we present efficient hardware-adapted shallow water simulations on the GPU, based on a high-resolution central-upwind scheme.  ...  Acknowledgements This work is supported in part by the Research Council of Norway through grant number 180023 (Parallel3D), and in part by the Centre of Mathematics for Applications at the University of  ... 
doi:10.1016/j.jpdc.2012.04.003 fatcat:7s4fnkx3yrekbmxabztmto5fzq