
2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32

2022 IEEE Transactions on Parallel and Distributed Systems  
..., +, TPDS June 2021, 1494-1510. Sova: A Software-Defined Autonomic Framework for Virtual Network Allocations.  ...  ..., +, TPDS Jan. 2021, 147-159. The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism.  ...  Graph coloring. Feluca: A Two-Stage Graph Coloring Algorithm With Color-Centric Paradigm on GPU. Zheng, Z., +,  ...
doi:10.1109/tpds.2021.3107121 fatcat:e7bh2xssazdrjcpgn64mqh4hb4

Big Data Deep Learning: Challenges and Perspectives

Xue-Wen Chen, Xiaotong Lin
2014 IEEE Access  
Deep learning is currently an extremely active research area in the machine learning and pattern recognition communities.  ...  INDEX TERMS Classifier design and evaluation, feature representation, machine learning, neural net models, parallel processing.  ...  In this example, the GPU has 64 stream processors (SPs) organized into four multiprocessors (MPs), each with two streaming multiprocessors (SMs).  ...
doi:10.1109/access.2014.2325029 fatcat:qrriee367zetbb6scx2ng33yma

Performance, Design, and Autotuning of Batched GEMM for GPUs [chapter]

Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra
2016 Lecture Notes in Computer Science  
As batched computations on relatively small problems continue to gain interest in many scientific applications, a need arises for a high-performance GEMM kernel for batches of small matrices.  ...  For most performance tests reported in this paper, the proposed kernels outperform state-of-the-art approaches using a K40c GPU.  ...  Finally, tensor contractions, used to model multilinear relations in areas of recent interest like big-data analytics and machine learning, as well as large-scale high-order FEM simulations, can also be  ...
doi:10.1007/978-3-319-41321-1_2 fatcat:quksfjshtvgthaf7ohwkvwqv6m
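The batched-GEMM interface this entry describes — one call that multiplies many small, independent matrix pairs — can be sketched on the CPU with NumPy broadcasting (a minimal illustration of the interface only, not the authors' autotuned CUDA kernels):

```python
import numpy as np

def batched_gemm(A, B):
    """Multiply a batch of small matrix pairs: C[i] = A[i] @ B[i].

    A: (batch, m, k), B: (batch, k, n) -> C: (batch, m, n).
    np.matmul broadcasts over the leading batch dimension, which is
    the same calling convention a batched GEMM GPU kernel exposes.
    """
    return np.matmul(A, B)

# A batch of 1000 independent 8x8 products in one call, rather than
# 1000 separate kernel launches — the motivation for batched kernels.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 8, 8))
B = rng.standard_normal((1000, 8, 8))
C = batched_gemm(A, B)
```

On a GPU the analogous call would be something like cuBLAS's `gemmBatched`, where amortizing launch overhead across the batch is what makes small-matrix GEMM fast.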

Numerical algorithms for high-performance computational science

Jack Dongarra, Laura Grigori, Nicholas J. Higham
2020 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
A number of features of today's high-performance computers make it challenging to exploit these machines fully for computational science.  ...  This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.  ...  We thank Massimiliano Fasi, Theo Mary, Mantas Mikaitis, Srikara Pranesh, and Mawussi Zounon for their comments on a draft manuscript.  ...
doi:10.1098/rsta.2019.0066 pmid:31955676 fatcat:2l4iy3yxwvc3njht5smpsqloma

Big data and extreme-scale computing

M Asch, T Moore, R Badia, M Beck, P Beckman, T Bidot, F Bodin, F Cappello, A Choudhary, B de Supinski, E Deelman, J Dongarra (+27 others)
2018 The international journal of high performance computing applications  
Third, we focus on some opportunities for software ecosystem convergence in big, logically centralized facilities that execute large-scale simulations and models and/or perform large-scale data analytics  ...  Second, we offer an account of some of the problems involved with creating a converged infrastructure for peripheral environments, that is, a shared infrastructure that can be deployed throughout the network  ...  Acknowledgments The authors would like to acknowledge David Rogers for his work on the illustrations, Sam Crawford for editing support and the creation of the Appendix, and Piotr Luszczek for technical  ... 
doi:10.1177/1094342018778123 fatcat:vwrrxmad4rhtppq6ioaz4h5q7a

Accelerating a three-dimensional finite-difference wave propagation code using GPU graphics cards

David Michéa, Dimitri Komatitsch
2010 Geophysical Journal International  
The methodology that we present can be used for Maxwell's equations as well because their form is similar to that of the seismic wave equation written in velocity vector and stress tensor.  ...  We accelerate a three-dimensional finite-difference in the time domain (FDTD) wave propagation code by a factor between about 20 and 60 compared to a serial implementation using Graphics Processing Unit  ...  They acknowledge the main developers of the ONDES3D software package, Hideo Aochi, Ariane Ducellier and Yohan Lee-Tin-Yien from BRGM (France), for their support.  ... 
doi:10.1111/j.1365-246x.2010.04616.x fatcat:rym3ewln6bax3pujeg3igf5jty

Comparing the costs of abstraction for DL frameworks [article]

Maksim Levental, Elena Orlova
2020 arXiv   pre-print
High level abstractions for implementing, training, and testing Deep Learning (DL) models abound.  ...  We study at which points exactly in the engineering life-cycle of a DL model the highest costs are paid and whether they can be mitigated.  ...  ACKNOWLEDGEMENTS We would like to thank Rick Stevens and Ian Foster for their constructive criticism and feedback on the project and paper itself.  ... 
arXiv:2012.07163v1 fatcat:6eva4rrvi5cqlojvp34gkx4ycy

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems

2016 Supercomputing Frontiers and Innovations  
We present a review of the current best practices in parallel programming models for dense linear algebra (DLA) on heterogeneous architectures.  ...  libraries considered, we outline our view of the current strengths and weaknesses of their programming models, especially with regard to hardware trends and ease of programming high-performance numerical software  ...  Figure 5, Right illustrates the need for tensor contractions in machine learning.  ...
doi:10.14529/jsfi150405 fatcat:avnmwu4dozdmjksknrlznhpv7u

Holistic Technologies for Managing Internet of Things Services

Rajiv Ranjan, Ching-Hsien Hsu, Lydia Y. Chen, Dimitrios Georgakopoulos
2020 IEEE Transactions on Services Computing  
IoT's ability to observe and affect the physical world presents an unprecedented opportunity for creating IoT-based smart services and products that address grand challenges in emerging opportunities in  ...  that affect the physical world, and support a variety of applications controlled by different organizations and individuals.  ...  Her research interests include distributed machine learning, dependability management, and resource allocation for large-scale data processing systems and services.  ...
doi:10.1109/tsc.2020.3000844 fatcat:okmbpi6dx5dnvj2mglqhco2aeq

Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence

Sebastian Raschka, Joshua Patterson, Corey Nolet
2020 Information  
This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it.  ...  We cover widely-used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.  ...  For example, it is possible to scale up a single machine by installing multiple GPUs on it.  ... 
doi:10.3390/info11040193 fatcat:hetp7ngcpbbcpkhdcyowuiiwxe

Generating and Exploiting Deep Learning Variants to Increase Heterogeneous Resource Utilization in the NVIDIA Xavier

Roger Pujol, Hamid Tabani, Leonidas Kosmidis, Enrico Mezzetti, Jaume Abella, Francisco J. Cazorla, Michael Wagner
2019 Euromicro Conference on Real-Time Systems  
run in 4 or 8 of the GPU's Streaming Multiprocessors (SMs); and 1 or 2 NVIDIA Deep Learning Accelerators (NVDLA); (b) we show that each particular variant/configuration offers a different resource utilization  ...  In this paper, (a) we develop different variants (implementations) of well-known DNN libraries used in the Apollo Autonomous Driving (AD) software for each of the computing elements of the latest NVIDIA  ...  The GPU is structured in 8 Streaming Multiprocessors (SMs), each containing 64 regular and 8 Tensor cores.  ...
doi:10.4230/lipics.ecrts.2019.23 dblp:conf/ecrts/PujolTKMAC19 fatcat:df3hvqmglnh4tdwdd2ql3ajswa

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture [article]

Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu
2021 arXiv   pre-print
Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems.  ...  In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU  ...  CONCLUSION In this work, we introduced a GPU-oriented, software defined data transfer architecture for efficient GCN training on large graphs.  ... 
arXiv:2103.03330v3 fatcat:d7ejvinmtng5xghbc4ank2vthy
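The zero-copy idea in this entry — GPU threads reading mini-batch node features straight out of host memory instead of having the CPU gather and copy them first — can only be hinted at on the CPU, but the indexing pattern is the same (a conceptual sketch in NumPy; real zero-copy needs pinned host memory mapped into the GPU address space, e.g. via `cudaHostAlloc`):

```python
import numpy as np

# Full graph's node-feature table, resident in (host) memory.
rng = np.random.default_rng(0)
features = rng.standard_normal((100_000, 64)).astype(np.float32)

# Node ids sampled for one GCN mini-batch.
batch_nodes = np.array([3, 17, 42, 99_999])

# Conventional path: the CPU gathers the sparse rows into a staging
# buffer, then copies that buffer to the GPU as one dense transfer.
staged = features[batch_nodes].copy()

# Zero-copy path (conceptually): GPU threads index the host-resident
# table directly, so only the rows actually needed cross the bus and
# no CPU-side staging buffer is built. NumPy can only mimic the
# access pattern; the point is the gather happens at the consumer.
direct = features[batch_nodes]
```

The trade-off the paper exploits is that for sparse, irregular gathers, letting the consumer pull individual cache lines over the interconnect can beat building and shipping a dense staging copy.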

Benchmarking Modern Edge Devices for AI Applications

Pilsung KANG, Jongmin JO
2021 IEICE transactions on information and systems  
We perform a set of deep learning benchmarks on the devices to measure their performance.  ...  AI (artificial intelligence) has grown at an overwhelming speed over the last decade, to the extent that it has become one of the mainstream tools that drive the advancements in science and technology.  ...  the TPU (tensor processing unit) [15], an application-specific integrated circuit (ASIC) designed for accelerating neural network machine learning, particularly using Google's own TensorFlow framework  ...
doi:10.1587/transinf.2020edp7160 fatcat:4uo7pehd7vbylckgmpoh5s34im

A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities [article]

Fuxun Yu, Di Wang, Longfei Shangguan, Minjia Zhang, Xulong Tang, Chenchen Liu, Xiang Chen
2022 arXiv   pre-print
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.  ...  novel works in large-scale deep learning system optimization.  ...  Ansor: Generating high-performance tensor programs for deep learning.  ... 
arXiv:2111.14247v2 fatcat:yoeol5xrj5guhh2tiulcvopmue

BigDataBench: A Scalable and Unified Big Data and AI Benchmark Suite [article]

Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Xu Wen, Rui Ren, Chen Zheng, Xiwen He, Hainan Ye, Haoning Tang, Zheng Cao, Shujie Zhang (+1 others)
2018 arXiv   pre-print
In this context, architecture, system, data management, and machine learning communities pay greater attention to innovative big data and AI algorithms, architecture, and systems.  ...  First, the traditional benchmarking methodology that creates a new benchmark or proxy for every possible workload is not scalable, or even impossible for Big Data and AI benchmarking.  ...  As a machine learning framework, TensorFlow [6] adopts a dataflow-based programming abstraction, using individual mathematical operators as nodes in the dataflow graph.  ... 
arXiv:1802.08254v2 fatcat:6ktsa3yowvaqtjbez26akp7a7e
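The dataflow abstraction this entry attributes to TensorFlow — individual mathematical operators as nodes in a graph, with execution deferred until the graph is evaluated — can be sketched in a few lines (a toy model for illustration, not TensorFlow's actual API):

```python
class Node:
    """One operator in a dataflow graph; edges are its input nodes."""

    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def evaluate(self):
        # A node fires once all its input edges have produced values:
        # recursively evaluate the inputs, then apply this operator.
        return self.op(*(n.evaluate() for n in self.inputs))

def const(v):
    # A source node with no inputs that emits a constant value.
    return Node(lambda: v)

# Build the graph for (2 + 3) * 4 up front; nothing computes yet.
graph = Node(lambda x, y: x * y,
             Node(lambda x, y: x + y, const(2), const(3)),
             const(4))

# Only evaluation triggers the actual arithmetic.
result = graph.evaluate()
```

Separating graph construction from execution is what lets a framework optimize, partition, and schedule the operators across devices before any computation runs.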
Showing results 1 — 15 out of 241 results