164 Hits in 4.2 sec

3D Localization for Light-Field Microscopy via Convolutional Sparse Coding on Epipolar Images

Pingfan Song, Herman Verinaz Jadan, Carmel L. Howe, Pete Quicke, Amanda J. Foust, Pier Luigi Dragotti
2020 IEEE Transactions on Computational Imaging  
In this paper, we propose a new 3D localization approach to effectively detect 3D positions of neuronal cells from a single light-field image with high accuracy and outstanding robustness to light scattering  ...  Extensive experiments demonstrate that our approach can reliably detect the 3D positions of granular targets with small Root Mean Square Error (RMSE), high robustness to optical aberration and light scattering  ...  and thereby offering a path toward efficient 3D localization.  ... 
doi:10.1109/tci.2020.2997301 pmid:32851121 pmcid:PMC7442043 fatcat:jmptasrwb5a5ni2mssdln2x3o4

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

Md Taufique Hussain, Oguz Selvitopi, Aydin Buluc, Ariful Azad
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms.  ...  Distributed SpGEMM at this extreme scale faces two key challenges: (1) high communication cost and (2) inadequate memory to generate the output.  ...  Parallel efficiency. We compute the parallel efficiency as P1·T(P1) / (P2·T(P2)), where T(P) denotes the runtime with P processes and P2 > P1.  ... 
doi:10.1109/ipdps49936.2021.00018 fatcat:hsqxxkxqdbakbcem3ssab7zm2u

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale [article]

Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad
2020 arXiv   pre-print
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms.  ...  Distributed SpGEMM at this extreme scale faces two key challenges: (1) high communication cost and (2) inadequate memory to generate the output.  ...  Parallel efficiency. We compute the parallel efficiency as P1·T(P1) / (P2·T(P2)), where T(P) denotes the runtime with P processes and P2 > P1.  ... 
arXiv:2010.08526v1 fatcat:vf6csmuuerganoqulk23x56cnq
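Both SpGEMM entries above measure relative parallel efficiency as E = P1·T(P1) / (P2·T(P2)) with P2 > P1. A minimal Python sketch of that formula (the timing numbers below are hypothetical, not taken from the papers):

```python
def parallel_efficiency(p1, t1, p2, t2):
    """Relative parallel efficiency when scaling from P1 to P2 processes
    (P2 > P1): E = (P1 * T(P1)) / (P2 * T(P2)). E = 1.0 means ideal scaling."""
    assert p2 > p1, "formula assumes P2 > P1"
    return (p1 * t1) / (p2 * t2)

# Hypothetical timings: doubling processes halves the runtime -> ideal scaling.
print(parallel_efficiency(512, 10.0, 1024, 5.0))   # 1.0
# Sub-linear scaling: doubling processes only cuts runtime to 7.5s.
print(parallel_efficiency(512, 10.0, 1024, 7.5))   # ~0.667
```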

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training [article]

Zhengda Bian and Hongxin Liu and Boxiang Wang and Haichen Huang and Yongbin Li and Chuanrui Wang and Fan Cui and Yang You
2021 arXiv   pre-print
parallelism, multiple tensor parallelism, and sequence parallelism.  ...  The documentation can be found at https://www.colossalai.org and the source code can be found at https://github.com/hpcaitech/ColossalAI.  ...  To address this problem, 2D, 2.5D and 3D tensor parallelism were proposed to fully eliminate memory redundancy. 2D Tensor Parallelism: this method (Xu et al. 2021) relies on the SUMMA matrix multiplication  ... 
arXiv:2110.14883v1 fatcat:vgoi2r4byjfpddnq7gvmur7gju
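The Colossal-AI snippet attributes 2D tensor parallelism to the SUMMA matrix-multiplication algorithm, which builds C = A·B as a sum of rank-nb updates from column panels of A and row panels of B. A serial Python sketch of that panel structure (the function name and panel width are illustrative; the real algorithm broadcasts each panel across a 2D process grid instead of looping):

```python
def matmul_panels(A, B, nb):
    """Serial sketch of SUMMA's computation: C = A*B accumulated one
    nb-wide column panel of A / row panel of B at a time. In the
    distributed algorithm each panel is broadcast along process rows
    and columns; here the broadcast is replaced by a plain loop."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for s in range(0, k, nb):                    # one SUMMA step per panel
        for i in range(m):
            for l in range(s, min(s + nb, k)):   # panel columns [s, s+nb)
                a = A[i][l]
                for j in range(n):
                    C[i][j] += a * B[l][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_panels(A, B, nb=1))  # [[19.0, 22.0], [43.0, 50.0]]
```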

A massively parallel tensor contraction framework for coupled-cluster computations

Edgar Solomonik, Devin Matthews, Jeff R. Hammond, John F. Stanton, James Demmel
2014 Journal of Parallel and Distributed Computing  
Each contraction may be executed via matrix multiplication on a properly ordered and structured tensor. However, data transpositions are often needed to reorder the tensors for each contraction.  ...  Our CCSD and CCSDT implementations achieve high parallel scalability on the BlueGene/Q and Cray XC30 supercomputer architectures showing that accurate electronic structure calculations can be effectively  ...  This library employs similar matrix multiplication primitives (SUMMA and 3D algorithms) for distributed tensor contractions and mapping of data onto torus networks.  ... 
doi:10.1016/j.jpdc.2014.06.002 fatcat:76at7oi2vfhbxfc6tmbzoe2xyy

Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

Emanuel H. Rubensson, Elias Rudberg
2016 Parallel Computing  
We present a method for parallel block-sparse matrix-matrix multiplication on distributed memory clusters.  ...  A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)].  ...  Acknowledgements Support from the Göran Gustafsson foundation, the Swedish research council (grant no. 623-2009-803 and 621-2012-3861), the Lisa and Carl-Gustav Esseen foundation, and the Swedish national  ... 
doi:10.1016/j.parco.2016.06.005 fatcat:tleeeqbunjh43pyuzcqqufln24

Reducing Communication Costs for Sparse Matrix Multiplication within Algebraic Multigrid

Grey Ballard, Christopher Siefert, Jonathan Hu
2016 SIAM Journal on Scientific Computing  
In particular, we show that the most commonly used parallel algorithm is often not the most communication-efficient one for all of the matrix-matrix multiplications involved.  ...  In this paper, we show that the most commonly used parallel SpMM algorithm is often not the most communication-efficient one for all of the matrix multiplications involved.  ...  [4] consider multiple algorithms, classifying them into 1D (described in Section 4), 2D (which include Sparse SUMMA and Sparse Cannon), and 3D varieties.  ... 
doi:10.1137/15m1028807 fatcat:idsejyuelnbvrjndf7lv3geeda
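As a rough illustration of the 1D variety named in the last snippet, here is a toy row-partitioned SpGEMM in Python. The dict-of-rows representation and the `spgemm_1d` name are invented for illustration, not the paper's implementation; the count of fetched B rows stands in for the communication volume such analyses study:

```python
def spgemm_1d(A_rows, B_rows, nprocs):
    """Toy 1D row-partitioned SpGEMM: process p owns a contiguous block
    of rows of A and computes the matching rows of C = A*B. Sparse rows
    are dicts {col: val}. The set of B rows each process must fetch is
    a proxy for the algorithm's communication volume."""
    n = len(A_rows)
    C_rows = [dict() for _ in range(n)]
    comm_volume = 0
    chunk = (n + nprocs - 1) // nprocs
    for p in range(nprocs):
        needed = set()  # rows of B this process would have to fetch
        for i in range(p * chunk, min((p + 1) * chunk, n)):
            for k, a in A_rows[i].items():
                needed.add(k)
                for j, b in B_rows[k].items():
                    C_rows[i][j] = C_rows[i].get(j, 0.0) + a * b
        comm_volume += len(needed)
    return C_rows, comm_volume

A = [{0: 1.0, 2: 2.0}, {1: 3.0}, {0: 4.0}]
B = [{1: 1.0}, {0: 5.0}, {2: 6.0}]
C, vol = spgemm_1d(A, B, nprocs=3)  # C rows: {1:1.0, 2:12.0}, {0:15.0}, {1:4.0}
```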

The parallelism motifs of genomic data analysis

Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi (+2 others)
2020 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel  ...  Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory  ...  [60] , and GNNs are bottlenecked with large sparse matrix-dense matrix multiplications [61] .  ... 
doi:10.1098/rsta.2019.0394 pmid:31955674 fatcat:kzujmq5u2refvhoovtb2ap5vha

DASH: Distributed Data Structures and Parallel Algorithms in a Global Address Space [chapter]

Karl Fürlinger, José Gracia, Andreas Knüpfer, Tobias Fuchs, Denis Hünich, Pascal Jungblut, Roger Kowalewski, Joseph Schuchart
2020 Lecture Notes in Computational Science and Engineering  
DASH is a new programming approach offering distributed data structures and parallel algorithms in the form of a C++ template library.  ...  We also present a performance and productivity study where we compare DASH with a set of established parallel programming models.  ...  We would also like to thank the German research foundation (DFG) for the funding received through the SPPEXA priority programme and initiators and managers of SPPEXA for their foresight and level-headed  ... 
doi:10.1007/978-3-030-47956-5_6 fatcat:44avzbgnkvh73iriqceboti4wu

Matrix-free construction of HSS representation using adaptive randomized sampling [article]

Christopher Gorman, Gustavo Chávez, Pieter Ghysels, Théo Mary, François-Henry Rouet, Xiaoye Sherry Li
2018 arXiv   pre-print
We discuss parallel implementation and computation and communication cost of both variants.  ...  Parallel numerical results for a range of applications, including boundary element method matrices and quantum chemistry Toeplitz matrices, show the effectiveness, scalability and numerical robustness  ...  We thank Daniel Haxton (LBNL) and Jeremiah Jones (Arizona State University) for providing us with the Quantum Chemistry test problem.  ... 
arXiv:1810.04125v2 fatcat:w2qklyn7rvb6fliishcmqzqb3i

Design and Implementation of the PULSAR Programming System for Large Scale Computing

2017 Supercomputing Frontiers and Innovations  
The PULSAR programming model is quite simple, with point-to-point channels as the main communication abstraction.  ...  The runtime implementation is very lightweight and fully distributed, and provides multithreading, message-passing and multi-GPU offload capabilities.  ...  Acknowledgements This work has been supported by the National Science Foundation, under grant SHF-1117062, Parallel Unified Linear algebra with Systolic ARrays (PULSAR).  ... 
doi:10.14529/jsfi170101 fatcat:b6afot42rfakxicwf6rdz7sela

5G – Wireless Communications for 2020

André Noll Barreto, Bruno Faria, Erika Almeida, Ignacio Rodriguez, Mads Lauridsen, Rafhael Amorim, Robson Vieira
2016 Journal of Communication and Information Systems  
demanding and varied requirements that cannot be satisfied by current networks.  ...  This is due not only to the growth in data traffic and in the number of connected terminals, but also because we are on the verge of a new era, where everyone and everything will be connected, with more  ...  MIMO systems consist in the adoption of multiple antennas at both the receiver and transmitter ends, aiming at improvements in spectral efficiency and robustness.  ... 
doi:10.14209/jcis.2016.14 fatcat:piyq3my3ejex7afk42qjwk6erm

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique
2020 IEEE Access  
This paper first introduces the key properties of two brain-inspired models, the Deep Neural Network (DNN) and the Spiking Neural Network (SNN), and then analyzes techniques to produce efficient and high-performance  ...  prominence to the last two solutions, since they offer greater design flexibility and bear the potential of high energy-efficiency, especially for the inference process.  ...  Among the numerous subroutines implemented, the BLAS also include element-wise matrix multiplication, matrix-vector multiplication and matrix-matrix multiplication, also called General Matrix Multiplication  ... 
doi:10.1109/access.2020.3039858 fatcat:nticzqgrznftrcji4krhyjxudu
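The survey snippet ends at the BLAS General Matrix Multiplication (GEMM) routine, whose textbook contract is C ← αAB + βC. A naive Python sketch of that contract (real BLAS implementations are blocked and vectorized; this only shows the semantics):

```python
def gemm(alpha, A, B, beta, C):
    """Naive form of the BLAS Level-3 GEMM operation:
    returns alpha * (A @ B) + beta * C, with matrices as lists of rows."""
    m, k, n = len(A), len(B), len(B[0])
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]  # beta*C term
    for i in range(m):
        for l in range(k):
            for j in range(n):
                out[i][j] += alpha * A[i][l] * B[l][j]           # alpha*A@B term
    return out

A = [[1, 0], [0, 1]]            # identity, so A @ B == B
B = [[2, 3], [4, 5]]
C = [[1, 1], [1, 1]]
print(gemm(2.0, A, B, 1.0, C))  # [[5.0, 7.0], [9.0, 11.0]]
```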

Analyzing trajectories on Grassmann manifold for early emotion detection from depth videos

Taleb Alashkar, Boulbaba Ben Amor, Stefano Berretti, Mohamed Daoudi
2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)  
That is, a sequence of 3D faces is first split into an indexed collection of short-term sub-sequences, each represented as a matrix (subspace) that defines a special matrix manifold called the Grassmann manifold  ...  They are respectively (1) a dictionary (of subspaces) representation associated with Dictionary Learning and Sparse Coding techniques and (2) a time-parameterized curve (trajectory) representation on the  ...  The main steps of this pipeline are summarized  ...  is performed, the mean curvature is computed from each 3D frame (Fig. 4.2).  ... 
doi:10.1109/fg.2015.7163122 dblp:conf/fgr/AlashkarABD15 fatcat:53lkhk6apzaqndylmu5iphqupe

CALIBRATION OF A MULTI-CAMERA ROVER

A. Brunn, T. Meyer
2016 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Although photogrammetric specialists realize the benefits of such systems immediately, surveyors have difficulty finding efficient uses.  ...  To approach these new measurement systems, the technique has to be understood and confidence in the accuracy has to grow.  ...  This work has partly been funded by the EU and the Free State of Bavaria within the project "DiPhoBi4KMU".  ... 
doi:10.5194/isprs-archives-xli-b5-445-2016 fatcat:ufzzofmj7fdu7oky4byzfcmy6u
Showing results 1 — 15 out of 164 results