
Memory-Optimised Parallel Processing of Hi-C Data

Maurizio Drocco, Claudia Misale, Guilherme Peretti Pezzi, Fabio Tordini, Marco Aldinucci
2015 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing  
The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes.  ...  We used as running example NuChart-II, a tool for annotation and statistic analysis of Hi-C data that creates a gene-centric neighborhood graph.  ...  CONCLUSION In this work, we addressed the problem of the optimisation of data structures and memory allocation in order to exploit parallelism in a Hi-C data analysis application.  ... 
doi:10.1109/pdp.2015.63 dblp:conf/pdp/DroccoMPTA15 fatcat:5ufzw36ex5f6nikikagq3pbe64
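
The snippet does not show how NuChart-II builds its graph. As a minimal sketch only, assume gene-gene contacts have already been extracted from Hi-C paired-end reads. A gene-centric neighborhood graph can then be grown by breadth-first search from a seed gene up to a chosen radius. The function name, gene labels and input format below are hypothetical.

```python
from collections import deque


def gene_neighborhood(contacts, seed, max_depth):
    """Return {gene: hop distance} for genes within max_depth hops of seed.

    contacts: iterable of (gene_a, gene_b) pairs, treated as undirected edges
    (hypothetical input; real Hi-C pipelines derive these from mapped reads).
    """
    adjacency = {}
    for a, b in contacts:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if depth[gene] == max_depth:
            continue  # reached the requested radius: do not expand further
        for neighbor in sorted(adjacency.get(gene, ())):
            if neighbor not in depth:  # first visit gives the shortest hop count
                depth[neighbor] = depth[gene] + 1
                queue.append(neighbor)
    return depth


contacts = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g5", "g6")]
print(gene_neighborhood(contacts, "g1", 2))  # {'g1': 0, 'g2': 1, 'g3': 2}
```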

NuChart-II: The road to a fast and scalable tool for Hi-C data analysis

Fabio Tordini, Maurizio Drocco, Claudia Misale, Luciano Milanesi, Pietro Liò, Ivan Merelli, Massimo Torquati, Marco Aldinucci
2016 The International Journal of High Performance Computing Applications
normalisation technique for Hi-C data.  ...  AperTO - Archivio Istituzionale Open Access dell'Università di Torino (Institutional Open Access Archive of the University of Turin) NuChart-II: The road to a fast and scalable tool for Hi-C data analysis / Tordini, F.; Drocco, M.; Misale, C.; Milanesi, L.; Lio, P.  ...  ACKNOWLEDGEMENT This work has been partially supported by the EC-FP7 STREP project "REPARA" (no. 609666), the EU H2020 "RePhrase" project (no. 644235), and the Italian Ministry of Education and Research  ... 
doi:10.1177/1094342016668567 fatcat:yomvfrtdnzbend2upksavepjfa

3D Optimisation of Software Application Mappings on Heterogeneous MPSoCs [chapter]

Gereon Führ, Ahmed Hallawa, Rainer Leupers, Gerd Ascheid, Juan Fernando Eusse
2020 Lecture Notes in Computer Science  
Including memory power into this multi-objective optimisation problem is of utmost importance.  ...  Increasing the efficiency of parallel software development is one of the key obstacles in taking advantage of heterogeneous multi-core architectures.  ...  All three memory levels can be used to store the data of FIFO channels. Also, cluster memories host the data structures for synchronisation. The shared memory provides stack, heap and shared code.  ... 
doi:10.1007/978-3-030-52794-5_5 fatcat:6327jnzwsbaahd6bnrsux27gdy

Power aware data and memory management for dynamic applications

P. Marchal, J.I. Gomez, D. Atienza, S. Mamagkakis, F. Catthoor
2005 IEE Proceedings - Computers and Digital Techniques
The platform's performance and energy depend largely on how well the data-dominated services are mapped on the memory subsystem.  ...  Furthermore, their inherent flexibility perfectly supports the emerging market of interactive, mobile data and content services.  ...  Memory bandwidth optimisation for platform-based design In this Section, we illustrate how parallel accesses from different processing elements either to the shared memory (Section 4.1) or the local memory  ... 
doi:10.1049/ip-cdt:20045077 fatcat:udwq7ioilfdwxijd3goi5gzrgu

Limits of parallelism using dynamic dependency graphs

Jonathan Mak, Alan Mycroft
2009 Proceedings of the Seventh International Workshop on Dynamic Analysis - WODA '09  
We propose an extensible data dependence profiling framework that facilitates the estimation of a software component's inherent potential for parallelism.  ...  We use LLVM to perform a static analysis of a given code and instrument the code at the intermediate representation level to record memory accesses and control flow.  ...  development of this tool.  ... 
doi:10.1145/2134243.2134253 dblp:conf/issta/MakM09 fatcat:tdbnezz3erbarnrpqo2rj22cxe
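
One common way to read such a profile is to compute the critical path through the dynamic dependence graph and divide the operation count by its length. The following is a minimal sketch under simplifying assumptions: every operation takes one time step, and only true data dependences are kept. The trace format and function name are hypothetical and are not the framework's API.

```python
def available_parallelism(trace):
    """Estimate ideal parallelism from a dynamic dependence trace.

    trace[i] lists the indices (all < i) of earlier operations whose results
    operation i reads. Each operation is assumed to take one time step.
    Returns (parallelism, critical_path_length).
    """
    finish = []
    for deps in trace:
        # An operation can start once its latest producer has finished.
        start = max((finish[d] for d in deps), default=0)
        finish.append(start + 1)
    critical_path = max(finish, default=0)
    if critical_path == 0:
        return 0.0, 0
    return len(trace) / critical_path, critical_path


# Ops 0, 1 and 3 are independent; op 2 reads 0 and 1; op 4 reads 2 and 3.
trace = [[], [], [0, 1], [], [2, 3]]
print(available_parallelism(trace))  # (1.6666666666666667, 3)
```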

A review of optimisation and least-square problem methods on field programmable gate array-based orthogonal matching pursuit implementations

Muhammad Muzakkir Mohd Nadzri, Afandi Ahmad
2022 Indonesian Journal of Electrical Engineering and Computer Science  
Orthogonal matching pursuit (OMP) is the most efficient algorithm used for the reconstruction of compressively sampled data signals in the implementation of compressive sensing.  ...  OMP operates in an iteration-based nature, which involves optimisation and least-square problem (LSP) as the main processes.  ...  Communication of this research is made possible through monetary assistance by Universiti Tun Hussein Onn Malaysia and the UTHM Publisher's Office via Publication Fund E15216.  ... 
doi:10.11591/ijeecs.v25.i2.pp920-930 fatcat:thg2j4feuvf43gw3vms4htcr3m
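
The iteration the abstract describes alternates an atom-selection step with a least-squares fit. Both can be seen in the plain-Python sketch of textbook OMP below. It assumes unit-norm dictionary columns and a known sparsity level. For brevity it solves the least-squares step through the normal equations, although a QR or Cholesky factorisation is the usual numerically safer choice. It is not modelled on any specific FPGA design covered by the review.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def omp(Phi, y, sparsity):
    """Orthogonal matching pursuit. Phi is an m x n list of rows with unit-norm columns."""
    m, n = len(Phi), len(Phi[0])
    cols = [[Phi[i][j] for i in range(m)] for j in range(n)]
    support, coef, residual = [], [], list(y)
    for _ in range(sparsity):
        # 1. Atom selection: the column most correlated with the current residual.
        best = max((j for j in range(n) if j not in support),
                   key=lambda j: abs(dot(cols[j], residual)))
        support.append(best)
        # 2. Least squares: refit y on all selected columns (normal equations).
        gram = [[dot(cols[a], cols[b]) for b in support] for a in support]
        coef = solve(gram, [dot(cols[a], y) for a in support])
        # 3. New residual: y minus its projection onto the selected columns.
        residual = [y[i] - sum(c * cols[j][i] for c, j in zip(coef, support))
                    for i in range(m)]
    x = [0.0] * n
    for c, j in zip(coef, support):
        x[j] = c
    return x


# Four identity columns plus one dense unit-norm column; y = 3*col1 + 2*col4.
Phi = [[1, 0, 0, 0, 0.5],
       [0, 1, 0, 0, 0.5],
       [0, 0, 1, 0, 0.5],
       [0, 0, 0, 1, 0.5]]
y = [1, 4, 1, 1]
print(omp(Phi, y, 2))  # [0.0, 3.0, 0.0, 0.0, 2.0]
```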

Optimisation of the Higher-Order Finite-Volume Unstructured Code Enhancement for Compressible Turbulent Flows

A. Shamakina, P. Tsoutsanis
2019 Zenodo  
In this White Paper, we report on optimisations of the HOVE2 code implemented in the course of the PRACE Preparatory Access Type C project "HOVE2" in the time frame of December 2018 to June 2019.  ...  A focus of optimisation was an implementation of the ParMETIS support and MPI-IO. Through the optimisation of the MPI collective communications significant speedups have been achieved.  ...  For example, MPI rank 18 processes a larger chunk of data than the rest.  ... 
doi:10.5281/zenodo.3527684 fatcat:q7jweo2n3bgtbiag5cbem6rray
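
The remark that one rank processes a larger chunk than the rest is a load-imbalance observation. A common way to quantify it is the ratio of the maximum to the mean load, where 1.0 means perfectly balanced. The sketch below uses made-up per-rank cell counts, not data from the white paper. A partitioner such as ParMETIS aims to push this ratio towards 1.

```python
def load_imbalance(work_per_rank):
    """Return (max/mean load ratio, index of the most loaded rank)."""
    mean = sum(work_per_rank) / len(work_per_rank)
    heaviest = max(range(len(work_per_rank)), key=lambda r: work_per_rank[r])
    return work_per_rank[heaviest] / mean, heaviest


# Hypothetical decomposition: 20 ranks with 100 cells each, except rank 18.
cells = [100] * 20
cells[18] = 300
ratio, rank = load_imbalance(cells)
print(f"imbalance {ratio:.2f} on rank {rank}")  # imbalance 2.73 on rank 18
```

In a bulk-synchronous code every rank waits at each collective for the most loaded one. For compute-bound phases, parallel efficiency is therefore roughly bounded by 1/ratio.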

Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform

Michał Czapiński, Stuart Barnes
2011 Journal of Parallel and Distributed Computing  
Relatively little research has been done so far on GPU implementations of discrete optimisation algorithms.  ...  The introduction of NVidia's powerful Tesla GPU hardware and Compute Unified Device Architecture (CUDA) platform enables many-core parallel programming.  ...  His research interests include distributed and parallel computing, discrete optimisation, and software engineering.  ... 
doi:10.1016/j.jpdc.2011.02.006 fatcat:dbuuj7wqjfcn5kzuoyg6d6mham
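
In a tabu search for the permutation flowshop, much of the work goes into evaluating the makespan of many neighboring permutations. Each evaluation is independent, which makes this step attractive for a GPU. The sequential sketch below assumes the standard completion-time recurrence and a swap neighborhood. It does not reproduce either of the paper's two CUDA approaches.

```python
from itertools import combinations


def makespan(perm, proc):
    """proc[j][m] is the time of job j on machine m; jobs visit machines in order."""
    machines = len(proc[0])
    completion = [0] * machines  # finish time of the last scheduled job per machine
    for job in perm:
        for m in range(machines):
            # completion[m - 1] already holds this job's finish on the previous machine.
            ready = completion[m - 1] if m > 0 else 0
            completion[m] = max(completion[m], ready) + proc[job][m]
    return completion[-1]


def best_swap_neighbor(perm, proc):
    """Evaluate every pairwise-swap neighbor; each evaluation is independent."""
    best = None
    for i, k in combinations(range(len(perm)), 2):
        cand = list(perm)
        cand[i], cand[k] = cand[k], cand[i]
        cost = makespan(cand, proc)
        if best is None or cost < best[0]:
            best = (cost, tuple(cand))
    return best


proc = [[3, 2], [1, 4], [2, 1]]  # 3 jobs x 2 machines (toy data)
print(makespan((0, 1, 2), proc))  # 10
print(best_swap_neighbor((0, 1, 2), proc))  # (8, (1, 0, 2))
```

A tabu search would additionally forbid recently used swaps; that bookkeeping is omitted here.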

A new approach to parallelising tracing algorithms

Cosmin E. Oancea, Alan Mycroft, Stephen M. Watt
2009 Proceedings of the 2009 International Symposium on Memory Management - ISMM '09
Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.  ...  of reference.  ...  The first and third authors gratefully acknowledge the Natural Science and Engineering Research Council of Canada for financial support.  ... 
doi:10.1145/1542431.1542434 dblp:conf/iwmm/OanceaMW09 fatcat:wzu2d5ic7vdgzifu7crsuzqs4m
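
Tracing algorithms, such as the mark phase of a garbage collector, follow references from a root set using a worklist. Below is a minimal sequential baseline with a hypothetical dict-based heap. A parallel version shares or partitions this worklist and the mark set among processors, which is one source of the cross-processor contention and locking mentioned in the abstract. The paper's own parallelisation scheme is not reproduced here.

```python
def trace_reachable(heap, roots):
    """Return the set of objects reachable from roots.

    heap maps each object id to the ids it references (hypothetical model).
    """
    marked = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue  # already traced via another reference
        marked.add(obj)
        worklist.extend(heap.get(obj, ()))
    return marked


heap = {"a": ["b"], "b": ["c", "a"], "c": [], "d": ["a"]}
print(sorted(trace_reachable(heap, ["a"])))  # ['a', 'b', 'c']
```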

CUDA-based Multi-core Implementation of MDS-based Bioinformatics Algorithms

Thilo Fester, Falk Schreiber, Marc Strickert
2009 German Conference on Bioinformatics  
The implementation of computational intensive tasks in parallel technology is one of the key solutions to time-efficient data processing.  ...  High-throughput multidimensional scaling (HiT-MDS) is a versatile tool for biological data analyses that is systematically transferred to the GPU for taking advantages of the massively parallel hardware  ...  By massive parallel operations memory access latency can even be avoided by in-place recalculations instead of accessing big data caches.  ... 
dblp:conf/gcb/FesterSS09 fatcat:qfs27tpumvhb5naz4uyvmkpep4

An Implementation of Bitsliced DES on the Pentium MMX™ Processor [chapter]

Lauren May, Lyta Penna, Andrew Clark
2000 Lecture Notes in Computer Science  
Implementation specifics are discussed and comparisons made with an optimised C-coded DES implementation and an assembly language DES implementation.  ...  This paper sets the scene for future research of the inter-relation between design and implementation of the newer 128-bit symmetric block ciphers.  ...  Pentium implementation and optimisation issues are discussed. Some application limitations of bitsliced DES are identified, and a comparison of the C, assembler and bitslicing approaches is made.  ... 
doi:10.1007/10718964_10 fatcat:ypimcbeplzcupmghrjzyxwmijq
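
Bitslicing transposes the data so that machine word i holds bit i of many independent blocks. A cipher written as a circuit of AND, XOR and NOT gates then processes one block per bit lane with every instruction, so a 64-bit MMX register can carry 64 blocks. The sketch below shows only the transposition and a made-up 4-bit Boolean mapping, not DES itself. It checks the sliced result against a per-block version.

```python
LANES = 64  # one block per bit lane, matching a 64-bit register
MASK = (1 << LANES) - 1


def to_slices(blocks, width):
    """Transpose: bit k of slices[i] is bit i of blocks[k]."""
    slices = [0] * width
    for k, block in enumerate(blocks):
        for i in range(width):
            slices[i] |= ((block >> i) & 1) << k
    return slices


def from_slices(slices, count):
    """Inverse transpose for the first `count` lanes."""
    blocks = [0] * count
    for i, s in enumerate(slices):
        for k in range(count):
            blocks[k] |= ((s >> k) & 1) << i
    return blocks


def sliced_round(s):
    """Toy 4-bit mapping applied to all lanes at once with AND/XOR/NOT only."""
    a, b, c, d = s
    return [a ^ (b & c), b ^ d, (~c & MASK) ^ a, d ^ (a & b)]


def scalar_round(x):
    """The same toy mapping computed for a single 4-bit block."""
    a, b, c, d = [(x >> i) & 1 for i in range(4)]
    out = [a ^ (b & c), b ^ d, (c ^ 1) ^ a, d ^ (a & b)]
    return sum(bit << i for i, bit in enumerate(out))


blocks = list(range(16))  # every 4-bit input, one per lane
sliced = from_slices(sliced_round(to_slices(blocks, 4)), len(blocks))
print(sliced == [scalar_round(x) for x in blocks])  # True
```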

Algorithmic Skeletons and Parallel Design Patterns in Mainstream Parallel Programming

Marco Danelutto, Gabriele Mencagli, Massimo Torquati, Horacio González–Vélez, Peter Kilpatrick
2020 International Journal of Parallel Programming
Finally, we give our personal overview—as researchers active for more than two decades in the parallel programming models and frameworks area—of the process that led to the adoption of these concepts in  ...  This paper discusses the impact of structured parallel programming methodologies in state-of-the-art industrial and research parallel programming frameworks.  ...  IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet).  ... 
doi:10.1007/s10766-020-00684-w fatcat:vtqcyf4he5gu3eefbjsb7nrxne
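
To illustrate what a skeleton abstracts, below is a minimal sketch of a farm composed with a pipeline of stages. A farm applies the same worker to independent tasks, with order preserved. The sketch uses Python's standard concurrent.futures rather than any of the frameworks the paper surveys. The pipeline here is plain function composition inside each farm worker. A real pipeline skeleton would also run its stages concurrently, and CPU-bound Python threads are limited by the GIL, so this shows the programming model, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def pipeline(*stages):
    """Compose stages left to right into a single function (sequential per item)."""
    def run(item):
        for stage in stages:
            item = stage(item)
        return item
    return run


def farm(worker, tasks, n_workers=4):
    """Apply worker to every task on a pool of threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))


def increment(x):
    return x + 1


def square(x):
    return x * x


print(farm(pipeline(increment, square), range(5)))  # [1, 4, 9, 16, 25]
```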

Floating-to-Fixed-Point Conversion for Digital Signal Processors

Daniel Menard, Daniel Chillet, Olivier Sentieys
2006 EURASIP Journal on Advances in Signal Processing  
The fixed-point data types and the position of the scaling operations are optimised to reduce the code execution time.  ...  Digital signal processing applications are specified with floating-point data types but they are usually implemented in embedded systems with fixed-point arithmetic to minimise cost and power consumption  ...  In this case, the data are stored in memory with a greater precision. The data word-length is a multiple of the natural data word-lengths.  ... 
doi:10.1155/asp/2006/96421 fatcat:ftzhnlf7zzdovp3kmuk5h6etpq
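
Scaling operations in fixed-point code are typically shifts that align binary points between formats. The minimal sketch below assumes signed 16-bit two's-complement words in a Q format with `frac_bits` fractional bits. Conversion rounds to nearest, and overflow saturates. The paper's word-length optimisation itself is not reproduced.

```python
def saturate(q, word_bits):
    """Clamp q to the range of a signed word_bits-bit two's-complement integer."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))


def to_fixed(value, frac_bits, word_bits=16):
    """Quantise a float: scale by 2**frac_bits, round, then saturate.

    Python's round() breaks ties to even, one of several possible rounding modes.
    """
    return saturate(round(value * (1 << frac_bits)), word_bits)


def to_float(q, frac_bits):
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits, word_bits=16):
    """The raw product carries 2*frac_bits fractional bits; an arithmetic right
    shift (rounding toward minus infinity) rescales it back to frac_bits."""
    return saturate((a * b) >> frac_bits, word_bits)


# Q1.15: 16-bit words with 15 fractional bits, range [-1, 1 - 2**-15].
half, quarter = to_fixed(0.5, 15), to_fixed(0.25, 15)
product = fixed_mul(half, quarter, 15)
print(half, quarter, product, to_float(product, 15))  # 16384 8192 4096 0.125
print(to_fixed(1.0, 15))  # 32767 (saturated)
```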

Design of a Real-Time Face Detection Parallel Architecture Using High-Level Synthesis

Nicolas Farrugia, Franck Mamalet, Sébastien Roux, Fan Yang, Michel Paindavoine
2008 EURASIP Journal on Embedded Systems  
We then build a parallel architecture composed of a PE ring and a FIFO memory, which constitutes a generic architecture capable of processing images of different sizes.  ...  We rely on dataflow modelling of the algorithm and we use a high-level synthesis tool in order to specify the local dataflows of our Processing Element (PE), by describing in C language inter-PE communication  ...  ACKNOWLEDGMENTS The authors would like to thank Clément Quinson for his help in starting up with UGH, as well as Frédéric Pétrot and Ivan Augé of the UGH team for their help with specific points on the  ... 
doi:10.1155/2008/938256 fatcat:pno3gr66snc4pdkvid45inatgi

Fast free-form deformation using graphics processing units

Marc Modat, Gerard R. Ridgway, Zeike A. Taylor, Manja Lehmann, Josephine Barnes, David J. Hawkes, Nick C. Fox, Sébastien Ourselin
2010 Computer Methods and Programs in Biomedicine  
In this paper we present a parallel-friendly formulation of the algorithm suitable for Graphics Processing Unit execution.  ...  This technology could be of significant utility in time-critical applications such as image-guided interventions, or in the processing of large data sets.  ...  Since GPU-based computation is more efficient when processing large amounts of data concurrently, we optimise all control points and interpolate the whole image at each step.  ... 
doi:10.1016/j.cmpb.2009.09.002 pmid:19818524 fatcat:g3yingqvsjanxbzo5qbmtveole
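
Free-form deformation is commonly built on uniform cubic B-splines: the displacement at a point is a weighted sum of the four nearest control points in each dimension. Below is a minimal 1D sketch. The paper works on 3D images, where the weights become a tensor product. Every output point is computed independently, which is the property a GPU implementation can exploit. The control-point indexing convention is an assumption.

```python
import math


def bspline_basis(u):
    """Uniform cubic B-spline weights for fractional position u in [0, 1)."""
    return [
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    ]


def displacement_1d(x, phi, spacing):
    """Interpolate a displacement at position x from control-point values phi.

    Assumed convention: phi[j] belongs to control point j - 1, i.e. one padding
    control point sits before the origin so every x has four neighbors.
    """
    t = x / spacing
    cell = math.floor(t)
    u = t - cell
    return sum(w * phi[cell + offset] for offset, w in enumerate(bspline_basis(u)))


# Control point k sits at k * spacing with value k. Cubic B-splines reproduce
# linear functions, so the interpolated value should equal x / spacing.
phi = [j - 1 for j in range(10)]
print(round(displacement_1d(2.5, phi, 1.0), 12))  # 2.5
```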